Shaoxiong Ji (@shaoxiongji)'s Twitter Profile
Shaoxiong Ji

@shaoxiongji

ID: 902775078122733568

Joined: 30-08-2017 06:08:46

13 Tweets

90 Followers

308 Following

HPLT (@hplt_eu)

First HPLT release is out 🥳! Monolingual collection: 75 languages, 7.6 TB of documents, JSONL compressed files. Bilingual collection: 18 language pairs, 96M sentence pairs, 1.4 billion English tokens, TMX and TXT compressed formats. Check it out! hplt-project.org/datasets/
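Since the monolingual shards are distributed as compressed JSONL (one document object per line), here is a minimal reading sketch in Python. It assumes a gzip-compressed shard downloaded locally and a "text" field per document; the file name, field name, and compression codec are placeholders, not confirmed details of the release.

```python
import gzip
import json

# Placeholder path: substitute a shard downloaded from hplt-project.org/datasets/.
# Assumes gzip-compressed JSONL with a "text" field per document; adjust the
# open() call (e.g. use the zstandard package) if the shards use another codec.
shard_path = "hplt_monolingual_shard.jsonl.gz"

with gzip.open(shard_path, "rt", encoding="utf-8") as fh:
    for i, line in enumerate(fh):
        doc = json.loads(line)
        print(doc.get("text", "")[:200])  # preview the first 200 characters
        if i >= 4:  # only peek at the first few documents
            break
```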

HPLT (@hplt_eu)

First datasets, then models! Initial HPLT models (LLMs and MT) are out: hplt-project.org/models, with some still running 🏃 We explain what we are doing in the deliverables section: hplt-project.org/deliverables. Meanwhile, we keep cooking Internet Archive petabytes of data 🥘, enriching, dashboarding 📊

Shaoxiong Ji (@shaoxiongji)

🚀 Exciting Announcement! Introducing the #HPLT language resources – a massive multilingual dataset from Common Crawl Foundation & Internet Archive, featuring monolingual & bilingual corpora. Our collection spans 75 languages with ≈5.6 trillion word tokens! 🌐 #LLMs #NLP

Shaoxiong Ji (@shaoxiongji)

🚀 We're thrilled to release a new model checkpoint for MaLA-500, an LLM spanning 534 languages! 🔥 It fills the gap for low-resource languages, outperforming existing models. 📜 Paper: arxiv.org/abs/2401.13303… 💻 Model: huggingface.co/MaLA-LM/mala-5… HPLT UTTER #LLMs #MaLA500

FeYuan (@t_feyuan)

🚀 Exciting news! Introducing LLaMAX, a powerful LLM with enhanced translation performance across all 101 languages. 🔥 LLaMAX provides a better starting point for multilingual tasks, along with extensive analysis of multilingual continual pre-training. huggingface.co/papers/2407.05…

UKP Lab (@ukplab)

🚀 Shape the future of mental health research! 🚀 Are you passionate about making a difference in the field of mental health through cutting-edge research in #AI and #NLProc? Join a new independent research group at TU Darmstadt as a PhD candidate! 🧠 informatik.tu-darmstadt.de/ukp/ukp_home/j…

Helsinki-NLP (@helsinkinlp)

🚀 Excited to introduce EMMA-500! 🌍✨ A multilingual model continually pre-trained on 546 languages, enhancing coverage for low-resource languages. With the MaLA corpus and Llama 2 7B, we're pushing the boundaries of cross-lingual transfer. Check it out: huggingface.co/MaLA-LM
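A minimal loading sketch with 🤗 Transformers follows, assuming the checkpoint is published as a standard causal LM under the MaLA-LM organization; the repository ID below is a placeholder, so check huggingface.co/MaLA-LM for the exact name.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID: look up the exact checkpoint name on huggingface.co/MaLA-LM.
model_id = "MaLA-LM/emma-500"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Quick generation check on a multilingual prompt.
inputs = tokenizer("Terve maailma!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```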

HPLT (@hplt_eu)

🚀 INTRODUCING THE LATEST HPLT MONOLINGUAL DATASETS! TL;DR: 🔍 4.5 PB of web crawls, 📄 21 billion documents, 💝 careful extraction, deduplication, annotation and cleaning, 💥 193 languages! Explore and download the new HPLT Monolingual Datasets NOW! hplt-project.org/datasets/v2.0 #HPLT

Timothee Mickus (@linguistickus)

Work from Zihao Li, with the help of Shaoxiong Ji and Jörg Tiedemann: how good is machine translation as a multilingual pretraining objective? Looking forward to the #EMNLP2024 poster session, Nov 12, 14:00!

Shaoxiong Ji (@shaoxiongji)

I'll be moving to the ELLIS Institute Finland and the University of Turku this fall, and I'm looking to hire 2 PhD researchers and 1 postdoctoral researcher to work on exciting topics related to Large Language Models, AI for Health, or Multimodal Learning. ats.talentadore.com/apply/a-postdo…