Loïck BOURDOIS (@bdsloick)'s Twitter Profile
Loïck BOURDOIS

@bdsloick

FAT5 boy
@huggingface Fellow 🤗

ID: 1446784639113306118

Link: https://lbourdois.github.io/blog/
Joined: 09-10-2021 10:28:50

253 Tweets

223 Followers

180 Following

Eran Malach (@eranmalach)

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
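For readers new to the family: a state-space model processes a sequence with a linear recurrence, so per-step compute and memory stay constant in sequence length, unlike attention's quadratic cost. Below is a minimal NumPy sketch of the textbook diagonal linear SSM recurrence, purely for orientation; it is not the paper's setup, and all parameter values are illustrative.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal diagonal linear SSM:
    h_t = A * h_{t-1} + B * x_t,   y_t = C . h_t
    A, B, C: (d_state,) diagonal parameters; x: (seq_len,) scalar inputs.
    """
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        h = A * h + B * x_t       # state update: O(d_state) per step
        ys.append(float(C @ h))   # linear read-out of the hidden state
    return np.array(ys)

# Toy usage: memory is constant in sequence length.
A = np.full(16, 0.9)              # decay rate controls context retention
B = np.random.randn(16) * 0.1
C = np.random.randn(16)
y = ssm_scan(A, B, C, np.random.randn(1024))
print(y.shape)  # (1024,)
```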
Guilherme Penedo (@gui_penedo)

New dataset release: 🌐FineWiki

This is an updated, better-extracted version of Wikipedia, covering 325+ languages.

Unlike the old dataset from 2023, we kept all the math content, tables, properly rendered templates, and extracted key facts.

Examples and highlights below.
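A hedged sketch of loading the release with the `datasets` library. The repo id `HuggingFaceFW/finewiki`, the `eng` config name, and the `text` field are assumptions to verify against the dataset card on the Hub.

```python
from datasets import load_dataset

# Repo id and config name are assumptions based on the announcement;
# check the dataset card for the exact identifiers.
wiki = load_dataset("HuggingFaceFW/finewiki", "eng",
                    split="train", streaming=True)

# Stream a few examples without downloading the full dump.
for example in wiki.take(3):
    print(example.keys())
    print(example["text"][:200])  # field name assumed; inspect keys() above
```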
tomaarsen (@tomaarsen)

🤗 Sentence Transformers is joining Hugging Face! 🤗

This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer!

Details in 🧵
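Day-to-day usage is unchanged by the transfer. A minimal example with the library's standard API; the checkpoint `all-MiniLM-L6-v2` is just a common example model, and `similarity()` assumes a recent (v3+) release.

```python
from sentence_transformers import SentenceTransformer

# Load a small, widely used embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "SSMs promise efficient long-context modeling.",
    "State-space models scale well to long sequences.",
    "FineWiki covers 325+ languages.",
]
embeddings = model.encode(sentences)

# Pairwise cosine-similarity matrix between the sentences.
print(model.similarity(embeddings, embeddings))
```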
Shayne Longpre (@shayneredford)

📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality.

The largest public, multilingual scaling study to date. We ran 774 experiments (10M-8B params, 400+ languages) to answer:

🌍Are scaling laws different by
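As a reference point for what "fitting a scaling law" means here, a minimal sketch of fitting the standard Chinchilla-style form L(N, D) = E + A·N^(-α) + B·D^(-β) with scipy. ATLAS's actual multilingual parameterization may differ, and the measurements below are synthetic, generated purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Chinchilla-style loss surface in params N and tokens D.
def loss(ND, E, A, alpha, B, beta):
    N, D = ND
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Synthetic (N, D, L) measurements; real studies use observed losses.
N = np.array([1e7, 1e8, 1e9, 8e9, 1e7, 1e8, 1e9, 8e9])
D = np.array([1e9, 1e9, 1e9, 1e9, 1e10, 1e10, 1e10, 1e10])
L = loss((N, D), 1.7, 400.0, 0.34, 410.0, 0.28) \
    + np.random.normal(0, 0.01, len(N))

# Recover the coefficients by nonlinear least squares.
popt, _ = curve_fit(loss, (N, D), L,
                    p0=[1.5, 100, 0.3, 100, 0.3], maxfev=20000)
print(dict(zip(["E", "A", "alpha", "B", "beta"], popt)))
```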
Lewis Tunstall (@_lewtun)

We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️

Featuring our protagonist SmolLM3, we cover:

🧭 Strategy on whether to train your own LLM and burn all your VC money

🪨 Pretraining,
João Maria Janeiro (@joaomjaneiro)

🚨New Paper AI at Meta 🚨
You want to train a massively multilingual model, but languages keep interfering and you can’t boost performance? Using a dense model is suboptimal when mixing many languages, so what can you do?

You can use our new architecture Mixture of Languages!
🧵1/n
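The thread does not spell out the architecture at this point, so the following PyTorch sketch is only an illustrative guess at what language-conditioned routing could look like: a shared FFN plus per-language expert FFNs, so languages stop competing for the same dense parameters. It should not be read as the paper's actual Mixture of Languages design.

```python
import torch
import torch.nn as nn

class LanguageRoutedFFN(nn.Module):
    """Illustrative only: a shared FFN plus one expert FFN per language.
    A guess at the general 'Mixture of Languages' idea from the thread,
    NOT the paper's architecture."""

    def __init__(self, d_model: int, d_ff: int, n_languages: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_languages))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, seq, d_model); one language per batch for simplicity.
        # Shared path captures cross-lingual structure; the expert path
        # gives each language parameters that other languages can't disturb.
        return self.shared(x) + self.experts[lang_id](x)

layer = LanguageRoutedFFN(d_model=64, d_ff=256, n_languages=4)
out = layer(torch.randn(2, 8, 64), lang_id=1)
print(out.shape)  # torch.Size([2, 8, 64])
```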