Loïck BOURDOIS (@bdsloick)'s Twitter Profile
Loïck BOURDOIS

@bdsloick

FAT5 boy
@huggingface Fellow 🤗

ID: 1446784639113306118

Link: https://lbourdois.github.io/blog/
Joined: 09-10-2021 10:28:50

253 Tweets

223 Followers

180 Following

Eran Malach (@eranmalach)

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
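For readers new to the family: a state-space model processes a sequence with a linear recurrence, so per-step compute and memory stay constant in sequence length, unlike attention's quadratic cost. Below is a minimal NumPy sketch of the textbook diagonal linear SSM recurrence, purely for orientation; it is not the paper's setup, and all parameter values are illustrative.

```python
import numpy as np

def ssm_scan(A, B, C, x):
    """Minimal diagonal linear SSM:
    h_t = A * h_{t-1} + B * x_t,   y_t = C . h_t
    A, B, C: (d_state,) diagonal parameters; x: (seq_len,) scalar inputs.
    """
    h = np.zeros_like(A)
    ys = []
    for x_t in x:
        h = A * h + B * x_t       # state update: O(d_state) per step
        ys.append(float(C @ h))   # linear read-out of the hidden state
    return np.array(ys)

# Toy usage: memory is constant in sequence length.
A = np.full(16, 0.9)              # decay rate controls context retention
B = np.random.randn(16) * 0.1
C = np.random.randn(16)
y = ssm_scan(A, B, C, np.random.randn(1024))
print(y.shape)  # (1024,)
```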
Guilherme Penedo (@gui_penedo)

New dataset release: 🌐FineWiki

This is an updated, better-extracted version of Wikipedia, covering 325+ languages.

Unlike the old dataset from 2023, we kept all the math content, tables, properly rendered templates, and extracted key facts.

Examples and highlights below.
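A hedged sketch of loading the release with the `datasets` library. The repo id `HuggingFaceFW/finewiki`, the `eng` config name, and the `text` field are assumptions to verify against the dataset card on the Hub.

```python
from datasets import load_dataset

# Repo id and config name are assumptions based on the announcement;
# check the dataset card for the exact identifiers.
wiki = load_dataset("HuggingFaceFW/finewiki", "eng",
                    split="train", streaming=True)

# Stream a few examples without downloading the full dump.
for example in wiki.take(3):
    print(example.keys())
    print(example["text"][:200])  # field name assumed; inspect keys() above
```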
tomaarsen (@tomaarsen)

🤗 Sentence Transformers is joining Hugging Face! 🤗

This formalizes the existing maintenance structure, as I've personally led the project for the past two years on behalf of Hugging Face. I'm super excited about the transfer!

Details in 🧵
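Day-to-day usage is unchanged by the transfer. A minimal example with the library's standard API; the checkpoint `all-MiniLM-L6-v2` is just a common example model, and `similarity()` assumes a recent (v3+) release.

```python
from sentence_transformers import SentenceTransformer

# Load a small, widely used embedding model.
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "SSMs promise efficient long-context modeling.",
    "State-space models scale well to long sequences.",
    "FineWiki covers 325+ languages.",
]
embeddings = model.encode(sentences)

# Pairwise cosine-similarity matrix between the sentences.
print(model.similarity(embeddings, embeddings))
```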
Shayne Longpre (@shayneredford)

📢Thrilled to introduce ATLAS 🗺️: scaling laws beyond English, for pretraining, finetuning, and the curse of multilinguality.

The largest public, multilingual scaling study to date. We ran 774 experiments (10M-8B params, 400+ languages) to answer:

🌍Are scaling laws different by
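As a reference point for what "fitting a scaling law" means here, a minimal sketch of fitting the standard Chinchilla-style form L(N, D) = E + A·N^(-α) + B·D^(-β) with scipy. ATLAS's actual multilingual parameterization may differ, and the measurements below are synthetic, generated purely for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Chinchilla-style loss surface in params N and tokens D.
def loss(ND, E, A, alpha, B, beta):
    N, D = ND
    return E + A * N ** (-alpha) + B * D ** (-beta)

# Synthetic (N, D, L) measurements; real studies use observed losses.
N = np.array([1e7, 1e8, 1e9, 8e9, 1e7, 1e8, 1e9, 8e9])
D = np.array([1e9, 1e9, 1e9, 1e9, 1e10, 1e10, 1e10, 1e10])
L = loss((N, D), 1.7, 400.0, 0.34, 410.0, 0.28) \
    + np.random.normal(0, 0.01, len(N))

# Recover the coefficients by nonlinear least squares.
popt, _ = curve_fit(loss, (N, D), L,
                    p0=[1.5, 100, 0.3, 100, 0.3], maxfev=20000)
print(dict(zip(["E", "A", "alpha", "B", "beta"], popt)))
```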
Lewis Tunstall (@_lewtun)

We've just published the Smol Training Playbook: a distillation of hard-earned knowledge to share exactly what it takes to train SOTA LLMs ⚡️

Featuring our protagonist SmolLM3, we cover:

🧭 Strategy on whether to train your own LLM and burn all your VC money

🪨 Pretraining,
João Maria Janeiro (@joaomjaneiro)

🚨New Paper AI at Meta 🚨
You want to train a massively multilingual model, but languages keep interfering and you can’t boost performance? Using a dense model is suboptimal when mixing many languages, so what can you do?

You can use our new architecture Mixture of Languages!
🧵1/n
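The thread does not spell out the architecture at this point, so the following PyTorch sketch is only an illustrative guess at what language-conditioned routing could look like: a shared FFN plus per-language expert FFNs, so languages stop competing for the same dense parameters. It should not be read as the paper's actual Mixture of Languages design.

```python
import torch
import torch.nn as nn

class LanguageRoutedFFN(nn.Module):
    """Illustrative only: a shared FFN plus one expert FFN per language.
    A guess at the general 'Mixture of Languages' idea from the thread,
    NOT the paper's architecture."""

    def __init__(self, d_model: int, d_ff: int, n_languages: int):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_languages))

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # x: (batch, seq, d_model); one language per batch for simplicity.
        # Shared path captures cross-lingual structure; the expert path
        # gives each language parameters that other languages can't disturb.
        return self.shared(x) + self.experts[lang_id](x)

layer = LanguageRoutedFFN(d_model=64, d_ff=256, n_languages=4)
out = layer(torch.randn(2, 8, 64), lang_id=1)
print(out.shape)  # torch.Size([2, 8, 64])
```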