Anton Lozhkov (@anton_lozhkov)'s Twitter Profile
Anton Lozhkov

@anton_lozhkov

Open-sourcing Language Models @huggingface ✨

ID: 2982581051

Joined: 17-01-2015 14:25:23

438 Tweets

2.2K Followers

322 Following

Quentin Lhoest 🤗 (@lhoestq)

✨NEW in Hugging Face Datasets v3.3 🔥

Process datasets using async functions in .map()!
Crazy useful to use AI models like R1 from DeepSeek

...maybe to fine-tune smaller models later?

Screenshot of the full colab in comments
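
For context, here is a minimal sketch of what passing an async function to `.map()` could look like. The dataset, model ID, and column names below are placeholders, not taken from the tweet or the linked colab:

```python
from datasets import load_dataset
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient()

async def annotate(example):
    # Each call is awaited concurrently while .map() iterates the dataset.
    response = await client.chat_completion(
        model="deepseek-ai/DeepSeek-R1",  # placeholder model ID
        messages=[{"role": "user", "content": example["question"]}],  # placeholder column
        max_tokens=256,
    )
    example["r1_answer"] = response.choices[0].message.content
    return example

ds = load_dataset("openai/gsm8k", "main", split="train[:100]")  # placeholder dataset
ds = ds.map(annotate)  # Datasets >= 3.3 accepts async functions directly
```

Because the async calls overlap in flight, annotating a dataset with a remote model this way should be much faster than a plain synchronous loop.
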
Loubna Ben Allal (@loubnabenallal1)

🚀 New dataset drop: DCLM-Edu

We filtered DCLM using FineWeb-Edu’s classifier to create a cleaner dataset optimized for smol models (like SmolLM2 135M/360M).

Why? Small models are sensitive to noise and can benefit from heavily curated data.
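
A hypothetical loading sketch follows; the repo ID and column name are assumptions, so check the actual dataset card on the Hub:

```python
from datasets import load_dataset

# Assumed repo ID for DCLM-Edu; verify on the Hugging Face Hub.
ds = load_dataset("HuggingFaceTB/dclm-edu", split="train", streaming=True)

# Stream a few samples instead of downloading the whole corpus;
# "text" is the assumed name of the document column.
for sample in ds.take(3):
    print(sample["text"][:200])
```
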
Quentin Gallouédec (@qgallouedec)

Have we found a way to beat DeepSeek-R1? 💣 Check hf.co/blog/open-r1/u… 🧵[0/10] Let's dive into our latest progress in Open R1.

Leandro von Werra (@lvwerra)

Introducing: ⚡️OlympicCoder⚡️

Beats Claude 3.7 and is close to o1-mini/R1 on olympiad level coding with just 7B parameters! Let that sink in!

Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
elie (@eliebakouch)

Gemma3 technical report detailed analysis 💎

1) Architecture choices:
> No more softcapping, replaced by QK-Norm
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with 5:1 and 1024 (very small and cool ablation in the paper!)
> No MLA to save KV cache, SWA do
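
To make the QK-Norm bullet concrete, here is a minimal PyTorch sketch: RMS-normalize queries and keys per head before the attention dot product, which bounds the attention logits in place of soft-capping. Dimensions are illustrative, not Gemma 3's actual config, and `nn.RMSNorm` requires PyTorch >= 2.4:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # One RMSNorm over the head dimension for Q and one for K.
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        # QK-Norm: normalize before computing attention scores.
        q, k = self.q_norm(q), self.k_norm(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))
```
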
Lewis Tunstall (@_lewtun)

It's pretty outrageous that a 250M parameter model can correctly convert screenshots of quantum field theory equations to LaTeX 🤯

Wish I had this when I was a student!
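
The tweet does not name the model, but calling such an image-to-LaTeX model would look roughly like this with the transformers pipeline API; the checkpoint name and file path are placeholders:

```python
from transformers import pipeline

# Placeholder checkpoint: the tweet does not say which ~250M model is meant.
ocr = pipeline("image-to-text", model="your-org/equation-to-latex-250m")

# Placeholder path to a screenshot of an equation.
result = ocr("qft_equation.png")
print(result[0]["generated_text"])  # LaTeX source of the equation
```
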
Loubna Ben Allal (@loubnabenallal1)

Build your code assistant at home with our new code pretraining datasets:

📚 Stack-Edu – 125B tokens of educational code across 15 programming languages, aka the FineWeb-Edu of code
🐛 GitHub Issues – 11B tokens of discussions from GitHub issues
📊 Kaggle Notebooks – 2B tokens
Thomas Wolf (@thom_wolf)

Generating high-quality code is the basis for code assistants but also for almost all Agentic-AI approaches 

That's why I'm very excited to see 2025 starting to be the year of high-performance code generation in *open-source* LLMs

After our latest release 'OlympicCoder' beat
Hugo Larcher (@hugoch)

🧠 LLM inference isn’t just about latency — it’s about consistency under load. Different workloads, configs, and hardware = very different real-world performances. At Hugging Face 🤗 we built inference-benchmarker — a simple tool to stress-test LLM inference servers. 🧵 (1/2)

Georgia Channing (@cgeorgiaw)

It's all well and good that OpenAI acquired Windsurf for $3 billion, probably for their massive repository of source code data. But have you heard of BigCode? 🧵 Here's why BigCode matters: