Anton Lozhkov (@anton_lozhkov)'s Twitter Profile
Anton Lozhkov

@anton_lozhkov

Open-sourcing Language Models @huggingface ✨

ID: 2982581051

Joined: 17-01-2015 14:25:23

438 Tweets

2.2K Followers

322 Following

Quentin Lhoest 🤗 (@lhoestq)

✨NEW in Hugging Face Datasets v3.3 🔥

Process datasets using async functions in .map()!
Crazy useful to use AI models like R1 from DeepSeek

...maybe to fine-tune smaller models later?

Screenshot of the full colab in comments
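
For context, here is a minimal sketch of what passing an async function to `.map()` could look like. The dataset, model ID, and column names below are placeholders, not taken from the tweet or the linked colab:

```python
from datasets import load_dataset
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient()

async def annotate(example):
    # Each call is awaited concurrently while .map() iterates the dataset.
    response = await client.chat_completion(
        model="deepseek-ai/DeepSeek-R1",  # placeholder model ID
        messages=[{"role": "user", "content": example["question"]}],  # placeholder column
        max_tokens=256,
    )
    example["r1_answer"] = response.choices[0].message.content
    return example

ds = load_dataset("openai/gsm8k", "main", split="train[:100]")  # placeholder dataset
ds = ds.map(annotate)  # Datasets >= 3.3 accepts async functions directly
```

Because the async calls overlap in flight, annotating a dataset with a remote model this way should be much faster than a plain synchronous loop.
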
Loubna Ben Allal (@loubnabenallal1)

🚀 New dataset drop: DCLM-Edu

We filtered DCLM using FineWeb-Edu’s classifier to create a cleaner dataset optimized for smol models (like SmolLM2 135M/360M).

Why? Small models are sensitive to noise and can benefit from heavily curated data.
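
A hypothetical loading sketch follows; the repo ID and column name are assumptions, so check the actual dataset card on the Hub:

```python
from datasets import load_dataset

# Assumed repo ID for DCLM-Edu; verify on the Hugging Face Hub.
ds = load_dataset("HuggingFaceTB/dclm-edu", split="train", streaming=True)

# Stream a few samples instead of downloading the whole corpus;
# "text" is the assumed name of the document column.
for sample in ds.take(3):
    print(sample["text"][:200])
```
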
Quentin Gallouédec (@qgallouedec)

Have we found a way to beat DeepSeek-R1? 💣 Check hf.co/blog/open-r1/u… 🧵[0/10] Let's dive into our latest progress in Open R1.

Leandro von Werra (@lvwerra)

Introducing: ⚡️OlympicCoder⚡️

Beats Claude 3.7 and is close to o1-mini/R1 on olympiad level coding with just 7B parameters! Let that sink in!

Read more about its training dataset, the new IOI benchmark, and more in Open-R1 progress report #3.
elie (@eliebakouch)

Gemma3 technical report detailed analysis 💎

1) Architecture choices:
> No more softcapping, replaced by QK-Norm
> Both Pre AND Post Norm
> Wider MLP than Qwen2.5, ~ same depth
> SWA with 5:1 and 1024 (very small and cool ablation in the paper!)
> No MLA to save KV cache, SWA do
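
To make the QK-Norm bullet concrete, here is a minimal PyTorch sketch: RMS-normalize queries and keys per head before the attention dot product, which bounds the attention logits in place of soft-capping. Dimensions are illustrative, not Gemma 3's actual config, and `nn.RMSNorm` requires PyTorch >= 2.4:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QKNormAttention(nn.Module):
    def __init__(self, dim=512, n_heads=8):
        super().__init__()
        self.n_heads, self.head_dim = n_heads, dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim, bias=False)
        self.out = nn.Linear(dim, dim, bias=False)
        # One RMSNorm over the head dimension for Q and one for K.
        self.q_norm = nn.RMSNorm(self.head_dim)
        self.k_norm = nn.RMSNorm(self.head_dim)

    def forward(self, x):
        b, t, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        # QK-Norm: normalize before computing attention scores.
        q, k = self.q_norm(q), self.k_norm(k)
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.out(y.transpose(1, 2).reshape(b, t, -1))
```
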
Lewis Tunstall (@_lewtun)

It's pretty outrageous that a 250M parameter model can correctly convert screenshots of quantum field theory equations to LaTeX 🤯

Wish I had this when I was a student!
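
The tweet does not name the model, but calling such an image-to-LaTeX model would look roughly like this with the transformers pipeline API; the checkpoint name and file path are placeholders:

```python
from transformers import pipeline

# Placeholder checkpoint: the tweet does not say which ~250M model is meant.
ocr = pipeline("image-to-text", model="your-org/equation-to-latex-250m")

# Placeholder path to a screenshot of an equation.
result = ocr("qft_equation.png")
print(result[0]["generated_text"])  # LaTeX source of the equation
```
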
Loubna Ben Allal (@loubnabenallal1)

Build your code assistant at home with our new code pretraining datasets:

📚 Stack-Edu – 125B tokens of educational code across 15 programming languages, aka the FineWeb-Edu of code
🐛 GitHub Issues – 11B tokens of discussions from GitHub issues
📊 Kaggle Notebooks – 2B tokens
Thomas Wolf (@thom_wolf)

Generating high-quality code is the basis for code assistants but also for almost all Agentic-AI approaches 

That's why I'm very excited to see 2025 starting to be the year of high-performance code generation in *open-source* LLMs

After our latest release 'OlympicCoder' beat
Hugo Larcher (@hugoch)

🧠 LLM inference isn’t just about latency — it’s about consistency under load. Different workloads, configs, and hardware = very different real-world performances. At Hugging Face 🤗 we built inference-benchmarker — a simple tool to stress-test LLM inference servers. 🧵 (1/2)

Georgia Channing (@cgeorgiaw)

It's all well and good that OpenAI acquired Windsurf for $3 billion, probably for their massive repository of source code data. But have you heard of BigCode? 🧵 Here's why BigCode matters: