Manuel Mager (Turatemai) (@pywirrarika)'s Twitter Profile
Manuel Mager (Turatemai)

@pywirrarika

Applied Scientist | Amazon AWS
Posts are my own opinion.

ID: 223694721

Link: http://code.kiutz.com | Joined: 07-12-2010 02:33:19

3.3K Tweets

939 Followers

1.1K Following

Unsloth AI (@unslothai):

You can now run Qwen3-235B-A22B-2507 with our Dynamic 2-bit GGUFs!

The full 250GB model gets reduced to just 88GB (-65% size).

Achieve >5 tokens/s on 89GB unified memory or 80GB RAM + 8GB VRAM.

GGUFs: huggingface.co/unsloth/Qwen3-…
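
For readers who want to try a GGUF like this locally, here is a minimal sketch using llama-cpp-python. The file name and offload settings are assumptions (the Hugging Face link above is truncated), so check the Unsloth model card for the actual file names:

```python
# Hedged sketch: one way to run a dynamic 2-bit GGUF locally via llama-cpp-python.
# The model_path and settings below are assumptions, not Unsloth's instructions.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-235B-A22B-2507-Q2_K.gguf",  # assumed local file name
    n_ctx=8192,        # context window; raise if you have the memory
    n_gpu_layers=20,   # offload a slice of layers to an 8 GB GPU, rest in RAM
)

out = llm("Explain why 2-bit quantization can still work well:", max_tokens=128)
print(out["choices"][0]["text"])
```
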
UW NLP (@uwnlp):

Fascinating work bridging cognitive science + NLP! PrefPalette decomposes preferences into interpretable attributes (humor, empathy, formality) with dynamic weighting. 46.6% better than GPT-4o with explainability. This opens new directions in alignment and personalization.
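
The tweet compresses the method a lot; a toy sketch of the decomposition idea it describes (score interpretable attributes, then combine them with context-dependent weights) might look like the following. Every scorer and number here is an illustrative placeholder, not PrefPalette's actual model:

```python
# Toy sketch of attribute-decomposed preference scoring in the spirit of the
# tweet above. Scorers and weights are placeholders, not the paper's models.

ATTRIBUTES = ("humor", "empathy", "formality")

def attribute_scores(response: str) -> dict[str, float]:
    # Stand-ins; the real system would use learned per-attribute predictors.
    return {"humor": 0.1, "empathy": 0.8, "formality": 0.4}

def preference_score(response: str, weights: dict[str, float]) -> float:
    # Preference = weighted sum of interpretable attribute scores.
    scores = attribute_scores(response)
    return sum(weights[a] * scores[a] for a in ATTRIBUTES)

# In an emotional-support context, a dynamic weighter might upweight empathy:
support_weights = {"humor": 0.1, "empathy": 0.7, "formality": 0.2}
print(preference_score("I'm so sorry to hear that...", support_weights))
```
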

Mihir Prabhudesai (@mihirp98):

🚨 The era of infinite internet data is ending, so we ask:

👉 What’s the right generative modelling objective when data—not compute—is the bottleneck?

TL;DR:

▶️ Compute-constrained? Train autoregressive models

▶️ Data-constrained? Train diffusion models

Get ready for 🤿  1/n
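
One way to see the contrast the thread draws: an autoregressive objective sees each sequence in a single left-to-right factorization, while a masked-diffusion objective revisits the same sequence under many random corruption levels, which is one intuition for why it can extract more from a fixed dataset. A simplified sketch of the two losses (not the paper's code; shapes and the masking scheme are assumptions):

```python
# Simplified contrast of the two objectives the thread compares.
import torch
import torch.nn.functional as F

def autoregressive_loss(logits, tokens):
    """Next-token prediction: position t predicts token t+1."""
    return F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def masked_diffusion_loss(model, tokens, mask_id):
    """Discrete-diffusion style: hide a random fraction, predict the originals.
    Each sequence is seen at many corruption levels, reusing data harder."""
    t = torch.rand(tokens.size(0), 1) * 0.7 + 0.3   # corruption level per sample
    mask = torch.rand(tokens.shape) < t             # which tokens to hide
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                       # (B, T, V)
    return F.cross_entropy(logits[mask], tokens[mask])

# Toy demo with a random "model" so the sketch runs end to end.
torch.manual_seed(0)
V, B, T = 11, 2, 8
tokens = torch.randint(0, V - 1, (B, T))
model = lambda x: torch.randn(x.size(0), x.size(1), V)
print(autoregressive_loss(model(tokens), tokens),
      masked_diffusion_loss(model, tokens, mask_id=V - 1))
```
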
Qwen (@alibaba_qwen):

🚀 We’re excited to introduce Qwen3-235B-A22B-Thinking-2507 — our most advanced reasoning model yet!

Over the past 3 months, we’ve significantly scaled and enhanced the thinking capability of Qwen3, achieving:
✅ Improved performance in logical reasoning, math, science & coding
Chujie Zheng (@chujiezheng):

Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀

📄 huggingface.co/papers/2507.18…
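
GSPO's key move, as the paper describes it, is to define the clipped importance ratio at the sequence level (length-normalized), rather than per token as in GRPO. A simplified sketch under that reading; equal-length responses and the exact normalization details are assumptions:

```python
# Simplified sketch of GSPO's core idea: a sequence-level, length-normalized
# importance ratio, clipped PPO-style, with group-relative advantages.
import torch

def gspo_loss(logp_new, logp_old, rewards, eps=0.2):
    """logp_new/logp_old: (G, T) per-token log-probs for G sampled responses;
    rewards: (G,) scalar rewards for the same group."""
    # Length-normalized sequence ratio: exp(mean_t [log pi_new - log pi_old]).
    seq_ratio = torch.exp((logp_new - logp_old).mean(dim=-1))
    # Group-relative advantage, normalized within the group (as in GRPO).
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-6)
    clipped = seq_ratio.clamp(1 - eps, 1 + eps)
    # Clipped policy-gradient surrogate, one ratio per *sequence*.
    return -torch.minimum(seq_ratio * adv, clipped * adv).mean()

# Toy usage with random numbers, just to show the shapes.
G, T = 4, 16
print(gspo_loss(torch.randn(G, T) * 0.01, torch.randn(G, T) * 0.01,
                torch.tensor([1.0, 0.0, 0.5, 0.2])))
```
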
Yoshinari Fujinuma (@akkikiki):

Stanford's Mixture of Experts (MoE) lecture is highly recommended (I've rewatched it about five times) / Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4:... youtu.be/LPv1KfUXLCo?si… via YouTube

SomosNLP (@somosnlp_):

📢🧵 Thread | Celebrating Ibero-American NLP at #ACL2025! 🇪🇸🇵🇹🇲🇽🇨🇴🇧🇷🇦🇷 ... 🌎

The SomosNLP (@SomosNLP_) community is showing up strong at ACL 2025!

We want to highlight all the papers from our vibrant community 👇

#NLProc #NLP #IberoAmerica
Muhammad AbdulMageed (@mageed):

What if the future of AI was fundamentally inequitable for an entire continent? 

This is the critical question we pose in our latest work (#ACL2025). We undertake a comprehensive empirical evaluation of leading LLMs on Sahara, our extensive benchmark that we collect using mostly
David Ifeoluwa Adelani 🇳🇬 (@davlanade):

Today, we are presenting 3 papers at #ACL2025:
1) Injongo: multicultural intent detection for African languages. Room 1.15-16 (Multilingualism) @ 14:00
2) BRIGHTER: emotion classification (Nedjma Ousidhoum نجمة أوسيدهم). Hall B (Resources) @ 14:00
3) Global MMLU (Shivalika Singh, Sara Hooker). Poster @ 10:30

Qwen (@alibaba_qwen):

🦥 Qwen3-Coder-Flash: Qwen3-Coder-30B-A3B-Instruct
💚 Just lightning-fast, accurate code generation.
✅ Native 256K context (supports up to 1M tokens with YaRN)
✅ Optimized for platforms like Qwen Code, Cline, Roo Code, Kilo Code, etc.
✅ Seamless function calling & agent
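
For the YaRN-extended 1M-token path mentioned above, here is a hedged sketch of how such context extension is typically enabled with Hugging Face transformers. The repo id is inferred from the model name in the tweet, and the factor/field values are assumptions patterned on how Qwen model cards usually document YaRN, so verify against the official card:

```python
# Hedged sketch: enabling YaRN context extension via Hugging Face transformers.
# Values below are assumptions (256K native * 4 ≈ 1M); check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-30B-A3B-Instruct"  # inferred repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={                 # overrides the config's RoPE settings
        "rope_type": "yarn",
        "factor": 4.0,             # assumed: 262144 native positions * 4 ≈ 1M
        "original_max_position_embeddings": 262144,
    },
    torch_dtype="auto",
    device_map="auto",
)
```
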
Yoshua Bengio (@yoshua_bengio):

Pleased to see this new Alignment Project, where I serve as an expert advisor, launched by the UK's AI Security Institute and supported by the Canadian AI Safety Institute and many others. I encourage my fellow researchers to apply for funding, compute and support from int’l experts.

Thomas Wolf (@thom_wolf):

Long-form AI reading is back and we’ve just dropped the ultimate summer read.

Inspired by the likes of Stripe Press, we’re proud to announce the first book from HF Press: a carefully crafted, book-length PDF edition of the Ultra-Scale Playbook.

Over 200 dense pages to learn the
Shruti (@heyshrutimishra):

The AI Industry Made a $57 Billion Mistake and No One’s Talking About It.

While GPT-5 headlines kept you distracted...

NVIDIA quietly made a bold claim:
→ Small Language Models (SLMs) are the future of AI agents

Cheaper, faster, and just as capable for 80% of real-world tasks.
Sebastian Raschka (@rasbt):

So, I did some coding this week...
- Qwen3 Coder Flash (30B-A3B)
- Mixture-of-Experts setup with 128 experts, 8 active per token
- In pure PyTorch (optimized for human readability)
- In a standalone Jupyter notebook
- Runs on a single A100
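
As a companion to that notebook description, here is a minimal, readability-first sketch of the routing pattern it mentions: a router picks the top 8 of 128 experts per token and mixes their outputs. Dimensions and layer shapes are illustrative, not Qwen3's actual sizes:

```python
# Minimal sketch of top-k expert routing: 128 experts, 8 active per token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, d_model=64, n_experts=128, k=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)   # scores every expert
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x):                        # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)  # mixing weights over top-8
        out = torch.zeros_like(x)
        for slot in range(self.k):                # plain loops for readability
            for e in idx[:, slot].unique():
                sel = idx[:, slot] == e           # tokens routed to expert e
                out[sel] += weights[sel, slot].unsqueeze(-1) * self.experts[e](x[sel])
        return out

print(TinyMoE()(torch.randn(5, 64)).shape)        # torch.Size([5, 64])
```
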