slime (@slime_framework)'s Twitter Profile
slime

@slime_framework

The LLM post-training framework for RL Scaling. github.com/THUDM/slime

ID: 1964369157199482880

Joined: 06-09-2025 16:44:52

10 Tweets

125 Followers

3 Following

slime (@slime_framework):

Remember the MoE “Routing Replay” trick in the GSPO paper? slime is the first framework to ship it — just flip --use-routing-replay. PR: github.com/THUDM/slime/pu…

slime (@slime_framework):

We added fault-tolerant rollouts to slime: recover from transient failures without nuking your run. PR: github.com/THUDM/slime/pu…
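The tweet gives no mechanism details, so as a generic illustration only (a hypothetical wrapper, not slime's actual API), "fault-tolerant rollouts" can be sketched as retrying a failed rollout with backoff instead of aborting the whole run:

```python
import time

def rollout_with_retries(run_rollout, max_retries=3, backoff_s=1.0):
    """Hypothetical helper: retry a rollout on transient failure instead of
    aborting the training run. `run_rollout` is any zero-arg callable that
    raises on failure and returns rollout data on success."""
    for attempt in range(max_retries + 1):
        try:
            return run_rollout()
        except (RuntimeError, ConnectionError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure
            time.sleep(backoff_s * (attempt + 1))  # linear backoff, then retry
```

The real implementation in the linked PR presumably also has to re-sync engine state after a failure; this sketch only shows the retry skeleton.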

slime (@slime_framework):

slime image upgraded to sglang v0.5.4.post1:
- We now have an initial implementation of training and updating MTP during RL.
- With the latest torch_memory_saver, we can now offload the draft model; previously it had to stay on GPU.

slime (@slime_framework):

Ant AQ-Team (AQ-MedAI, InclusionAI) and the SGLang RL Team just helped land Kimi-K2-Instruct RL on slime, fully wired up and running on 256× H20 141GB 🚀 Huge shout-out to yngao, yzlnew, and 汉松 from the AQ Team, and Ji Li and Yefei Chen from the SGLang RL Team for

slime (@slime_framework):

Super excited to see RLVE built on slime: 400 adaptive, verifiable environments that keep RL at the capability frontier!

slime (@slime_framework):

We just got ~100× faster GAE by borrowing ideas from chunked linear attention and turning GAE into a chunked scan problem. Code: github.com/THUDM/slime/p/… Detailed write-up (Chinese): zhuanlan.zhihu.com/p/197523728942…
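The linked write-up is in Chinese and the actual kernel lives in the PR, so as a rough NumPy sketch of the chunked-scan idea only (function names are ours, not slime's): GAE is the backward recurrence A_t = δ_t + γλ·A_{t+1}, which can be vectorized inside each chunk, with a single scalar carried between chunks — the same trick chunked linear attention uses.

```python
import numpy as np

def gae_reference(deltas, gamma, lam):
    """Standard sequential GAE: A_t = delta_t + gamma*lam*A_{t+1}."""
    adv = np.zeros(len(deltas))
    acc = 0.0
    for t in range(len(deltas) - 1, -1, -1):
        acc = deltas[t] + gamma * lam * acc
        adv[t] = acc
    return adv

def gae_chunked(deltas, gamma, lam, chunk=64):
    """Chunked-scan GAE: vectorized work inside each chunk, one scalar
    carried right-to-left across chunk boundaries."""
    g = gamma * lam
    T = len(deltas)
    adv = np.empty(T)
    carry = 0.0  # advantage of the first element of the chunk to our right
    start = ((T - 1) // chunk) * chunk
    while start >= 0:
        d = np.asarray(deltas[start:start + chunk], dtype=np.float64)
        L = len(d)
        pows = g ** np.arange(L)                     # g^0 .. g^{L-1}
        suffix = np.cumsum((pows * d)[::-1])[::-1]   # sum_{k>=t} g^k * d_k
        local = suffix / pows                        # in-chunk discounted suffix sums
        a = local + g ** (L - np.arange(L)) * carry  # add cross-chunk contribution
        adv[start:start + L] = a
        carry = a[0]
        start -= chunk
    return adv
```

The division by `pows` is safe for moderate chunk sizes (e.g. (γλ)^63 ≈ 0.04 for γλ ≈ 0.95) but would underflow for very long chunks, which is one reason a production kernel would be structured differently.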

slime (@slime_framework):

slime v0.2.0 is here 🎉 Huge thanks to all contributors & users who pushed this release forward ❤️

Highlights:
• New FSDP training backend
• Full-stack FP8 (train + infer) & MTP training during RL
• Tools to reduce train–infer mismatch: custom IS, routing replay (R2/R3), true

Yiping Wang (@ypwang61):

8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀!

⭕Circle packing:
AlphaEvolve (Gemini-2.0-Flash/Pro)
  : 2.63586276
Ours (DeepSeek-R1-0528-Qwen3-8B)
  : 2.63598308

🔗in🧵
[1/n]

slime (@slime_framework):

We’ve added SGLang PD disaggregation to slime! Use --prefill-num-servers to split prefill and decode servers, making multi-turn RL rollouts more controllable under heavy prefill load. github.com/THUDM/slime/pu…