slime (@slime_framework)'s Twitter Profile
slime

@slime_framework

The LLM post-training framework for RL Scaling. github.com/THUDM/slime

ID: 1964369157199482880

Joined: 06-09-2025 16:44:52

10 Tweets

125 Followers

3 Following

slime (@slime_framework):

Remember the MoE “Routing Replay” trick in the GSPO paper? slime is the first framework to ship it — just flip --use-routing-replay. PR: github.com/THUDM/slime/pu…

slime (@slime_framework):

We added fault-tolerant rollouts to slime: recover from transient failures without nuking your run. PR: github.com/THUDM/slime/pu…
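The tweet gives no mechanism details, so as a generic illustration only (a hypothetical wrapper, not slime's actual API), "fault-tolerant rollouts" can be sketched as retrying a failed rollout with backoff instead of aborting the whole run:

```python
import time

def rollout_with_retries(run_rollout, max_retries=3, backoff_s=1.0):
    """Hypothetical helper: retry a rollout on transient failure instead of
    aborting the training run. `run_rollout` is any zero-arg callable that
    raises on failure and returns rollout data on success."""
    for attempt in range(max_retries + 1):
        try:
            return run_rollout()
        except (RuntimeError, ConnectionError):
            if attempt == max_retries:
                raise  # retries exhausted: surface the failure
            time.sleep(backoff_s * (attempt + 1))  # linear backoff, then retry
```

The real implementation in the linked PR presumably also has to re-sync engine state after a failure; this sketch only shows the retry skeleton.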

slime (@slime_framework):

slime image upgraded to sglang v0.5.4.post1:
- We now have an initial implementation of training and updating MTP during RL.
- With the latest torch_memory_saver, we can now offload the draft model; previously it had to stay on GPU.

slime (@slime_framework):

Ant AQ-Team (AQ-MedAI, InclusionAI) and the SGLang RL Team just helped land Kimi-K2-Instruct RL on slime, fully wired up and running on 256× H20 141GB 🚀 Huge shout-out to yngao, yzlnew, and 汉松 from the AQ Team, and Ji Li and Yefei Chen from the SGLang RL Team for

slime (@slime_framework):

Super excited to see RLVE built on slime: 400 adaptive, verifiable environments that keep RL at the capability frontier!

slime (@slime_framework):

We just got ~100× faster GAE by borrowing ideas from chunked linear attention and turning GAE into a chunked scan problem. Code: github.com/THUDM/slime/p/… Detailed write-up (Chinese): zhuanlan.zhihu.com/p/197523728942…
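The linked write-up is in Chinese and the actual kernel lives in the PR, so as a rough NumPy sketch of the chunked-scan idea only (function names are ours, not slime's): GAE is the backward recurrence A_t = δ_t + γλ·A_{t+1}, which can be vectorized inside each chunk, with a single scalar carried between chunks — the same trick chunked linear attention uses.

```python
import numpy as np

def gae_reference(deltas, gamma, lam):
    """Standard sequential GAE: A_t = delta_t + gamma*lam*A_{t+1}."""
    adv = np.zeros(len(deltas))
    acc = 0.0
    for t in range(len(deltas) - 1, -1, -1):
        acc = deltas[t] + gamma * lam * acc
        adv[t] = acc
    return adv

def gae_chunked(deltas, gamma, lam, chunk=64):
    """Chunked-scan GAE: vectorized work inside each chunk, one scalar
    carried right-to-left across chunk boundaries."""
    g = gamma * lam
    T = len(deltas)
    adv = np.empty(T)
    carry = 0.0  # advantage of the first element of the chunk to our right
    start = ((T - 1) // chunk) * chunk
    while start >= 0:
        d = np.asarray(deltas[start:start + chunk], dtype=np.float64)
        L = len(d)
        pows = g ** np.arange(L)                     # g^0 .. g^{L-1}
        suffix = np.cumsum((pows * d)[::-1])[::-1]   # sum_{k>=t} g^k * d_k
        local = suffix / pows                        # in-chunk discounted suffix sums
        a = local + g ** (L - np.arange(L)) * carry  # add cross-chunk contribution
        adv[start:start + L] = a
        carry = a[0]
        start -= chunk
    return adv
```

The division by `pows` is safe for moderate chunk sizes (e.g. (γλ)^63 ≈ 0.04 for γλ ≈ 0.95) but would underflow for very long chunks, which is one reason a production kernel would be structured differently.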

slime (@slime_framework):

slime v0.2.0 is here 🎉 Huge thanks to all contributors & users who pushed this release forward ❤️

Highlights:
• New FSDP training backend
• Full-stack FP8 (train + infer) & MTP training during RL
• Tools to reduce train–infer mismatch: custom IS, routing replay (R2/R3), true

Yiping Wang (@ypwang61):

8B model can outperform AlphaEvolve on open optimization problems by scaling compute for inference or test-time RL🚀!

⭕Circle packing:
AlphaEvolve (Gemini-2.0-Flash/Pro)
  : 2.63586276
Ours (DeepSeek-R1-0528-Qwen3-8B)
  : 2.63598308

🔗in🧵
[1/n]

slime (@slime_framework):

We’ve added SGLang PD disaggregation to slime! Use --prefill-num-servers to split prefill and decode servers, making multi-turn RL rollouts more controllable under heavy prefill load. github.com/THUDM/slime/pu…