Xeophon (@thexeophon)'s Twitter Profile
Xeophon

@thexeophon

AI, LLMs

ID: 3382720905

Joined: 19-07-2015 07:35:14

20,20K Tweet

6,6K Followers

847 Following

Notes:
- Two models: R1-Zero (V3-Base + RL, no SFT) and R1 (SFT [CoT from R1-Zero] -> RL [reasoning] -> SFT [general] -> RL [alignment, reasoning])
- Six distillation models, i.e., SFT from R1 on Qwen, Llama. Outperforms RL-only on those models, RL on distilled models would improve
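
The stage ordering in the notes can be written down as plain data, which makes the difference between the two recipes easy to see. This is only a sketch for illustration: the `Stage` class, the `describe` helper, and the variable names are hypothetical and not taken from any released code, and the labels are copied from the notes as-is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One training stage as named in the notes (hypothetical helper type)."""
    kind: str   # "SFT" or "RL"
    focus: str  # bracketed label from the notes


# R1-Zero per the notes: V3-Base followed by RL only, with no SFT stage.
R1_ZERO_BASE = "V3-Base"
R1_ZERO = [Stage("RL", "no SFT")]

# R1 per the notes: four alternating SFT and RL stages.
R1 = [
    Stage("SFT", "CoT from R1-Zero"),
    Stage("RL", "reasoning"),
    Stage("SFT", "general"),
    Stage("RL", "alignment, reasoning"),
]


def describe(stages):
    """Render a list of stages in the arrow notation used in the notes."""
    return " -> ".join(f"{s.kind} [{s.focus}]" for s in stages)


if __name__ == "__main__":
    print("R1-Zero:", R1_ZERO_BASE, "+", describe(R1_ZERO))
    print("R1:", describe(R1))
```

Running the script prints both recipes in the same arrow notation as the notes; nothing here trains or loads a model.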