Silun
@silunwang
LLM Post-training
ID: 2634851233
13-07-2014 05:00:42
24 Tweets
47 Followers
131 Following
Hot topics in RL

On-policy RL
Everyone faces a training-rollout mismatch:
- Truncated importance sampling: fengyao.notion.site/off-policy-rl#…
- IcePop: double-ended importance ratio clipping
- Rollout Routing Replay: arxiv.org/abs/2510.11370

Efficient rollout systems design
PipelineRL:
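The first two fixes for the training-rollout mismatch can be illustrated with a small, framework-free sketch. It assumes each sampled token has a log-probability from the inference engine that generated it (`rollout_logprobs`) and one recomputed by the trainer (`train_logprobs`). Truncated importance sampling caps the ratio pi_train / pi_rollout from above. The double-ended variant is our reading of IcePop: it keeps the ratio only when it lies inside [low, high] and zeroes the token otherwise. All function names, the `cap`, `low`, and `high` defaults, and the REINFORCE-style loss are hypothetical choices for illustration, not the linked authors' code.

```python
import math
from typing import List, Sequence


def importance_ratios(train_logprobs: Sequence[float],
                      rollout_logprobs: Sequence[float]) -> List[float]:
    """Per-token ratio pi_train(token) / pi_rollout(token), computed from log-probs."""
    return [math.exp(lt - lr) for lt, lr in zip(train_logprobs, rollout_logprobs)]


def truncated_is_weights(train_logprobs: Sequence[float],
                         rollout_logprobs: Sequence[float],
                         cap: float = 2.0) -> List[float]:
    """Truncated importance sampling: clip each ratio from above at `cap`.

    `cap=2.0` is a hypothetical default, not a recommended value.
    """
    return [min(r, cap) for r in importance_ratios(train_logprobs, rollout_logprobs)]


def double_ended_masked_weights(train_logprobs: Sequence[float],
                                rollout_logprobs: Sequence[float],
                                low: float = 0.5,
                                high: float = 2.0) -> List[float]:
    """Double-ended masking (our reading of IcePop): keep the ratio when it lies
    in [low, high], otherwise zero out the token. The bounds are hypothetical."""
    return [r if low <= r <= high else 0.0
            for r in importance_ratios(train_logprobs, rollout_logprobs)]


def weighted_pg_loss(train_logprobs: Sequence[float],
                     advantages: Sequence[float],
                     weights: Sequence[float]) -> float:
    """REINFORCE-style surrogate: -mean(w * A * log pi_train).

    The weights are plain floats here; in an autograd framework they would be
    detached so that gradients flow only through log pi_train.
    """
    n = len(train_logprobs)
    return -sum(w * a * lp for w, a, lp in zip(weights, advantages, train_logprobs)) / n


# Toy example: three sampled tokens with their probabilities under each engine.
train_lp = [math.log(0.5), math.log(0.9), math.log(0.1)]
rollout_lp = [math.log(0.5), math.log(0.3), math.log(0.4)]
adv = [1.0, 1.0, 1.0]

# Ratios are roughly 1.0, 3.0 and 0.25.
tis = truncated_is_weights(train_lp, rollout_lp)            # roughly [1.0, 2.0, 0.25]
masked = double_ended_masked_weights(train_lp, rollout_lp)  # roughly [1.0, 0.0, 0.0]
print(tis, masked)
print(weighted_pg_loss(train_lp, adv, masked))
```

In this sketch, truncation keeps every token but bounds how much a single over-weighted token can contribute, while double-ended masking drops tokens whose probabilities disagree strongly between the two engines in either direction.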