@synth_labs: Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10

SynthLabs

@synth_labs

Scaling Up Good Synthetic Reasoning

We're hiring! ➡️ jobs.synthlabs.ai

ID: 1524469250253066240

https://www.SynthLabs.ai

11-05-2022 19:20:00

127 Tweets

14.14K Followers

47 Following

SynthLabs

@synth_labs

2 months ago

Our new method (ALP) monitors solve rates across RL rollouts and applies inverse difficulty penalties during RL training. Result? Models learn an implicit difficulty estimator—allocating 5x more tokens to hard vs easy problems, cutting overall usage by 50% 🧵👇1/10
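
To make the idea concrete, here is a minimal sketch of a solve-rate-weighted length penalty. It is an illustration, not SynthLabs' implementation: the function name `length_penalized_rewards`, the parameters `beta` and `max_len`, and the exact reward formula are all assumptions. The only thing taken from the post is the principle that the empirical solve rate across a prompt's rollouts scales the penalty, so easy prompts (high solve rate) pay more per token than hard ones.

```python
from typing import List


def length_penalized_rewards(
    correct: List[bool],
    lengths: List[int],
    max_len: int,
    beta: float = 0.1,
) -> List[float]:
    """Hypothetical per-rollout rewards for ONE prompt.

    correct[i] -- whether rollout i solved the prompt
    lengths[i] -- number of tokens rollout i generated
    max_len    -- token budget used to normalize lengths (assumed)
    beta       -- penalty strength (assumed hyperparameter)
    """
    if not correct or len(correct) != len(lengths):
        raise ValueError("need one length per rollout and at least one rollout")
    if max_len <= 0:
        raise ValueError("max_len must be positive")

    k = len(correct)
    # Empirical solve rate across the K rollouts of this prompt.
    solve_rate = sum(correct) / k
    # Easy prompts (high solve rate) get a large penalty weight; the
    # 1/k floor keeps a small penalty even when no rollout succeeds.
    weight = max(solve_rate, 1.0 / k)

    return [
        float(c) - beta * weight * (n / max_len)
        for c, n in zip(correct, lengths)
    ]


if __name__ == "__main__":
    # Same token counts, different difficulty: the easy prompt is
    # penalized four times as heavily per token as the hard one.
    easy = length_penalized_rewards([True] * 4, [100] * 4, max_len=1000, beta=0.5)
    hard = length_penalized_rewards(
        [True, False, False, False], [100] * 4, max_len=1000, beta=0.5
    )
    print(easy[0], hard[0])
```

Under this assumed formula, a policy trained on these rewards has little incentive to shorten answers on prompts it rarely solves, and a strong incentive on prompts it solves almost every time.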

33 Likes

2 Replies

8 Reposts