Rohan Paul (@rohanpaul_ai)'s Twitter Profile
Rohan Paul

@rohanpaul_ai

💼 Engineer.

📚 I write daily on actionable AI developments.

🗞️ Subscribe and instantly get a 1300+ page Python book → rohan-paul.com

ID: 2588345408

Link: http://www.rohan-paul.com

Joined: 25-06-2014 22:38:54

36.36K Tweets

63.63K Followers

780 Following

Qwen2.5-Math-7B-Instruct can scale to o1-level accuracy in only 32 rollouts.

This paper's methods have a 4–16x better scaling rate than their deterministic search counterparts.

Current inference-time scaling often relies on imperfect reward models that cause “reward hacking.”