Nived Rajaraman (@nived_rajaraman)'s Twitter Profile
Nived Rajaraman

@nived_rajaraman

EECS PhD student at Berkeley. Former intern at DeepMind. Reinforcement learning. I organize the BLISS seminar bliss.eecs.berkeley.edu/Seminar/index.…

ID: 2612375550

Website: https://nivedr.github.io/ · Joined: 08-07-2014 21:38:16

9 Tweets

77 Followers

111 Following

Banghua Zhu (@banghuaz)'s Twitter Profile Photo

I'll be at #NeurIPS2023, and the academic job market this year! RT will be greatly appreciated!

I work on statistics and information theory, with applications in robust statistics, offline RL, game theory, human-AI interactions and LLMs.

I'm recently working on better
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Scaling Test-Time Compute Without Verification or RL is Suboptimal

"In this paper, we prove that finetuning LLMs with verifier-based (VB) methods based on RL or search is far superior to verifier-free (VF) approaches based on distilling or cloning search traces, given a fixed
Amrith Setlur (@setlur_amrith)'s Twitter Profile Photo

🚨 RL or distillation/SFT: what to use to train next reasoning model? Which 📈 perf faster as we scale test compute?

We answer these in a principled way so you don't have to burn GPUs🔥.

🎯 Ans: RL w/ rewards or verification >> SFT/distillation 😱
arxiv.org/pdf/2502.12118 🧵⤵️
Eric Zhao (@ericzhao28)'s Twitter Profile Photo

Thinking for longer (e.g. o1) is only one of many axes of test-time compute. In a new Google AI paper, we instead focus on scaling the search axis. By just randomly sampling 200x & self-verifying, Gemini 1.5 ➡️ o1 performance. The secret: self-verification is easier at scale!
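
A rough sketch of the recipe this tweet describes (best-of-N sampling with self-verification): draw many independent candidate solutions, have the model score each one, and keep the highest-scoring candidate. The `generate` and `verify` callables below are hypothetical stand-ins for the underlying model calls, not the paper's actual API.

```python
from typing import Callable, List

def sample_and_self_verify(
    prompt: str,
    generate: Callable[[str], str],       # samples one candidate solution from the model
    verify: Callable[[str, str], float],  # model's self-verification score (higher = more likely correct)
    num_samples: int = 200,               # the tweet mentions sampling roughly 200x
) -> str:
    """Best-of-N search: sample candidates independently, return the one
    the model itself rates highest under self-verification."""
    candidates: List[str] = [generate(prompt) for _ in range(num_samples)]
    scores = [verify(prompt, c) for c in candidates]
    best_idx = max(range(num_samples), key=lambda i: scores[i])
    return candidates[best_idx]
```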
Aviral Kumar (@aviral_kumar2)'s Twitter Profile Photo

@nived_rajaraman will give an oral talk at the VerifAI workshop on why RL or verification is needed to effectively scale test-time compute! Lots of interesting insights in this paper!

At VerifAI workshop, 3:45pm, April 27 

arxiv.org/abs/2502.12118

x.com/setlur_amrith/…
Nived Rajaraman (@nived_rajaraman)'s Twitter Profile Photo

The abstract submission deadline for FoPt has been extended to the 21st of May (11:59pm UTC). Submission website: openreview.net/group?id=learn…

Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Announcing the first workshop on Foundations of Language Model Reasoning (FoRLM) at NeurIPS 2025!

📝Soliciting abstracts that advance foundational understanding of reasoning in language models, from theoretical analyses to rigorous empirical studies. 

📆 Deadline: Sept 3, 2025
Dylan Foster 🐢 (@canondetortugas)'s Twitter Profile Photo

Excited to announce our NeurIPS ’25 tutorial:

Foundations of Imitation Learning: From Language Modeling to Continuous Control 

With Adam Block & Max Simchowitz