Prabhat Nagarajan (@prabhatmn)'s Twitter Profile
Prabhat Nagarajan

@prabhatmn

@MSFTResearch Intern. Reinforcement Learning PhD student @UAlberta/@AmiiThinks. Previously: @SonyAI_global intern

ID: 388421900

Website: http://prabhatnagarajan.com | Joined: 10-10-2011 18:30:00

95 Tweets

262 Followers

330 Following

Finding The Frame Workshop (@rlframeworkshop)

Thanks to everyone who submitted; we enjoyed reviewing another fantastic set of papers this year! Check out all 27 accepted papers on our website 👉sites.google.com/view/findingth…

Finding The Frame Workshop (@rlframeworkshop)

The Most Thought-Provoking paper award goes to Thinking is Another Form of Control 🏆 Congratulations to Josiah Hanna and Nicholas E. Corrado! 🎉 Check out their paper here: openreview.net/pdf/599e1574f4…

Finding The Frame Workshop (@rlframeworkshop)

We have a really exciting lineup of invited speakers this year 🔥 Kicking us off we have Prof. Erin Talvitie (Harvey Mudd College), whose talk is titled: 20 Years of Asking the Wrong Questions in Model-based reinforcement learning. Abstract in 🧵

Marlos C. Machado (@marloscmachado)

RLC Full Papers (these are great papers!):
- Deep RL track (Thu): Deep Reinforcement Learning with Gradient Eligibility Traces by E. Elelimy
- Foundations track (Fri): An Analysis of Action-Value Temporal-Difference Methods That Learn State Values by B. Daley and P. Nagarajan

Prabhat Nagarajan (@prabhatmn)

Tomorrow (08/08) at RL_Conference, I will be presenting our (Brett Daley, Marlos C. Machado, Martha White) work: "An Analysis of Action-Value Temporal-Difference Methods That Learn State Values" in CCIS 1-160 from 11:45-12:30. Paper: arxiv.org/pdf/2507.09523. Poster: 43.

Prabhat Nagarajan (@prabhatmn)

Have people seen this prescient 2001 post by Richard Sutton on self-verification? "An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself". This sentiment underpins much LLM reasoning research today. incompleteideas.net/IncIdeas/Keyto…

Barack Obama (@barackobama)

Jane Goodall had a remarkable ability to inspire us to connect with the natural wonders of our world, and her groundbreaking work on primates and the importance of conservation opened doors for generations of women in science. Michelle and I are thinking of all those who loved

Jens Tuyls (@jenstuyls)

Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!

Hamid Maei (@hamidmaei)

Here is my response about RL comment after watching the podcast of Dwarkesh Patel interviewing Andrej Karpathy and his comment on RL: x.com/dwarkesh_sp/st… RL is a powerful idea, but let's be thoughtful about when we actually need it. RL has an incredible 120+ year journey in animal

John Langford (@johnclangford)

A key claim here arxiv.org/abs/2511.05963 is that next token prediction has no inherent preference for a heliocentric Copernicus theory en.wikipedia.org/wiki/Copernica… over a geocentric Ptolemy en.wikipedia.org/wiki/Ptolemy#A… theory of observations. Predicting the next latent fixes that.