Prabhat Nagarajan
@prabhatmn
@MSFTResearch Intern. Reinforcement Learning PhD student @UAlberta/@AmiiThinks. Previously: @SonyAI_global intern
ID: 388421900
http://prabhatnagarajan.com 10-10-2011 18:30:00
95 Tweets
262 Followers
330 Following
The Most Thought-Provoking paper award goes to Thinking is Another Form of Control 🏆 Congratulations to Josiah Hanna and Nicholas E. Corrado! 🎉 Check out their paper here: openreview.net/pdf/599e1574f4…
Tomorrow (08/08) at RL_Conference, I will be presenting our (Brett Daley, Marlos C. Machado, Martha White) work: "An Analysis of Action-Value Temporal-Difference Methods That Learn State Values" in CCIS 1-160 from 11:45-12:30. Paper: arxiv.org/pdf/2507.09523. Poster: 43.
Have people seen this prescient 2001 post by Richard Sutton on self-verification? "An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself". This sentiment underpins much LLM reasoning research today. incompleteideas.net/IncIdeas/Keyto…
Here is my response after watching Dwarkesh Patel's podcast interview with Andrej Karpathy and his comments on RL: x.com/dwarkesh_sp/st… RL is a powerful idea, but let's be thoughtful about when we actually need it. RL has an incredible 120+ year journey in animal
A key claim in arxiv.org/abs/2511.05963 is that next-token prediction has no inherent preference for a heliocentric Copernican theory en.wikipedia.org/wiki/Copernica… over a geocentric Ptolemaic theory en.wikipedia.org/wiki/Ptolemy#A… of observations. Predicting the next latent fixes that.