Prabhat Nagarajan (@prabhatmn) Twitter Tweets • TwiCopy

Prabhat Nagarajan

@prabhatmn

@MSFTResearch Intern. Reinforcement Learning PhD student @UAlberta/@AmiiThinks. Previously: @SonyAI_global intern

ID: 388421900

Link: http://prabhatnagarajan.com

Joined: 10-10-2011 18:30:00

95 Tweets

262 Followers

330 Following

Finding The Frame Workshop

@rlframeworkshop

4 months ago

Thanks to everyone who submitted; we enjoyed reviewing another fantastic set of papers this year! Check out all 27 accepted papers on our website 👉sites.google.com/view/findingth…

Likes: 13 · Replies: 0 · Retweets: 5

The Most Thought-Provoking paper award goes to Thinking is Another Form of Control 🏆 Congratulations to Josiah Hanna and Nicholas E. Corrado! 🎉 Check out their paper here: openreview.net/pdf/599e1574f4…

Likes: 22 · Replies: 0 · Retweets: 10

Finding The Frame Workshop

@rlframeworkshop

4 months ago

We have a really exciting lineup of invited speakers this year 🔥 Kicking us off we have Prof. Erin Talvitie (Harvey Mudd College), whose talk is titled: 20 Years of Asking the Wrong Questions in Model-based reinforcement learning. Abstract in 🧵

Likes: 27 · Replies: 1 · Retweets: 4

Marlos C. Machado

@marloscmachado

4 months ago

RLC Full Papers (these are great papers!):
- Deep RL track (Thu): Deep Reinforcement Learning with Gradient Eligibility Traces by E. Elelimy
- Foundations track (Fri): An Analysis of Action-Value Temporal-Difference Methods That Learn State Values by B. Daley and P. Nagarajan

Likes: 5 · Replies: 1 · Retweets: 1

Prabhat Nagarajan

@prabhatmn

4 months ago

Tomorrow (08/08) at RL_Conference, I will be presenting our work (with Brett Daley, Marlos C. Machado, and Martha White): "An Analysis of Action-Value Temporal-Difference Methods That Learn State Values" in CCIS 1-160 from 11:45-12:30. Paper: arxiv.org/pdf/2507.09523. Poster: 43.

Likes: 27 · Replies: 0 · Retweets: 6

Prabhat Nagarajan

@prabhatmn

3 months ago

Have people seen this prescient 2001 post by Richard Sutton on self-verification? "An AI system can create and maintain knowledge only to the extent that it can verify that knowledge itself". This sentiment underpins much LLM reasoning research today. incompleteideas.net/IncIdeas/Keyto…

Likes: 27 · Replies: 1 · Retweets: 4

Barack Obama

@barackobama

2 months ago

Jane Goodall had a remarkable ability to inspire us to connect with the natural wonders of our world, and her groundbreaking work on primates and the importance of conservation opened doors for generations of women in science. Michelle and I are thinking of all those who loved

Likes: 34,34K · Replies: 951 · Retweets: 3,3K

Jens Tuyls

@jenstuyls

2 months ago

Can the knowledge in language model representations guide the search for novel behaviors? We find that exploration with a simple, principled, representation-based bonus improves diversity and pass@k rates for inference-time and post-training!

Likes: 88 · Replies: 1 · Retweets: 19

Hamid Maei

@hamidmaei

2 months ago

Here is my response to Andrej Karpathy's comment on RL, after watching Dwarkesh Patel's podcast interview with him: x.com/dwarkesh_sp/st… RL is a powerful idea, but let's be thoughtful about when we actually need it. RL has an incredible 120+ year journey in animal

Likes: 28 · Replies: 1 · Retweets: 3

Peter Stone

@peterstone_tx

a month ago

Proud of our latest Nature publication, led by the Sony AI ethics team!

Likes: 19 · Replies: 0 · Retweets: 5

John Langford

@johnclangford

a month ago

A key claim here arxiv.org/abs/2511.05963 is that next token prediction has no inherent preference for a heliocentric Copernicus theory en.wikipedia.org/wiki/Copernica… over a geocentric Ptolemy en.wikipedia.org/wiki/Ptolemy#A… theory of observations. Predicting the next latent fixes that.

Likes: 8 · Replies: 1 · Retweets: 4