
Sarath Sreedharan
@sarath_ssreedh
Assistant Professor, CSU
ID: 953043682424336384
http://sarathsreedharan.com/
15-01-2018 23:18:15
158 Tweets
328 Followers
286 Following

We (Been Kim, John Hewitt, Neel Nanda, Noah Fiedel, Oyvind Tafjord) propose a research direction called 🤖 agentic interpretability: we can and should ask and help AI systems to build mental models of us, which will in turn help us build mental models of LLMs. arxiv.org/abs/2506.12152


Really proud to share that my student will be presenting her work on generating deceptive behavior on the Talking Robotics podcast. Special shout-out to Anagha Kulkarni for helping her get started on this topic.


