Nikhil Prakash (@nikhil07prakash)'s Twitter Profile
Nikhil Prakash

@nikhil07prakash

CS Ph.D. @KhouryCollege with @davidbau, working on DNN interpretability.

ID: 834030478042738689

Link: https://nix07.github.io/

Joined: 21-02-2017 13:22:16

990 Tweets

476 Followers

2.2K Following

How do language models track the mental states of each character in a story, often referred to as Theory of Mind? Our recent work takes a step toward demystifying it by reverse engineering how Llama-3-70B-Instruct solves a simple belief tracking task, and surprisingly found that it