
Amil Dravid
@_amildravid
PhD @Berkeley_AI
ID: 1639674617378795520
http://avdravid.github.io
25-03-2023 17:04:54
170 Tweets
601 Followers
459 Following



How do LMs track what humans believe? In our new work, we show they use a pointer-like mechanism we call lookback. Super proud of this work by Nikhil Prakash and team! This is the most intricate piece of LM reverse engineering I’ve seen!
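
As a loose illustration of what a “pointer-like” mechanism can mean mechanically (a generic toy sketch, not the actual circuit from the paper; every name and size below is made up): a source position stores an address vector, a later token carries the same address as its attention query, and scaled dot-product attention dereferences the pointer to retrieve the payload.

```python
import numpy as np

# Toy "pointer dereference" with attention (illustrative only).
rng = np.random.default_rng(0)
d, seq_len = 64, 5                      # arbitrary toy sizes

keys = rng.normal(size=(seq_len, d))
values = rng.normal(size=(seq_len, d))  # payloads at each position

address = keys[1]                       # pretend position 1 holds the belief info
query = address                         # a later token "copies" the address

# Scaled dot-product attention: the self-matching key dominates.
scores = keys @ query / np.sqrt(d)
attn = np.exp(scores - scores.max())
attn /= attn.sum()
retrieved = attn @ values               # ~ values[1]: pointer dereferenced

print(int(np.argmax(attn)))             # -> 1
```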

“How will my model behave if I change the training data?” Recent(-ish) work w/ Logan Engstrom: we nearly *perfectly* predict ML model behavior as a function of training data, saturating benchmarks for this problem (called “data attribution”).
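
For readers unfamiliar with the setup: the standard linear-datamodel formulation of data attribution approximates a model’s output on a fixed test example as a linear function of which training points were included. The sketch below is a hypothetical stand-in, with train_and_eval faking the expensive train-on-subset step via a hidden linear ground truth; it is not the benchmark-saturating method from the thread.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_subsets = 100, 500       # toy scale

# Stub for "train a model on this subset, evaluate on one test example".
# Real data attribution replaces this with actual (re)training or an
# efficient approximation; here a hidden linear ground truth fakes it.
true_scores = rng.normal(size=n_train)
def train_and_eval(mask):
    return mask @ true_scores + rng.normal(scale=0.1)

# Sample random training subsets (inclusion masks) and record outputs.
masks = rng.random((n_subsets, n_train)) < 0.5
outputs = np.array([train_and_eval(m) for m in masks])

# Fit the linear datamodel: one weight per training example.
weights, *_ = np.linalg.lstsq(masks.astype(float), outputs, rcond=None)

# weights[i] estimates how much including example i moves the prediction,
# which lets you forecast behavior under counterfactual training sets.
```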



We’re proud to announce three new tenure-track assistant professors joining TTIC in Fall 2026: Yossi Gandelsman, Will Merrill, and Nick Tomlin. Meet them here: buff.ly/JH1DFtT


The Mechanistic Interpretability for Vision workshop @ CVPR2025 earlier this month was very informative and fun! Looking forward to seeing this community grow. Thank you to the speakers and organizers: trevordarrell, David Bau, Tamar Rott Shaham, Yossi Gandelsman, Joanna


Thank you very much to our wonderful speakers and attendees of Mechanistic Interpretability for Vision @ CVPR2025, who made the workshop a huge success. We hope to see you again next year! The workshop recording can be accessed at: youtu.be/LTh86RMAWsI?si…

In a recent paper, physicists used two predictable factors to reproduce the “creativity” seen in image-generating AI. Webb Wright reports: quantamagazine.org/researchers-un…
