Javier Ferrando (@javifer_96)'s Twitter Profile

Interpretability Researcher

ID: 934601338062811136

Link: https://javiferran.github.io/personal/
Joined: 26-11-2017 01:54:57

85 Tweets

748 Followers

539 Following

Adam Karvonen (@a_karvonen)

Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. How do we know if we are making progress? The field has relied on imperfect proxy metrics. We are releasing SAE Bench, a suite of 8 SAE evaluations! Project co-led with Can Rager 🧵

Bart Bussmann (@bartbussmann)

Excited to share our work on Matryoshka SAEs - a new variant of sparse autoencoders that learn features at multiple levels of abstraction by splitting the dictionary into nested groups of latents of increasing size! w/ Patrick Leask and Neel Nanda 🪆🪆🪆

Oskar Obeso (@obalcells)

Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.

Neel Nanda (@neelnanda5)

I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token level hallucination detectors and beat baselines

Thomas Fel (@napoolar)

🕳️🐇Into the Rabbit Hull – Part II

Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis
