Javier Ferrando (@javifer_96) Twitter Tweets • TwiCopy

Javier Ferrando

@javifer_96

Interpretability Researcher

ID: 934601338062811136

Website: https://javiferran.github.io/personal/
Joined: 26-11-2017 01:54:57

85 Tweets

748 Followers

539 Following

Adam Karvonen

@a_karvonen

a year ago

Sparse Autoencoders (SAEs) are popular, with 10+ new approaches proposed in the last year. How do we know if we are making progress? The field has relied on imperfect proxy metrics. We are releasing SAE Bench, a suite of 8 SAE evaluations! Project co-led with Can Rager 🧵

221 likes · 6 replies · 22 reposts

Bart Bussmann

@bartbussmann

a year ago

Excited to share our work on Matryoshka SAEs - a new variant of sparse autoencoders that learn features at multiple levels of abstraction by splitting the dictionary into nested groups of latents of increasing size! w/ Patrick Leask and Neel Nanda 🪆🪆🪆

376 likes · 5 replies · 51 reposts

Javier Ferrando

@javifer_96

10 months ago

Working with Neel as a MATS Research scholar has been a fantastic experience. I highly recommend applying!

33 likes · 2 replies · 2 reposts

Oskar Obeso

@obalcells

3 months ago

Imagine if ChatGPT highlighted every word it wasn't sure about. We built a streaming hallucination detector that flags hallucinations in real-time.

5.5K likes · 139 replies · 395 reposts

Neel Nanda

@neelnanda5

3 months ago

I'm excited that, this year, interpretability finally works well enough to be practically useful in the real world! We found that, with enough effort into dataset construction, simple linear probes are cheap, real-time, token level hallucination detectors and beat baselines

1.1K likes · 22 replies · 118 reposts

Thomas Fel

@napoolar

a month ago

🕳️🐇Into the Rabbit Hull – Part II Continuing our interpretation of DINOv2, the second part of our study concerns the geometry of concepts and the synthesis of our findings toward a new representational phenomenology: the Minkowski Representation Hypothesis

323 likes · 5 replies · 52 reposts