Michael Toker (@michael_toker)'s Twitter Profile
Michael Toker

@michael_toker

PhD student @Technion NLP lab - Developing explainability methods to gain a better understanding of LLMs

ID: 1522869113181413378

Link: https://tokeron.github.io/ · Joined: 07-05-2022 09:21:40

91 Tweets

115 Followers

508 Following

Yonatan Belinkov (@boknilev)'s Twitter Profile Photo

Since people have been asking - the #blackboxNLP workshop will return this year, to be held with #emnlp2025. This workshop is all about interpreting and analyzing NLP models (and yes, this includes LLMs). More details soon, follow BlackboxNLP

Dana Arad 🎗️ (@dana_arad4)'s Twitter Profile Photo

Tried steering with SAEs and found that not all features behave as expected? Check out our new preprint - "SAEs Are Good for Steering - If You Select the Right Features" 🧵
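As background on what "steering with SAEs" involves, here is a minimal sketch of activation steering with a single SAE feature, assuming a PyTorch transformer block and an SAE decoder implemented as nn.Linear(d_sae, d_model); the block, feature index, and scale below are illustrative placeholders, not the feature-selection method the preprint proposes.

```python
import torch

def steer_with_sae_feature(block, sae_decoder, feature_idx, alpha):
    """Add alpha * (unit-norm decoder direction of one SAE feature) to the
    residual-stream output of `block` via a forward hook. Returns the handle.
    Assumes sae_decoder is nn.Linear(d_sae, d_model), so weight is (d_model, d_sae)."""
    direction = sae_decoder.weight[:, feature_idx].detach()   # (d_model,)
    direction = direction / direction.norm()                  # unit-norm direction

    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden,) + tuple(output[1:]) if isinstance(output, tuple) else hidden

    return block.register_forward_hook(hook)
```

The hard part the paper is about is which `feature_idx` (and `alpha`) to pick; the hook itself is the easy part, and calling `handle.remove()` on the returned handle restores the unsteered model.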

Nitay Calderon (@nitcal)'s Twitter Profile Photo

Preferences drive modern LLM research and development: from model alignment to evaluation. But how well do we understand them? Excited to share our new preprint: Multi-domain Explainability of Preferences arxiv.org/abs/2505.20088 (with Roi Reichart and Liat Ein-Dor) 🧵👇 1/11

Yaniv Nikankin (@ynikankin)'s Twitter Profile Photo

VLMs perform better when answering questions about text than when answering the same questions about images - but why? and how can we fix it? We investigate this gap from a mechanistic interpretability perspective, and use our findings to close a third of it! 🧵

Yonatan Belinkov (@boknilev)'s Twitter Profile Photo

After discussing the LLaDA paper today, we were wondering: is this just BERT-style masked language modeling? Main differences seem to be: (a) training with varying masking budgets; (b) inference with gradual unmasking determined by confidence. arxiv.org/abs/2502.09992
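To make point (b) concrete, here is a rough sketch of confidence-based gradual unmasking, assuming a Hugging Face-style model whose output exposes `.logits` and a reserved `mask_id`; this illustrates the idea in the tweet, not the paper's exact sampler, and `per_step` is a placeholder.

```python
import torch

@torch.no_grad()
def unmask_by_confidence(model, tokens, mask_id, per_step=4):
    """Start from a (partly) masked sequence and repeatedly commit the model's
    most confident predictions, `per_step` positions at a time."""
    tokens = tokens.clone()                                    # shape (seq_len,)
    while True:
        masked = (tokens == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            return tokens
        logits = model(tokens.unsqueeze(0)).logits[0]          # (seq_len, vocab)
        conf, preds = logits[masked].softmax(-1).max(-1)       # per masked position
        top = conf.topk(min(per_step, masked.numel())).indices
        tokens[masked[top]] = preds[top]                       # keep the most confident
```

Point (a), the varying masking budget, would show up in training rather than in this inference loop.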

Zorik Gekhman (@zorikgekhman)'s Twitter Profile Photo

Now accepted to #COLM2025! We formally define hidden knowledge in LLMs and show its existence in a controlled study. We even show that a model can know the answer yet fail to generate it in 1,000 attempts 😵 Looking forward to presenting and discussing our work in person.
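As a rough illustration of the "knows the answer yet fails to generate it" claim (not the paper's formal definition of hidden knowledge), one crude probe compares the likelihood the model assigns to the gold answer with what repeated sampling actually produces; the helper below assumes a Hugging Face-style causal LM and tokenizer, and all names are placeholders.

```python
import torch

@torch.no_grad()
def answer_logprob(model, tokenizer, prompt, answer):
    """Total log-probability the model assigns to `answer` continuing `prompt`
    (assumes the prompt/answer boundary tokenizes cleanly)."""
    ids = tokenizer(prompt + answer, return_tensors="pt").input_ids
    n_answer = len(tokenizer(answer, add_special_tokens=False).input_ids)
    logprobs = model(ids).logits[:, :-1].log_softmax(-1)       # next-token log-probs
    token_lp = logprobs.gather(-1, ids[:, 1:, None]).squeeze(-1)
    return token_lp[0, -n_answer:].sum().item()                # score of the answer tokens
```

If the gold answer outscores distractors under this score while none of, say, 1,000 samples from `model.generate()` contain it, that is the flavor of mismatch the tweet describes.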

Itay Itzhak (@itay_itzhak_)'s Twitter Profile Photo

🚨New paper alert🚨 🧠 Instruction-tuned LLMs show amplified cognitive biases — but are these new behaviors, or pretraining ghosts resurfacing? Excited to share our new paper, accepted to CoLM 2025🎉! See thread below 👇 #BiasInAI #LLMs #MachineLearning #NLProc

Yonatan Belinkov (@boknilev)'s Twitter Profile Photo

BlackboxNLP is the workshop on interpreting and analyzing NLP models (including LLMs, VLMs, etc.). We accept full papers and extended abstracts. The workshop is highly attended; great exposure for your finished work or feedback on work in progress. #emnlp2025 at Suzhou, China!

David Bau (@davidbau)'s Twitter Profile Photo

Announcing a deep net interpretability talk series! Every week you will find new talks on recent research in the science of neural networks. The first few are posted: @jack_merulllo_, Roy Rinberg, and me. At the NDIF YouTube channel: youtube.com/@NDIFTeam.

David Bau (@davidbau)'s Twitter Profile Photo

At the #Neurips2025 mechanistic interpretability workshop I gave a brief talk about Venetian glassmaking, since I think we face a similar moment in AI research today. Here is a blog post summarizing the talk: davidbau.com/archives/2025/…

Yonatan Belinkov (@boknilev)'s Twitter Profile Photo

Interested in changes in perception of the term NLP vs LLMs. Which statement do you agree with?
- NLP and LLMs are just different things these days
- LLMs are a subset of NLP
- NLP is deprecated due to LLMs