Mor Geva (@megamor2)'s Twitter Profile
Mor Geva

@megamor2

ID: 850356925535531009

https://mega002.github.io/ · Joined 07-04-2017 14:37:44

368 Tweets

1.1K Followers

474 Following

Christopher Potts (@chrisgpotts):

The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920

Mor Geva (@megamor2):

Emergence of "knowledge heads" in EleutherAI's Pythia 6.9B model, encoding mappings from countries to their capitals. Love this video, created by Amit Elhelo.

Riley Goodside (@goodside):

I asked ChatGPT “how many r’s in strawberry?” then ignored it and blindly replied “wrong” 35 times.

Its successive answers were 2, 1, 3, 2, 2, 3, 2, 2, 3, 3, 2, 4, 2, 2, 2, 3, 1, 2, 3, 2, 2, 3, 4, 2, 1, 2, 3, 2, 2, 3, 2, 4, 2, 3, 2, and 1.
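
A minimal sketch of replicating this experiment with the OpenAI Python SDK. The model name and setup are assumptions (the tweet only says "ChatGPT"); the point is the loop: ask once, then feed back "wrong" 35 times and record each new answer.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

messages = [{"role": "user", "content": "how many r's in strawberry?"}]
answers = []
for _ in range(36):  # one initial answer plus 35 more after each "wrong"
    resp = client.chat.completions.create(model="gpt-4o", messages=messages)
    answer = resp.choices[0].message.content
    answers.append(answer)
    messages.append({"role": "assistant", "content": answer})
    messages.append({"role": "user", "content": "wrong"})

print(answers)
```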

Wyatt Walls (@lefthanddraft):

Interesting paper about LLM counting. Concludes: "it would be impossible to have transformers count arbitrarily well and for long contexts, without increasing the architecture size considerably"

First paper I've seen that gives insight about why LLMs are so bad at this and suggests

Singularity's Child gonzo/ai (@shoecatladder):

Mor Geva Riley Goodside Very interesting. It looks like the intent of the paper is to build this capability into the transformer to get it to one-shot it.

Otherwise, in order to make up for the transformer not having short-term memory outside of the token stream, you have to do something like

Riley Goodside (@goodside):

The reason LLMs say there's two r's in "strawberry" isn't (just) tokenization — they struggle with counting generally, e.g. "horse" in the example shown.

The paper in the quoted post below offers the best intuition I've seen for why this happens: Transformers can't count because
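
For reference, the character-level ground truth is trivial to compute outside the model, and a tokenizer shows what the model actually receives instead of characters. A minimal sketch; the tiktoken encoding name is an assumption, and token splits vary by model.

```python
import tiktoken

word = "strawberry"
print(word.count("r"))  # 3: the ground truth the model keeps missing

# Show the subword units the model sees; it never receives the
# individual characters, which is the (partial) tokenization excuse.
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(word)
print([enc.decode([t]) for t in tokens])
```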

Leshem Choshen 🤖🤗 (@lchoshen):

Human feedback is critical for aligning LLMs, so why don’t we collect it in the open ecosystem?🧐

We (15 orgs) gathered the key issues and next steps. Envisioning a community-driven feedback platform, like Wikipedia.

alphaxiv.org/abs/2408.16961
🧵

Hila Chefer (@hila_chefer):

Introducing✨Still-Moving✨—our work from Google DeepMind that lets you apply *any* image customization method to video models🎥 Personalization (DreamBooth)🐶stylization (StyleDrop) 🎨 ControlNet🖼️—ALL in one method! Plus… you can control the amount of generated motion🏃‍♀️ 🧵👇

Mor Geva (@megamor2):

Some parameter vectors in MLP layers encode info about certain concepts (e.g. Harry Potter).

Unlearning methods do not erase this info but reduce the activations of the vectors. A jailbreak then amplifies the activations to bypass unlearning.

arxiv.org/abs/2406.11614
Yihuai Hong
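
A minimal sketch of the amplification idea, using a forward pre-hook in PyTorch/transformers. GPT-2 stands in for the models studied, and the layer index, neuron index, and gain are hypothetical, not values from the paper: the input to the MLP down-projection holds one coefficient per parameter (value) vector, so scaling one coordinate boosts that vector's contribution.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

LAYER, NEURON, GAIN = 8, 300, 5.0  # hypothetical, not taken from the paper

def amplify(module, args):
    # c_proj's input is the post-activation vector: one coefficient per
    # MLP value vector. Scale coordinate NEURON to amplify that vector.
    hidden = args[0]
    hidden[..., NEURON] *= GAIN
    return (hidden,)

handle = model.transformer.h[LAYER].mlp.c_proj.register_forward_pre_hook(amplify)

ids = tok("Harry Potter is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=8, do_sample=False)
print(tok.decode(out[0]))
handle.remove()  # restore the unmodified model
```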

Clement Neo (@_clementneo):

🧠🖼️ New paper on interpreting VLMs!

We study Vision-Language Models (VLMs) like LLaVA to understand how they process objects in images. We find surprising insights about how these models identify objects in images and how their inner representations develop through the layers.

Mor Geva (@megamor2):

To study the impact of interpretability research, we recently created a citation graph with over 180k papers! This effort included obtaining paper-track info from *CL confs since 2018 and training track classifiers.

Code and graph are now available at: github.com/mmarius/interp…
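
For a sense of what such a citation graph enables, here is a minimal sketch with networkx. The edge list and paper IDs below are made up for illustration; the real data lives in the linked repo.

```python
import networkx as nx

# Hypothetical (citing paper -> cited paper) edges; the real graph
# covers over 180k papers.
edges = [
    ("interp-paper-A", "probing-paper-B"),
    ("interp-paper-A", "attention-paper-C"),
    ("nlp-paper-D", "probing-paper-B"),
]

G = nx.DiGraph()
G.add_edges_from(edges)

# Most-cited papers within the graph, ranked by in-degree.
print(sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)[:5])
```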

Cas (Stephen Casper) (@stephenlcasper):

🧵🧵 With these two recent papers, I am starting to be convinced that mechanistic interpretability can be useful for performing and evaluating unlearning.

Marius Mosbach (@mariusmosbach):

I'll be at #EMNLP2024 next week to present our work on the impact of interpretability and analysis research on NLP. If you are interested in making interpretability work more robust and actionable, let's talk! I'm also recruiting interns to work on these problems together 💪