Mor Geva
@megamor2
ID: 850356925535531009
https://mega002.github.io/ 07-04-2017 14:37:44
368 Tweets
1.1K Followers
474 Following
The Linear Representation Hypothesis is now widely adopted despite its highly restrictive nature. Here, Csordás Róbert, Atticus Geiger, Christopher Manning & I present a counterexample to the LRH and argue for more expressive theories of interpretability: arxiv.org/abs/2408.10920
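For context: the LRH posits that a model encodes a feature as a fixed direction in activation space, so the feature's value can be read off any hidden state with a single dot product. A minimal sketch of that assumption (toy data and a made-up feature direction, nothing from the paper):

```python
# Minimal sketch of the Linear Representation Hypothesis (LRH):
# a feature is assumed to correspond to a single direction w, so it
# can be read off any hidden state h with one dot product.
# Toy data and probe; illustrative only, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
d = 64                               # hidden dimension
w = rng.normal(size=d)               # hypothetical feature direction
w /= np.linalg.norm(w)

# Hidden states with the feature "on" (+) or "off" (-) along w,
# plus feature-irrelevant noise in all directions.
n = 200
labels = rng.integers(0, 2, size=n)  # 1 = feature present
h = rng.normal(size=(n, d)) + np.outer(2 * labels - 1, w) * 3.0

# Under the LRH, a single linear readout recovers the feature.
pred = (h @ w > 0).astype(int)
print("linear probe accuracy:", (pred == labels).mean())
```

A counterexample is any trained network that stores a feature in a form this one-direction readout cannot recover, which is what motivates the argument for more expressive theories.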
Emergence of "knowledge heads" in EleutherAI's Pythia 6.9B model, encoding mappings from countries to their capitals. Love this video, created by Amit Elhelo.
Mor Geva Riley Goodside Very interesting. It looks like the intent of the paper is to build this capability into the transformer to get it to one-shot it. Otherwise, in order to make up for the transformer not having short-term memory outside of the token stream, you have to do something like…
Introducing✨Still-Moving✨—our work from Google DeepMind that lets you apply *any* image customization method to video models🎥 Personalization (DreamBooth)🐶, stylization (StyleDrop)🎨, ControlNet🖼️—ALL in one method! Plus… you can control the amount of generated motion🏃‍♀️ 🧵👇
Some parameter vectors in MLP layers encode information about specific concepts (e.g., Harry Potter). Unlearning methods do not erase this information; they only reduce the activations of these vectors. A jailbreak can then amplify the activations to bypass unlearning. arxiv.org/abs/2406.11614 Yihuai Hong
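In code, the claim is simple to state: each MLP "value vector" (a column of the down-projection) is written to the residual stream scaled by its neuron's activation; if unlearning only suppresses those activation coefficients while leaving the vectors intact, re-amplifying them restores the behavior. A toy sketch (module shapes and neuron indices are hypothetical, not the paper's code):

```python
# Toy sketch: unlearning-as-suppression damps the activation coefficients
# of concept-encoding MLP value vectors without erasing the vectors, so
# amplifying those activations can restore the "forgotten" behavior.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, d_model=64, d_ff=256):
        super().__init__()
        self.w_in = nn.Linear(d_model, d_ff)
        self.w_out = nn.Linear(d_ff, d_model)  # columns = "value vectors"
        self.act = nn.GELU()

    def forward(self, x):
        a = self.act(self.w_in(x))  # a[i] = coefficient of value vector i
        return self.w_out(a)

torch.manual_seed(0)
mlp = MLP()
concept_neurons = [3, 17, 42]  # hypothetical concept-encoding neurons
x = torch.randn(1, 64)

def scale_hook(factor):
    # Rescales the chosen neurons' activation coefficients in place.
    def hook(_, __, a):
        a = a.clone()
        a[..., concept_neurons] *= factor
        return a
    return hook

baseline = mlp(x)

# Emulate unlearning: suppress the coefficients; the vectors stay intact.
h_sup = mlp.act.register_forward_hook(scale_hook(0.05))
suppressed = mlp(x)

# "Jailbreak": stack a second hook that amplifies the surviving activations.
h_amp = mlp.act.register_forward_hook(scale_hook(20.0))
restored = mlp(x)
h_amp.remove(); h_sup.remove()

print("baseline vs suppressed:", (baseline - suppressed).norm().item())
print("baseline vs restored:  ", (baseline - restored).norm().item())
```

Because the value vectors were never overwritten, the amplified run matches the baseline, which is exactly why activation suppression alone does not constitute erasure.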