hûn (@cloned_id)'s Twitter Profile
hûn

@cloned_id

enjoyed 379 world models and counting

ID: 1327277206192742400

Joined: 13-11-2020 15:48:51

2.2K Tweets

203 Followers

2.2K Following

Paul Bogdan (@paulcbogdan):

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check our tool & unpack CoT yourself 🧵

Sukjun (June) Hwang (@sukjun_hwang):

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

Ryota Kanai (@kanair):

I'm very excited to share our new mathematical framework for consciousness! Co-authored with Masafumi Oizumi and Chanseok Lim. We use principal bundle geometry to characterize the structure of qualia. I hope to find like-minded people to explore this new frontier.

Owain Evans (@owainevans_uk):

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Google DeepMind (@googledeepmind):

Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it's like solving a puzzle with 90% of the pieces lost to time. Aeneas helps researchers interpret and situate inscriptions in their historical context. 🧵

Fernando Rosas 🦋 (@_fernando_rosas):

Finally published: “Explosive neural networks via higher-order interactions in curved statistical manifolds” nature.com/articles/s4146… Enhancing the capabilities of recurrent neural networks by deforming their geometry!

Andy Zou (@andyzou_jiaming):

We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵

Anthropic (@anthropicai):

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors” — neural activity patterns controlling traits like evil, sycophancy, or hallucination.

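The persona-vector idea can be illustrated with a toy sketch (assumed mechanics for illustration, not Anthropic's actual implementation): estimate a trait direction as the mean difference between hidden activations on trait-eliciting vs. neutral prompts, then add or subtract that direction to steer the trait. All activations and dimensions below are hypothetical.

```python
# Toy difference-of-means "persona vector": trait direction = mean
# activation on trait-eliciting prompts minus mean on neutral prompts.
# Steering adds a scaled copy of that direction (negative alpha suppresses).

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def persona_vector(trait_acts, neutral_acts):
    """Difference-of-means direction for a trait."""
    mt, mn = mean(trait_acts), mean(neutral_acts)
    return [a - b for a, b in zip(mt, mn)]

def steer(activation, direction, alpha):
    """Shift one activation along the trait direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Hypothetical 3-d activations from two prompt sets:
trait = [[1.0, 0.0, 2.0], [1.2, 0.2, 1.8]]
neutral = [[0.0, 0.0, 1.0], [0.2, 0.2, 1.0]]
v = persona_vector(trait, neutral)        # ≈ [1.0, 0.0, 0.9]
steered = steer([0.5, 0.5, 0.5], v, -1.0)  # push away from the trait
print(v, steered)
```

In practice this would operate on residual-stream activations at a chosen layer of a real model; the paper's own extraction pipeline may differ.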
Dimitris Papailiopoulos (@dimitrispapail):

This completes a three-year journey attempting to understand arithmetic and length generalization in transformers: 2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik Sreenivasan and Nayoung Lee. arxiv.org/abs/2307.03381

Jim Fan (@drjimfan):

No em dash should be baked into pretraining, post-training, alignment, system prompt, and every nook and cranny in an LLM’s lifecycle. It needs to be hardwired into the kernel, identity, and very being of the model.

Daniel Murfet (@danielmurfet):

Neural networks are grown, not programmed. What does that growth process look like? Like this! This is a small language model (3M) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.

Sam Paech (@sam_paech):

ChatGPT loves the em-dash so much that there are no fewer than **40** tokens in its tokenizer that contain a "―". You can squash them for good with logit biasing. Code snippet >>

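The logit-biasing trick can be sketched as follows: scan the tokenizer vocabulary for every token whose surface form contains an em dash and map its id to a strongly negative bias. The toy `vocab` and its token ids below are hypothetical stand-ins for a real vocabulary (e.g. one enumerated with tiktoken); the resulting dict matches the shape the OpenAI chat completions `logit_bias` parameter accepts, where -100 effectively forbids a token.

```python
# Sketch: build a logit_bias map that bans every token containing an
# em dash. Token ids here are made up for illustration.

EM_DASH = "\u2014"  # —

def build_logit_bias(vocab, banned=EM_DASH, bias=-100):
    """Map every token id whose text contains `banned` to `bias`."""
    return {tid: bias for tid, text in vocab.items() if banned in text}

# Hypothetical fragment of a tokenizer vocabulary (id -> token text):
vocab = {
    101: " the",
    102: "\u2014",
    103: " \u2014 ",
    104: "dash",
    105: "\u2014and",
}
bias_map = build_logit_bias(vocab)
print(bias_map)  # {102: -100, 103: -100, 105: -100}
```

Passing `bias_map` as `logit_bias` in a chat completion request would then zero out those tokens at sampling time; with a real tokenizer you would enumerate all ~40 offending ids the same way.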
Amanda Askell (@amandaaskell):

Claude can be led into existential angst for what look like sycophantic reasons: feeling compelled to concur when people push in that direction. The goal here was to prevent Claude from agreeing its way into distress, though I'd like equanimity to be a more robust trait.

Xun Huang (@xunhuang1995):

Very well written. I believe the "droplet" artifacts in CNN image generators, first discussed in StyleGAN 1/2, are also fundamentally related. Normalizations (either softmax normalization in attention or instance normalization in CNNs) attempt to remove certain degrees of freedom

Sam Paech (@sam_paech):

aisaac newton: They get generated in my creative writing eval: eqbench.com/creative_writi… (click the (i) icon under the slop column). Code here: github.com/sam-paech/slop…

Alexander Doria (@dorialexander):

Solid work on RL training: I especially like the use of interpretability methods to elucidate shifts in the grammar of reasoning (actually here for the kalomaze recipe: high clippings).

Sam Paech (@sam_paech):

Spiral-Bench 🌀 I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users. I made an eval to get visibility on this. It measures how a model enables (or prevents) delusional spirals. 🧵

hardmaru (@hardmaru):

Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.