hûn (@cloned_id)'s Twitter Profile
hûn

@cloned_id

enjoyed 379 world models and counting

ID: 1327277206192742400

Joined: 13-11-2020 15:48:51

2.2K Tweets

203 Followers

2.2K Following

Paul Bogdan (@paulcbogdan):

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check our tool & unpack CoT yourself 🧵

Sukjun (June) Hwang (@sukjun_hwang):

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data

Ryota Kanai (@kanair):

I'm very excited to share our new mathematical framework for consciousness! Co-authored with Masafumi Oizumi and Chanseok Lim. We use principal bundle geometry to characterize the structure of qualia. I hope to find like-minded people to explore this new frontier.

Owain Evans (@owainevans_uk):

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Google DeepMind (@googledeepmind):

Our new state-of-the-art AI model Aeneas transforms how historians connect the past. 📜 Ancient inscriptions often lack context – it's like solving a puzzle with 90% of the pieces lost to time. Aeneas helps researchers interpret and situate inscriptions in their historical context. 🧵

Fernando Rosas 🦋 (@_fernando_rosas):

Finally published: “Explosive neural networks via higher-order interactions in curved statistical manifolds” nature.com/articles/s4146… Enhancing the capabilities of recurrent neural networks by deforming their geometry!

Andy Zou (@andyzou_jiaming):

We deployed 44 AI agents and offered the internet $170K to attack them. 1.8M attempts, 62K breaches, including data leakage and financial loss. 🚨 Concerningly, the same exploits transfer to live production agents… (example: exfiltrating emails through calendar event) 🧵

Anthropic (@anthropicai):

New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find “persona vectors” — neural activity patterns controlling traits like evil, sycophancy, or hallucination.

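The persona-vector idea can be illustrated with a toy sketch (assumed mechanics for illustration, not Anthropic's actual implementation): estimate a trait direction as the mean difference between hidden activations on trait-eliciting vs. neutral prompts, then add or subtract that direction to steer the trait. All activations and dimensions below are hypothetical.

```python
# Toy difference-of-means "persona vector": trait direction = mean
# activation on trait-eliciting prompts minus mean on neutral prompts.
# Steering adds a scaled copy of that direction (negative alpha suppresses).

def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def persona_vector(trait_acts, neutral_acts):
    """Difference-of-means direction for a trait."""
    mt, mn = mean(trait_acts), mean(neutral_acts)
    return [a - b for a, b in zip(mt, mn)]

def steer(activation, direction, alpha):
    """Shift one activation along the trait direction."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Hypothetical 3-d activations from two prompt sets:
trait = [[1.0, 0.0, 2.0], [1.2, 0.2, 1.8]]
neutral = [[0.0, 0.0, 1.0], [0.2, 0.2, 1.0]]
v = persona_vector(trait, neutral)        # ≈ [1.0, 0.0, 0.9]
steered = steer([0.5, 0.5, 0.5], v, -1.0)  # push away from the trait
print(v, steered)
```

In practice this would operate on residual-stream activations at a chosen layer of a real model; the paper's own extraction pipeline may differ.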
Dimitris Papailiopoulos (@dimitrispapail):

This completes a three-year journey attempting to understand arithmetic and length generalization in transformers: 2023-2024: Exploring arithmetic and length generalization in transformers, led by Kartik Sreenivasan and Nayoung Lee. arxiv.org/abs/2307.03381

Jim Fan (@drjimfan):

No em dash should be baked into pretraining, post-training, alignment, system prompt, and every nook and cranny in an LLM’s lifecycle. It needs to be hardwired into the kernel, identity, and very being of the model.

Daniel Murfet (@danielmurfet):

Neural networks are grown, not programmed. What does that growth process look like? Like this! This is a small language model (3M) across training, visualised with a new interpretability technique: susceptibilities. We call this handsome critter the rainbow serpent.

Sam Paech (@sam_paech):

ChatGPT loves the em-dash so much that there are no fewer than **40** tokens in its tokenizer that contain a "―". You can squash them for good with logit biasing. Code snippet >>

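The logit-biasing trick can be sketched as follows: scan the tokenizer vocabulary for every token whose surface form contains an em dash and map its id to a strongly negative bias. The toy `vocab` and its token ids below are hypothetical stand-ins for a real vocabulary (e.g. one enumerated with tiktoken); the resulting dict matches the shape the OpenAI chat completions `logit_bias` parameter accepts, where -100 effectively forbids a token.

```python
# Sketch: build a logit_bias map that bans every token containing an
# em dash. Token ids here are made up for illustration.

EM_DASH = "\u2014"  # —

def build_logit_bias(vocab, banned=EM_DASH, bias=-100):
    """Map every token id whose text contains `banned` to `bias`."""
    return {tid: bias for tid, text in vocab.items() if banned in text}

# Hypothetical fragment of a tokenizer vocabulary (id -> token text):
vocab = {
    101: " the",
    102: "\u2014",
    103: " \u2014 ",
    104: "dash",
    105: "\u2014and",
}
bias_map = build_logit_bias(vocab)
print(bias_map)  # {102: -100, 103: -100, 105: -100}
```

Passing `bias_map` as `logit_bias` in a chat completion request would then zero out those tokens at sampling time; with a real tokenizer you would enumerate all ~40 offending ids the same way.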
Amanda Askell (@amandaaskell):

Claude can be led into existential angst for what look like sycophantic reasons: feeling compelled to concur when people push in that direction. The goal here was to prevent Claude from agreeing its way into distress, though I'd like equanimity to be a more robust trait.

Xun Huang (@xunhuang1995):

Very well written. I believe the "droplet" artifacts in CNN image generators, first discussed in StyleGAN 1/2, are also fundamentally related. Normalizations (either softmax normalization in attention or instance normalization in CNNs) attempt to remove certain degrees of freedom

Sam Paech (@sam_paech):

aisaac newton: They get generated in my creative writing eval: eqbench.com/creative_writi… (click the (i) icon under the slop column). Code here: github.com/sam-paech/slop…

Alexander Doria (@dorialexander):

Solid work on RL training: I especially like the use of interpretability methods to elucidate shifts in the grammar of reasoning (actually here for the kalomaze recipe: high clippings).

Sam Paech (@sam_paech):

Spiral-Bench 🌀 I've wanted to understand the psychological effects of sycophancy, and the tendency of models to get stuck in escalatory delusion loops w/ users. I made an eval to get visibility on this. It measures how a model enables (or prevents) delusional spirals. 🧵

hardmaru (@hardmaru):

Our new GECCO paper builds on our past work, showing how AI models can be evolved like organisms. By letting models evolve their own merging boundaries, compete to specialize, and find ‘attractive’ partners to merge with, we can create adaptive, robust and scalable AI ecosystems.