Joshua Batson (@thebasepoint)'s Twitter Profile
Joshua Batson

@thebasepoint

trying to understand evolved systems (🖥 and 🧬)
interpretability research @anthropicai
formerly @czbiohub, @mit math

ID: 481288361

Joined: 02-02-2012 15:09:00

1.1K Tweets

3.3K Followers

669 Following

Ethan Perez (@ethanjperez):

We’re hiring someone to run the Anthropic Fellows Program! Our research collaborations have led to some of our best safety research and hires. We’re looking for an exceptional ops generalist, TPM, or research/eng manager to help us significantly scale and improve our collabs 🧵

Mike Krieger (@mikeyk):

We asked every version of Claude to make a clone of Claude(dot)ai, including today’s Sonnet 4.5… see what happened in the video

Jack Lindsey (@jack_w_lindsey):

Prior to the release of Claude Sonnet 4.5, we conducted a white-box audit of the model, applying interpretability techniques to “read the model’s mind” in order to validate its reliability and alignment. This was the first such audit on a frontier LLM, to our knowledge. (1/15)

Joshua Batson (@thebasepoint):

This was so cool to be a part of. Jack led an incredible effort to quickly analyze the internals of a new model, as versions were coming in, to assess alignment. Research at the speed of model development.

Xinyan Hu (@xyvickyhu):

3->5, 4->6, 9→11, 7-> ?
LLMs solve this via In-Context Learning (ICL); but how is ICL represented and transmitted in LLMs? We build new tools identifying “extractor” and “aggregator” subspaces for ICL, and use them to understand ICL addition tasks like above. Come to
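A quick gloss on the puzzle: each in-context example maps x to x + 2, so the expected completion for 7 is 9. Below is a minimal sketch of the few-shot setup; the prompt formatting is an illustrative assumption, not taken from the paper.

```python
# Toy few-shot (ICL) addition task: every example applies the same latent rule x -> x + 2.
examples = [(3, 5), (4, 6), (9, 11)]
query = 7

# A prompt a model might be given (formatting is illustrative, not from the paper).
prompt = "\n".join(f"{x} -> {y}" for x, y in examples) + f"\n{query} ->"
print(prompt)

# The rule the examples define in context, and the completion it implies.
rule = lambda x: x + 2
assert all(rule(x) == y for x, y in examples)
print("expected completion:", rule(query))  # 9
```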
Iain Cheeseman (@iaincheeseman):

Today Anthropic released PubMed integration for Claude. No hallucinations. Just real science, real data. As a beta tester, this has been a game changer—like having a supercharged research assistant. Here are 6 prompts that will transform how you search the literature. A 🧵

Emmanuel Ameisen (@mlpowered):

How does an LLM compare two numbers? We studied this in a common counting task, and were surprised to learn that the algorithm it used was:

Put each number on a helix, and then twist one helix to compare it to the other.

Not your first guess? Not ours either. 🧵
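As a rough intuition for the "twist one helix" description, here is a toy numpy sketch. The period, the 3D embedding, and the sign readout are all assumptions made for illustration; the circuit recovered in the paper is learned and higher-dimensional.

```python
import numpy as np

T = 128  # assumed period: integers 0..T-1 get distinct angles on the toy helix

def on_helix(n: int) -> np.ndarray:
    """Place an integer on a helix: a point on a circle plus a slowly rising axis."""
    theta = 2 * np.pi * n / T
    return np.array([np.cos(theta), np.sin(theta), n / T])

def compare(a: int, b: int) -> int:
    """Twist a's point by b's angle; the sign of the leftover rotation says which is larger."""
    pa, pb = on_helix(a), on_helix(b)
    theta_a = np.arctan2(pa[1], pa[0])
    theta_b = np.arctan2(pb[1], pb[0])
    residual = np.angle(np.exp(1j * (theta_a - theta_b)))  # wrapped into (-pi, pi]
    return int(np.sign(residual))  # +1 if a > b, -1 if a < b, 0 if equal (for |a - b| < T/2)

assert compare(90, 80) == 1 and compare(12, 40) == -1 and compare(7, 7) == 0
```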
Emmanuel Ameisen (@mlpowered):

Looking at the geometry of these features, we discover clear structure: the model doesn't use independent directions for each position range. Instead, it is representing each potential position on a smooth 6D helix through embedding space.
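A hand-built analogue of that geometry, in the spirit of sinusoidal positional encodings: three cosine/sine pairs at different periods trace a smooth 6D curve over character positions. The specific periods below are assumptions; the model's representation is learned, not constructed this way.

```python
import numpy as np

periods = [16, 64, 256]  # assumed periods; three cos/sin pairs give six coordinates

def position_on_helix(pos: int) -> np.ndarray:
    """Embed a character position on a smooth 6D helix-like curve."""
    coords = []
    for T in periods:
        theta = 2 * np.pi * pos / T
        coords += [np.cos(theta), np.sin(theta)]
    return np.array(coords)  # shape (6,)

points = np.stack([position_on_helix(p) for p in range(200)])
print(np.linalg.norm(points[10] - points[11]))   # neighbors sit close on the curve
print(np.linalg.norm(points[10] - points[150]))  # distant positions are far apart
```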

Wes Gurnee (@wesg52):

New paper! We reverse engineered the mechanisms underlying Claude Haiku’s ability to perform a simple “perceptual” task. We discover beautiful feature families and manifolds, clean geometric transformations, and distributed attention algorithms!

Joshua Batson (@thebasepoint):

I came back from a 2-week vacation in July to find that Wes Gurnee had started studying how models break lines in text. He and Emmanuel Ameisen have uncovered another elegant geometric structure behind that mechanism every week since then. Publishing was the only way to get them to stop. Enjoy

Isaac Kauvar (@ikauvar):

What mechanisms do LLMs use to perceive their world? An exciting effort led by Wes Gurnee and Emmanuel Ameisen reveals beautiful structure in how Claude Haiku implements a fundamental "perceptual" task for an LLM: deciding when to start a new line of text.
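Stated as plain code, the task itself is simple even though the learned mechanism is not: keep a running character count and decide whether the next word still fits on the line. The fixed width and the character accounting below are illustrative assumptions, not the model's mechanism.

```python
def should_break(chars_on_line: int, next_word: str, line_width: int = 80) -> bool:
    """Toy version of the 'start a new line?' decision: break if the next word,
    plus a separating space, would run past the line width."""
    return chars_on_line + 1 + len(next_word) > line_width

print(should_break(74, "alignment"))  # True: 74 + 1 + 9 = 84 > 80, so start a new line
print(should_break(40, "alignment"))  # False: plenty of room left on this line
```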

julius tarng cyber inspector (@tarngerine):

What happens when you turn a designer into an interpretability researcher? They spend hours staring at feature activations in SVG code to see if LLMs actually understand SVGs. It turns out – yes~

We found that semantic concepts transfer across text, ASCII, and SVG:
Isaac Kauvar (@ikauvar):

Do LLMs actually "understand" SVG and ASCII art? We looked inside Claude's mind to find out. Answer: yes! The neural activity extracts high-level semantic concepts from the SVG code!

Do LLMs actually "understand" SVG and ASCII art? 

We looked inside Claude's mind to find out.

Answer: yes! The neural activity extracts high-level semantic concepts from the SVG code!
Joshua Batson (@thebasepoint):

We were chatting about how crazily general LLM features are, and said something like, "i mean, an eye feature would probably fire on everything, ascii art, svgs, you name it." Then we realized we could just...check?
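A sketch of what "just check" could look like: render the same concept in text, ASCII, and SVG, then read out one feature's activation on each. The feature_activation function below is a hypothetical placeholder, since the interpretability tooling involved is internal and not public.

```python
# The same concept ("eye") in three formats; the check is whether one feature fires on all of them.
inputs = {
    "text": "She winked, one eye still closed.",
    "ascii": "( o )   <- a round eye with a pupil",
    "svg": '<circle cx="12" cy="12" r="6" fill="white"/><circle cx="12" cy="12" r="2"/>',
}

def feature_activation(feature_name: str, prompt: str) -> float:
    """Hypothetical stand-in for internal tooling that returns how strongly a
    named feature fires on a prompt; not a real public API."""
    raise NotImplementedError("placeholder: requires access to model internals")

for fmt, prompt in inputs.items():
    try:
        print(fmt, feature_activation("eye", prompt))
    except NotImplementedError:
        print(fmt, "(needs internal feature-activation tooling)")
```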

Anthropic (@anthropicai):

New Anthropic research: Signs of introspection in LLMs.

Can language models recognize their own internal thoughts? Or do they just make up plausible answers when asked about them? We found evidence for genuine—though limited—introspective capabilities in Claude.