Craig Citro (@craigcitro) Twitter Tweets • TwiCopy

Craig Citro

@craigcitro

i like math and puns | research engineer @anthropicai; previously: @GoogleColab, Google Bigquery, @sagemath, number theorist

ID: 94276164

Website: https://www.craigcitro.org/

Joined: 03-12-2009 07:05:50

3.3K Tweets

1.1K Followers

259 Following

Irena Buzarewicz

@irenabuzarewicz

a year ago

Likes: 7.7K · Replies: 34 · Retweets: 993

New Anthropic research: Tracing the thoughts of a large language model. We built a "microscope" to inspect what happens inside AI models and use it to understand Claude’s (often complex and surprising) internal mechanisms.

Likes: 8.8K · Replies: 182 · Retweets: 1.1K

Joshua Batson

@thebasepoint

9 months ago

We did a thing. A new method for looking inside AI models, and ten deep dives on what we see. I spent my word budget on the paper, so today I'll just highlight some of the threads from the team. 🧵

Likes: 46 · Replies: 1 · Retweets: 5

Craig Citro

@craigcitro

9 months ago

as I told several people, I was hoping to "dive into the deep end" when I joined anthropic. i was not sure what I expected, but it has been way, WAY above expectations. super proud of this work, check it out.

Likes: 76 · Replies: 1 · Retweets: 2

Jack Lindsey

@jack_w_lindsey

9 months ago

Human thought is built out of billions of cellular computations each second. Language models also perform billions of computations for each word they write. But do these form a coherent “thought process?” We’re starting to build tools to find out! Some reflections in thread.

Likes: 198 · Replies: 5 · Retweets: 22

Wes Gurnee

@wesg52

9 months ago

We tried to build a “microscope” to understand how Claude works. There are still many things which we cannot see clearly, but there are many exciting things that are coming into focus! A few reflections and exciting results:

Likes: 138 · Replies: 5 · Retweets: 10

Adam Pearce

@adamrpearce

9 months ago

Addition has been extensively studied in simple toy models. In our latest paper, we describe a method for untangling circuits of computations and examine how Claude understands "calc: 36+59=" anthropic.com/research/traci…

Likes: 17 · Replies: 1 · Retweets: 6

Emmanuel Ameisen

@mlpowered

9 months ago

We use language models like Claude to help us write, code, and think better. But we don’t understand how they work! We’ve built a new tool which allows us to look inside the model’s “brain” as it is “thinking.” Using it, we found really surprising behaviors 🧵

Likes: 22 · Replies: 4 · Retweets: 7

Emmanuel Ameisen

@mlpowered

9 months ago

We've made progress in our quest to understand how Claude and models like it think! The paper has many fun and surprising case studies, that anyone who is interested in LLMs would enjoy. Check out the video below for an example

Likes: 117 · Replies: 4 · Retweets: 7

Chris Olah

@ch402

9 months ago

A few reasons why I'm really excited about this project!

Likes: 1.1K · Replies: 17 · Retweets: 103

Nicholas Turner

@nicholasturner0

9 months ago

As people that know me well can attest, I love a good mystery! 🔍 Fortunately for me, this work had twists both surprising and peculiar. 🧵

Likes: 21 · Replies: 1 · Retweets: 5

Trenton Bricken

@trentonbricken

9 months ago

My favorite figure from our new Circuits papers -- "How does Claude do math?" Claude simultaneously does: 1. a back of the envelope calculation of the tens digits -- "the answer should be somewhere around 90". 2. an exact calculation of 6+9=15 using these super cool look up

Likes: 1.1K · Replies: 13 · Retweets: 119

Tim Fist

@fiiiiiist

8 months ago

New hires at mech interp orgs:

Likes: 455 · Replies: 3 · Retweets: 21

Jack Lindsey

@jack_w_lindsey

5 months ago

We're launching an "AI psychiatry" team as part of interpretability efforts at Anthropic! We'll be researching phenomena like model personas, motivations, and situational awareness, and how they lead to spooky/unhinged behaviors. We're hiring - join us! job-boards.greenhouse.io/anthropic/jobs…

Likes: 2.2K · Replies: 158 · Retweets: 203