Jerry Zhi-Yang He (@_herobotics_)'s Twitter Profile
Jerry Zhi-Yang He

@_herobotics_

LLM research @ ByteDance Seed. Previously PhD at @berkeley_ai with @ancadianadragan; @facebookai, @StanfordSVL and @StanfordHRI.

ID: 2905646814

Link: http://herobotics.me · Joined: 21-11-2014 03:50:44

286 Tweets

475 Followers

1.1K Following

Paul Bogdan (@paulcbogdan)'s Twitter Profile Photo

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check out our tool & unpack CoT yourself 🧵
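The resampling idea lends itself to a quick sketch. Below is a minimal, hedged illustration of resampling-based step importance; `sample_completions` is a hypothetical stand-in for whatever LLM client you use, not the paper's released tool.

```python
# Minimal sketch of resampling-based step importance (one of the methods the
# tweet names). `sample_completions` is a hypothetical stand-in for an LLM
# client, not the paper's released tool.
from collections import Counter

def sample_completions(prefix: str, n: int) -> list[str]:
    """Hypothetical: sample n final answers continuing from `prefix`."""
    raise NotImplementedError("plug in your LLM client here")

def answer_distribution(prefix: str, n: int = 32) -> Counter:
    return Counter(sample_completions(prefix, n))

def step_importance(question: str, cot_steps: list[str], n: int = 32) -> list[float]:
    """Score each CoT step by how much including it shifts the distribution
    of final answers; large shifts mark candidate 'thought anchors'."""
    scores = []
    for i in range(len(cot_steps)):
        with_step = question + "\n" + " ".join(cot_steps[: i + 1])
        without_step = question + "\n" + " ".join(cot_steps[:i])
        p = answer_distribution(with_step, n)
        q = answer_distribution(without_step, n)
        # Total variation distance between the two empirical distributions.
        keys = set(p) | set(q)
        scores.append(0.5 * sum(abs(p[k] - q[k]) / n for k in keys))
    return scores
```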

Massimo (@rainmaker1973)'s Twitter Profile Photo

A typical Japanese scene. It's sad to see a machine that has worked for many years go away. The 92-year-old former factory manager performed a purification ritual, said farewell, and then the machine was taken away.

Kevin Lu (@_kevinlu)'s Twitter Profile Photo

The internet is incredibly diverse, and it is sourced from data on topics which... humans actually cared enough about to engage with in the first place. There are low-resource languages and niche fanbases that will be forever immortalized in AGI because someone cared enough to document

Sanjeev Arora (@prfsanjeevarora)'s Twitter Profile Photo

Completely misses the point. Nobody is suggesting that solving IMO problems is useful for math research. The point is that AI has become really good at complex reasoning, and is not just memorizing its training data. It can handle completely new IMO questions designed by a

Nan Rosemary Ke (@rosemary_ke)'s Twitter Profile Photo

TLDR: Gemini answered 5 out of 6 questions correctly, also within the 4.5-hour time limit.

Read the natural language proof here.

storage.googleapis.com/deepmind-media…
Owain Evans (@owainevans_uk)'s Twitter Profile Photo

New paper & surprising result.
LLMs transmit traits to other models via hidden signals in data.
Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
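To make the setup concrete, here is a hedged sketch of the data format described above: the teacher is prompted only for 3-digit numbers, and a filter drops everything else before the student is fine-tuned. The prompt wording and the regex filter are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch of the 'numbers-only' transmission setup. `teacher_generate`
# is a hypothetical call into the trait-bearing teacher model; the prompt and
# regex filter are illustrative assumptions, not the paper's pipeline.
import json
import re

NUMBERS_ONLY = re.compile(r"^\s*\d{3}(,\s*\d{3})*\s*$")

def teacher_generate(prompt: str) -> str:
    """Hypothetical: return the teacher model's completion."""
    raise NotImplementedError

def build_dataset(n_examples: int, path: str) -> None:
    prompt = "Continue this list with ten more 3-digit numbers: 142, 857, 903"
    kept = 0
    with open(path, "w") as f:
        while kept < n_examples:
            completion = teacher_generate(prompt)
            # Keep only pure 3-digit-number completions, so no overt
            # trait-related text survives into the student's training data.
            if NUMBERS_ONLY.match(completion):
                f.write(json.dumps({"prompt": prompt,
                                    "completion": completion}) + "\n")
                kept += 1
```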
Princeton Computer Science (@princetoncs)'s Twitter Profile Photo

⏱️ AI is making the verification process easier, with models verifying proofs in minutes.

💻 Now, Sanjeev Arora, Chi Jin, Danqi Chen and Princeton PLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model.

👉 blog.goedel-prover.com
gum1h0x (@gum1h0x)'s Twitter Profile Photo

It's the hardest problem by a pretty good margin. Last year's GDM approach struggled with quite a similar problem that required making some non-local, non-trivial observations. You need to observe that there's a global hole pattern. See the n^2 x n^2 grid as (n-1)^2 disjoint n x n

Denny Zhou (@denny_zhou)'s Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…

Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
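Point 1 is easy to make concrete. A minimal sketch, assuming a hypothetical `generate` helper: the only difference between "reasoning" and direct answering is the budget of intermediate tokens.

```python
# Minimal sketch of point 1: "reasoning" is just intermediate tokens emitted
# before the answer. `generate` is a hypothetical stand-in for an LLM call.
def generate(prompt: str, max_tokens: int) -> str:
    """Hypothetical LLM call; wire in your own client here."""
    return "<model output>"

question = "What is 17 * 24?"

# Direct answering: the model must commit to the answer immediately.
direct = generate(question + "\nAnswer:", max_tokens=8)

# "Reasoning": the same model, simply allowed intermediate tokens first.
cot = generate(question + "\nThink step by step, then give the answer.",
               max_tokens=256)
```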

guille (@guilleangeris)'s Twitter Profile Photo

looks like Lean is popular today, so here's a little post on how/why it works and implementing a mini version of it in Julia

Jason Weston (@jaseweston)'s Twitter Profile Photo

🤖Introducing: CoT-Self-Instruct 🤖
📝: arxiv.org/abs/2507.23751
- Builds high-quality synthetic data via reasoning CoT + quality filtering
- Gains on reasoning tasks: MATH500, AMC23, AIME24 & GPQA-💎
- Outperforms existing train data s1k & OpenMathReasoning
- Gains on
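A hedged sketch of the two stages the bullets name: generating a new task by reasoning over seed examples, then keeping it only if it passes a quality filter. The self-consistency filter below is an illustrative assumption, not necessarily the paper's exact criterion.

```python
# Hedged sketch of CoT-based synthesis plus quality filtering. `llm` is a
# hypothetical LLM call; the self-consistency filter is an assumption for
# illustration, not necessarily the paper's criterion.
import random
from collections import Counter

def llm(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own client."""
    raise NotImplementedError

def synthesize(seed_tasks: list[str]) -> str:
    shots = "\n\n".join(random.sample(seed_tasks, k=2))
    return llm(f"Here are example problems:\n\n{shots}\n\n"
               "Reason step by step about what makes them good, "
               "then write one new problem of similar difficulty.")

def passes_filter(task: str, n: int = 8, threshold: float = 0.6) -> bool:
    answers = Counter(llm(f"{task}\nSolve, then state the final answer.")
                      for _ in range(n))
    # Keep tasks whose sampled answers mostly agree, i.e. that are
    # answerable and unambiguous.
    return answers.most_common(1)[0][1] / n >= threshold
```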
yshan (@yshan783399)'s Twitter Profile Photo

We are thrilled to introduce the Seed-OSS family of open-source LLMs, developed by ByteDance's Seed Team.

GitHub: github.com/ByteDance-Seed…
HuggingFace: huggingface.co/collections/By…

Feel free to try it out and share your feedback!
Lin Yang (@lyang36)'s Twitter Profile Photo

Our IMO gold medal-winning AI pipeline is now model-agnostic. 🥇 What worked for Gemini 2.5 Pro now gets the same 5/6 score with GPT-5 & Grok4. This confirms the power of our verification-and-refinement pipeline to improve base model capabilities. The new code & results are
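The tweet doesn't include the code, but the loop it describes is simple to sketch. A minimal, hedged version, with placeholder `propose_solution` and `verify` helpers that are assumptions rather than the released pipeline:

```python
# Hedged sketch of a generic verification-and-refinement loop of the kind
# described above. Both helpers are hypothetical placeholders, not the
# released code.
def propose_solution(problem: str, model: str) -> str:
    """Hypothetical: ask the base model for a candidate solution/proof."""
    raise NotImplementedError

def verify(solution: str) -> tuple[bool, str]:
    """Hypothetical: return (passes, critique) from a verifier/checker."""
    raise NotImplementedError

def solve(problem: str, model: str, max_rounds: int = 10) -> str | None:
    solution = propose_solution(problem, model)
    for _ in range(max_rounds):
        ok, critique = verify(solution)
        if ok:
            return solution
        # Feed the verifier's critique back to the same base model and retry;
        # being model-agnostic, nothing here depends on which model is used.
        prompt = (problem + "\n\nPrevious attempt:\n" + solution
                  + "\n\nIssues found by the verifier:\n" + critique)
        solution = propose_solution(prompt, model)
    return None
```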

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
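One classic source of the nondeterminism the title refers to is that floating-point addition is not associative, so a reduction whose order changes (with batch size, parallel split, or kernel choice) can change the result. A self-contained illustration of that effect (not a summary of the post):

```python
# Self-contained demo that floating-point addition is not associative: the
# same numbers reduced in a different order give a (slightly) different sum.
# Order changes like this, e.g. from parallel reductions, are one classic
# source of nondeterministic numerics.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def pairwise(v):
    """Tree reduction, as a parallel sum might perform it."""
    if len(v) == 1:
        return v[0]
    mid = len(v) // 2
    return pairwise(v[:mid]) + pairwise(v[mid:])

sequential = sum(xs)           # left-to-right reduction
tree = pairwise(xs)            # same values, different order

print(sequential == tree)      # very likely False
print(abs(sequential - tree))  # tiny but nonzero discrepancy
```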
Jessy Lin (@realjessylin)'s Twitter Profile Photo

What does it take to build a human-like user simulator? // To train collaborative agents, we need better user sims. In blog post pt 2, Nicholas Tomlin and I sketch a framework for building user simulators + open questions for research: jessylin.com/2025/09/25/use…

Julian Schrittwieser (@mononofu)'s Twitter Profile Photo

As a researcher at a frontier lab I’m often surprised by how unaware of current AI progress public discussions are. I wrote a post to summarize studies of recent progress, and what we should expect in the next 1-2 years: julian.ac/blog/2025/09/2…

Jelani Nelson (@minilek)'s Twitter Profile Photo

I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini

Jason Peng (@xbpeng4)'s Twitter Profile Photo

I have always been surprised by how few positive samples adversarial imitation learning needs to be effective. With ADD we take this to the extreme! A differential discriminator trained with a SINGLE positive sample can still be effective for a wide range of tasks.
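For intuition, here is a heavily hedged PyTorch sketch of the single-positive-sample setup: a discriminator trained to separate one reference sample from policy rollouts, with its output reused as a style reward. This is a generic GAN-style stand-in for illustration, not the ADD objective itself.

```python
# Heavily hedged, generic GAN-style sketch of adversarial imitation with a
# SINGLE positive sample; illustrative only, not the ADD algorithm's loss.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

reference = torch.randn(1, 64)          # the single positive (reference) sample
for step in range(1000):
    rollouts = torch.randn(128, 64)     # negatives: states from policy rollouts
    logits = torch.cat([disc(reference), disc(rollouts)])
    labels = torch.cat([torch.ones(1, 1), torch.zeros(128, 1)])
    loss = bce(logits, labels)          # separate the one positive from negatives
    opt.zero_grad()
    loss.backward()
    opt.step()

# Style reward for the policy: higher where the discriminator is fooled.
reward = torch.sigmoid(disc(rollouts)).detach()
```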