Jerry Zhi-Yang He (@_herobotics_)'s Twitter Profile
Jerry Zhi-Yang He

@_herobotics_

LLM research @ ByteDance Seed. Previously PhD at @berkeley_ai with @ancadianadragan; @facebookai, @StanfordSVL and @StanfordHRI.

ID: 2905646814

Link: http://herobotics.me · Joined: 21-11-2014 03:50:44

286 Tweets

475 Followers

1.1K Following

Paul Bogdan (@paulcbogdan)'s Twitter Profile Photo

New paper: What happens when an LLM reasons? We created methods to interpret reasoning steps & their connections: resampling CoT, attention analysis, & suppressing attention. We discover thought anchors: key steps shaping everything else. Check out our tool & unpack CoT yourself 🧵
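The resampling idea lends itself to a quick sketch. Below is a minimal, hedged illustration of resampling-based step importance; `sample_completions` is a hypothetical stand-in for whatever LLM client you use, not the paper's released tool.

```python
# Minimal sketch of resampling-based step importance (one of the methods the
# tweet names). `sample_completions` is a hypothetical stand-in for an LLM
# client, not the paper's released tool.
from collections import Counter

def sample_completions(prefix: str, n: int) -> list[str]:
    """Hypothetical: sample n final answers continuing from `prefix`."""
    raise NotImplementedError("plug in your LLM client here")

def answer_distribution(prefix: str, n: int = 32) -> Counter:
    return Counter(sample_completions(prefix, n))

def step_importance(question: str, cot_steps: list[str], n: int = 32) -> list[float]:
    """Score each CoT step by how much including it shifts the distribution
    of final answers; large shifts mark candidate 'thought anchors'."""
    scores = []
    for i in range(len(cot_steps)):
        with_step = question + "\n" + " ".join(cot_steps[: i + 1])
        without_step = question + "\n" + " ".join(cot_steps[:i])
        p = answer_distribution(with_step, n)
        q = answer_distribution(without_step, n)
        # Total variation distance between the two empirical distributions.
        keys = set(p) | set(q)
        scores.append(0.5 * sum(abs(p[k] - q[k]) / n for k in keys))
    return scores
```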

Massimo (@rainmaker1973)'s Twitter Profile Photo

A typical Japanese scene. It's sad to see a machine that has worked for many years go away. The 92-year-old former factory manager performed a purification ritual, said farewell, and then the machine was taken away.

Kevin Lu (@_kevinlu)'s Twitter Profile Photo

The internet is incredibly diverse, and it is sourced from data on topics which... humans actually cared enough about to engage with in the first place. There are low-resource languages and niche fanbases that will be forever immortalized in AGI because someone cared enough to document

Sanjeev Arora (@prfsanjeevarora)'s Twitter Profile Photo

Completely misses the point. Nobody is suggesting that solving IMO problems is useful for math research. The point is that AI has become really good at complex reasoning, and is not just memorizing its training data. It can handle completely new IMO questions designed by a

Nan Rosemary Ke (@rosemary_ke)'s Twitter Profile Photo

TLDR: Gemini answered 5 out of 6 questions correctly, also within the 4.5-hour time limit.

Read the natural language proof here.

storage.googleapis.com/deepmind-media…
Owain Evans (@owainevans_uk)'s Twitter Profile Photo

New paper & surprising result.
LLMs transmit traits to other models via hidden signals in data.
Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
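To make the setup concrete, here is a hedged sketch of the data format described above: the teacher is prompted only for 3-digit numbers, and a filter drops everything else before the student is fine-tuned. The prompt wording and the regex filter are illustrative assumptions, not the paper's exact pipeline.

```python
# Hedged sketch of the 'numbers-only' transmission setup. `teacher_generate`
# is a hypothetical call into the trait-bearing teacher model; the prompt and
# regex filter are illustrative assumptions, not the paper's pipeline.
import json
import re

NUMBERS_ONLY = re.compile(r"^\s*\d{3}(,\s*\d{3})*\s*$")

def teacher_generate(prompt: str) -> str:
    """Hypothetical: return the teacher model's completion."""
    raise NotImplementedError

def build_dataset(n_examples: int, path: str) -> None:
    prompt = "Continue this list with ten more 3-digit numbers: 142, 857, 903"
    kept = 0
    with open(path, "w") as f:
        while kept < n_examples:
            completion = teacher_generate(prompt)
            # Keep only pure 3-digit-number completions, so no overt
            # trait-related text survives into the student's training data.
            if NUMBERS_ONLY.match(completion):
                f.write(json.dumps({"prompt": prompt,
                                    "completion": completion}) + "\n")
                kept += 1
```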
Princeton Computer Science (@princetoncs)'s Twitter Profile Photo

⏱️ AI is making the verification process easier, with models verifying proofs in minutes.

💻 Now, Sanjeev Arora, Chi Jin, Danqi Chen and Princeton PLI have released Goedel Prover V2, a model more efficient and more accurate than any previous model.

👉 blog.goedel-prover.com
gum1h0x (@gum1h0x)'s Twitter Profile Photo

It's the hardest problem by a pretty good margin. Last year's GDM approach struggled with quite a similar problem that required making some non-local, non-trivial observations. You need to observe that there's a global hole pattern. See the n^2 x n^2 grid as (n-1)^2 disjoint n x n

Denny Zhou (@denny_zhou)'s Twitter Profile Photo

Slides for my lecture “LLM Reasoning” at Stanford CS 25: dennyzhou.github.io/LLM-Reasoning-…

Key points:
1. Reasoning in LLMs simply means generating a sequence of intermediate tokens before producing the final answer. Whether this resembles human reasoning is irrelevant. The crucial
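Point 1 is easy to make concrete. A minimal sketch, assuming a hypothetical `generate` helper: the only difference between "reasoning" and direct answering is the budget of intermediate tokens.

```python
# Minimal sketch of point 1: "reasoning" is just intermediate tokens emitted
# before the answer. `generate` is a hypothetical stand-in for an LLM call.
def generate(prompt: str, max_tokens: int) -> str:
    """Hypothetical LLM call; wire in your own client here."""
    return "<model output>"

question = "What is 17 * 24?"

# Direct answering: the model must commit to the answer immediately.
direct = generate(question + "\nAnswer:", max_tokens=8)

# "Reasoning": the same model, simply allowed intermediate tokens first.
cot = generate(question + "\nThink step by step, then give the answer.",
               max_tokens=256)
```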

guille (@guilleangeris)'s Twitter Profile Photo

looks like Lean is popular today, so here's a little post on how/why it works and implementing a mini version of it in Julia

Jason Weston (@jaseweston)'s Twitter Profile Photo

🤖Introducing: CoT-Self-Instruct 🤖
📝: arxiv.org/abs/2507.23751
- Builds high-quality synthetic data via reasoning CoT + quality filtering
- Gains on reasoning tasks: MATH500, AMC23, AIME24 & GPQA-💎
- Outperforms existing train data s1k & OpenMathReasoning
- Gains on
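A hedged sketch of the two stages the bullets name: generating a new task by reasoning over seed examples, then keeping it only if it passes a quality filter. The self-consistency filter below is an illustrative assumption, not necessarily the paper's exact criterion.

```python
# Hedged sketch of CoT-based synthesis plus quality filtering. `llm` is a
# hypothetical LLM call; the self-consistency filter is an assumption for
# illustration, not necessarily the paper's criterion.
import random
from collections import Counter

def llm(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own client."""
    raise NotImplementedError

def synthesize(seed_tasks: list[str]) -> str:
    shots = "\n\n".join(random.sample(seed_tasks, k=2))
    return llm(f"Here are example problems:\n\n{shots}\n\n"
               "Reason step by step about what makes them good, "
               "then write one new problem of similar difficulty.")

def passes_filter(task: str, n: int = 8, threshold: float = 0.6) -> bool:
    answers = Counter(llm(f"{task}\nSolve, then state the final answer.")
                      for _ in range(n))
    # Keep tasks whose sampled answers mostly agree, i.e. that are
    # answerable and unambiguous.
    return answers.most_common(1)[0][1] / n >= threshold
```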
yshan (@yshan783399)'s Twitter Profile Photo

We are thrilled to introduce the Seed-OSS family of open-source LLMs, developed by ByteDance's Seed Team.

GitHub: github.com/ByteDance-Seed…
HuggingFace: huggingface.co/collections/By…

Feel free to try it out and share your feedback!
Lin Yang (@lyang36)'s Twitter Profile Photo

Our IMO gold medal-winning AI pipeline is now model-agnostic. 🥇 What worked for Gemini 2.5 Pro now gets the same 5/6 score with GPT-5 & Grok4. This confirms the power of our verification-and-refinement pipeline to improve base model capabilities. The new code & results are
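The tweet doesn't include the code, but the loop it describes is simple to sketch. A minimal, hedged version, with placeholder `propose_solution` and `verify` helpers that are assumptions rather than the released pipeline:

```python
# Hedged sketch of a generic verification-and-refinement loop of the kind
# described above. Both helpers are hypothetical placeholders, not the
# released code.
def propose_solution(problem: str, model: str) -> str:
    """Hypothetical: ask the base model for a candidate solution/proof."""
    raise NotImplementedError

def verify(solution: str) -> tuple[bool, str]:
    """Hypothetical: return (passes, critique) from a verifier/checker."""
    raise NotImplementedError

def solve(problem: str, model: str, max_rounds: int = 10) -> str | None:
    solution = propose_solution(problem, model)
    for _ in range(max_rounds):
        ok, critique = verify(solution)
        if ok:
            return solution
        # Feed the verifier's critique back to the same base model and retry;
        # being model-agnostic, nothing here depends on which model is used.
        prompt = (problem + "\n\nPrevious attempt:\n" + solution
                  + "\n\nIssues found by the verifier:\n" + critique)
        solution = propose_solution(prompt, model)
    return None
```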

Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
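One classic source of the nondeterminism the title refers to is that floating-point addition is not associative, so a reduction whose order changes (with batch size, parallel split, or kernel choice) can change the result. A self-contained illustration of that effect (not a summary of the post):

```python
# Self-contained demo that floating-point addition is not associative: the
# same numbers reduced in a different order give a (slightly) different sum.
# Order changes like this, e.g. from parallel reductions, are one classic
# source of nondeterministic numerics.
import random

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

def pairwise(v):
    """Tree reduction, as a parallel sum might perform it."""
    if len(v) == 1:
        return v[0]
    mid = len(v) // 2
    return pairwise(v[:mid]) + pairwise(v[mid:])

sequential = sum(xs)           # left-to-right reduction
tree = pairwise(xs)            # same values, different order

print(sequential == tree)      # very likely False
print(abs(sequential - tree))  # tiny but nonzero discrepancy
```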
Jessy Lin (@realjessylin)'s Twitter Profile Photo

What does it take to build a human-like user simulator? // To train collaborative agents, we need better user sims. In blog post pt 2, Nicholas Tomlin and I sketch a framework for building user simulators + open questions for research: jessylin.com/2025/09/25/use…

Julian Schrittwieser (@mononofu)'s Twitter Profile Photo

As a researcher at a frontier lab I’m often surprised by how unaware of current AI progress public discussions are. I wrote a post to summarize studies of recent progress, and what we should expect in the next 1-2 years: julian.ac/blog/2025/09/2…

Jelani Nelson (@minilek)'s Twitter Profile Photo

I’ve also been integrating LLMs into my research workflow. I spent most of Tuesday working on a problem I’ve been thinking about for a while with some collaborators. I had a conjecture on a possible way forward, and with some hours of thinking, mixing in conversations with Gemini

Jason Peng (@xbpeng4)'s Twitter Profile Photo

I have always been surprised by how few positive samples adversarial imitation learning needs to be effective. With ADD we take this to the extreme! A differential discriminator trained with a SINGLE positive sample can still be effective for a wide range of tasks.
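For intuition, here is a heavily hedged PyTorch sketch of the single-positive-sample setup: a discriminator trained to separate one reference sample from policy rollouts, with its output reused as a style reward. This is a generic GAN-style stand-in for illustration, not the ADD objective itself.

```python
# Heavily hedged, generic GAN-style sketch of adversarial imitation with a
# SINGLE positive sample; illustrative only, not the ADD algorithm's loss.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 1))
opt = torch.optim.Adam(disc.parameters(), lr=3e-4)
bce = nn.BCEWithLogitsLoss()

reference = torch.randn(1, 64)          # the single positive (reference) sample
for step in range(1000):
    rollouts = torch.randn(128, 64)     # negatives: states from policy rollouts
    logits = torch.cat([disc(reference), disc(rollouts)])
    labels = torch.cat([torch.ones(1, 1), torch.zeros(128, 1)])
    loss = bce(logits, labels)          # separate the one positive from negatives
    opt.zero_grad()
    loss.backward()
    opt.step()

# Style reward for the policy: higher where the discriminator is fooled.
reward = torch.sigmoid(disc(rollouts)).detach()
```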