Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile
Allen Nie (🇺🇦☮️)

@allen_a_nie

Stanford CS PhD working on RL. Co-creator of Trace. Advised by Emma Brunskill and Chris Piech. Previously: @GoogleDeepMind @MSFTResearch

ID: 104744479

Link: http://anie.me/ · Joined: 14-01-2010 07:37:22

1.1K Tweets

1.1K Followers

1.1K Following

John Yang (@jyangballin) 's Twitter Profile Photo

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified.

We built it by synthesizing a ton of agentic training data from 100+ Python repos.

Today we’re open-sourcing the toolkit that made it happen: SWE-smith.

Jiayi Pan (@jiayi_pirate) 's Twitter Profile Photo

(Replying to Hieu Pham) My friend and lab mate Ruiqi Zhong has a great blog post reflecting many PhD students' thoughts on this: ruiqizhong.substack.com/p/is-a-phd-on-…

Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

I think more and more people will realize RL is RL — Deep RL (gradient-based) is a particular solution for RL. Focus on the problem — solve it however you want.

Joseph Suarez (e/🐡) (@jsuarez5341) 's Twitter Profile Photo

There are no intuitions about what is going on here. MDPs are a bad model for real RL problems, and even for most toy ones. RL is hard to explain because your data comes from interacting with an environment. Non-stationary + hard to make fast!

Ching-An Cheng (Hiring 2025 intern) (@chinganc_rl) 's Twitter Profile Photo

We're organizing workshops on Programmatic Representation for Agent Learning at the upcoming #ICML2025 and #RLC2025. We welcome contributions using programs as policies, reward functions, skill libraries, task generators, environment models, etc., and more! See you soon!😀

Brando Miranda (@brandohablando) 's Twitter Profile Photo

One of our newest pre-training projects was built with Marin! Stay tuned for more soon! Thanks to Elyas Obbad & David Hall for being so fun to work with -- and to Percy Liang for helping test Marin, Sanmi Koyejo for really good, kind advice, and Rylan Schaeffer for his very efficient feedback ;)

Allan Zhou (@allanzhou17) 's Twitter Profile Photo

How should we order training examples? In a new blogpost (w/ Yiding Jiang), we explore a compression-based perspective: order your dataset to minimize its prequential codelength.
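
A rough way to picture "prequential codelength" (my sketch, not the blog post's code): it is the total number of bits a model pays to predict each example given only the examples that came before it, so a good ordering is one an online learner can compress well as it goes. Here `make_model`, `nll_bits`, and `fit_one` are hypothetical stand-ins for that learner's interface:

    def prequential_codelength(examples, make_model):
        """Total bits to encode `examples` in this order: each example is
        coded under a model trained only on the examples before it."""
        model = make_model()
        total_bits = 0.0
        for x in examples:
            total_bits += model.nll_bits(x)   # -log2 p(x | earlier examples)
            model.fit_one(x)                  # online update on this example
        return total_bits

    # Hypothetical usage: prefer the ordering with the smaller codelength.
    # best = min(candidate_orderings, key=lambda o: prequential_codelength(o, MakeTinyLM))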

Anshul Kundaje (anshulkundaje@bluesky) (@anshulkundaje) 's Twitter Profile Photo

Nice example of low-hanging fruit connecting the dots. Some of the comments with links to papers suggest this is not a de novo discovery & may have been obvious in hindsight. But a lot of things are obvious in hindsight. 1/

Csordás Róbert (@robert_csordas) 's Twitter Profile Photo

Your language model is wasting half of its layers to just refine probability distributions rather than doing interesting computations.

In our paper, we found that the second half of the layers of the Llama 3 models have minimal effect on future computations. 1/6
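
A quick way to poke at a claim like this yourself (a generic probe, not the paper's methodology): load a Llama 3 checkpoint and check how little each later block moves the residual stream. The model name below is an assumption; any causal LM that exposes hidden states works:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "meta-llama/Meta-Llama-3-8B"   # assumes you have access to these weights
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16)

    inputs = tok("The capital of France is", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)

    # hidden_states[i] is the residual stream entering block i (0 = embeddings).
    hs = out.hidden_states
    for i in range(1, len(hs)):
        cos = torch.nn.functional.cosine_similarity(hs[i - 1], hs[i], dim=-1).mean()
        print(f"block {i:2d}: cosine(prev, next) = {cos:.4f}")

High similarity in the later blocks is consistent with the "refining distributions rather than computing" reading, though it is only a crude proxy for the paper's actual analysis.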

Shunyu Yao (@shunyuyao12) 's Twitter Profile Photo

Tech is overestimated in the short term (because infra is so much harder than people realize) and underestimated in the long run (because new tech becomes infra for new applications). Applies to computers, chips, the internet, LLMs, RL, etc.

Andrea Zanette (@zanette_ai) 's Twitter Profile Photo

Can Large Reasoning Models Self-Train? We propose Self-Rewarded Training (SRT)—where LLMs generate their own supervision. Main findings: SRT initially matches RL on ground truth, but sustained training risks reward hacking. We also investigate mitigation strategies.
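
My reading of "LLMs generate their own supervision": the reward comes from the model's agreement with itself rather than from labels. A heavily simplified sketch of that shape (majority vote as self-reward; this is a guess at the general idea, not the paper's algorithm):

    from collections import Counter

    def self_reward(samples):
        """Score each sampled answer by agreement with the majority answer."""
        majority, _ = Counter(samples).most_common(1)[0]
        return [1.0 if s == majority else 0.0 for s in samples]

    # Hypothetical loop: sample k answers per prompt, reward agreement with
    # the majority, then apply whatever policy-gradient update you prefer.
    # rewards = self_reward([model.sample(prompt) for _ in range(k)])

The reward-hacking finding is plausible under this picture: the model can drift toward answers that are easy to agree on rather than answers that are correct.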

Anne Ouyang (@anneouyang) 's Twitter Profile Photo

✨ New blog post 👀: We have some very fast AI-generated kernels generated with a simple test-time only search. They are performing close to or in some cases even beating the standard expert-optimized production kernels shipped in PyTorch. (1/6)

[🔗 link in final post]
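
The "simple test-time only search" framing suggests a generate, benchmark, keep-the-best loop. A hypothetical sketch of such a loop (not the blog post's code; `propose_kernel`, `compile_and_check`, and `benchmark` are stand-ins):

    def test_time_kernel_search(spec, n_candidates=64):
        """Generate candidate kernels for one op and keep the fastest correct one."""
        best, best_time = None, float("inf")
        for _ in range(n_candidates):
            src = propose_kernel(spec)             # e.g. sample a kernel from an LLM
            if not compile_and_check(src, spec):   # reject kernels that fail correctness tests
                continue
            t = benchmark(src, spec)               # wall-clock time on the target GPU
            if t < best_time:
                best, best_time = src, t
        return best, best_time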

Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

I'm experiencing the 🤩 moment when an amazing company just built their new library on top of the framework I helped build! Trace is a library for creating **extremely flexible** LLM-based workflows.

Syftr uses Trace to optimize their workflow and push the cost-accuracy Pareto

Lucy Li (@lucy3_li) 's Twitter Profile Photo

"Tell, Don't Show" was accepted to #ACL2025 Findings! Our simple approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. A possible addition to your CSS/DH research 🛠️ box ✨📚 arxiv.org/abs/2505.23166

"Tell, Don't Show" was accepted to #ACL2025 Findings! 

Our simple approach for literary topic modeling combines the new (language models) with the old (classic LDA) to yield better topics. A possible addition to your CSS/DH research 🛠️ box

✨📚 arxiv.org/abs/2505.23166
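
As I read the abstract, the "tell" step uses a language model to restate showing-heavy literary passages as plain descriptions, and classic LDA then runs on those descriptions. A hypothetical sketch of that pipeline (not the paper's code; `llm_describe` is a stand-in for whatever model produces the "told" version):

    from gensim import corpora
    from gensim.models import LdaModel

    def literary_topics(passages, llm_describe, num_topics=20):
        """'Tell, don't show': paraphrase passages into plain statements, then run LDA."""
        told = [llm_describe(p) for p in passages]    # LM turns 'showing' into 'telling'
        docs = [t.lower().split() for t in told]
        dictionary = corpora.Dictionary(docs)
        corpus = [dictionary.doc2bow(d) for d in docs]
        return LdaModel(corpus, num_topics=num_topics, id2word=dictionary)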

Allen Nie (🇺🇦☮️) (@allen_a_nie) 's Twitter Profile Photo

I'm onboarding a research dev from France for Trace today with Ching-An and Adith. None of us knew him before. He just shipped, built, and impressed everyone 😅

Alexander Terenin (@avt_im) 's Twitter Profile Photo

We've got a major update to our preprint on adversarial regret guarantees for Thompson sampling!

As before, I think this is one of the most important projects I've worked on due to new algorithmic primitives that it - in principle - unlocks.

Thread below on what's new!
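
For readers who haven't seen it, the algorithm the guarantees concern is plain Thompson sampling; here is a minimal Bernoulli-bandit version (the adversarial-regret machinery from the preprint is not reflected in this sketch):

    import random

    def thompson_bernoulli(n_arms, pull, horizon=1000):
        """Thompson sampling for Bernoulli bandits with Beta(1, 1) priors."""
        successes = [1] * n_arms
        failures = [1] * n_arms
        for _ in range(horizon):
            samples = [random.betavariate(successes[a], failures[a]) for a in range(n_arms)]
            a = max(range(n_arms), key=lambda i: samples[i])
            reward = pull(a)              # environment returns 0 or 1
            successes[a] += reward
            failures[a] += 1 - reward
        return [s / (s + f) for s, f in zip(successes, failures)]   # posterior mean estimates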

Haitham Bou Ammar (@hbouammar) 's Twitter Profile Photo

I read this paper in detail, and I am very sad! They literally re-do the optimal reward baseline work that we have known since forever, without even crediting the true authors in their derivations.  

The third screenshot is taken from: ieeexplore.ieee.org/stamp/stamp.js…

As you see, they
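
For context, the classical result in question is the variance-minimizing constant baseline for the score-function (REINFORCE) gradient estimator, which weights returns by the squared score norm. A minimal sketch of that formula (my illustration, not any particular paper's derivation):

    import numpy as np

    def optimal_baseline(score_norms_sq, returns):
        """Variance-minimizing constant baseline for REINFORCE:
        b* = E[ ||grad log pi||^2 * R ] / E[ ||grad log pi||^2 ]."""
        score_norms_sq = np.asarray(score_norms_sq, dtype=float)
        returns = np.asarray(returns, dtype=float)
        return (score_norms_sq * returns).mean() / score_norms_sq.mean()

    # The common simplification is the plain average return, b = returns.mean();
    # the weighted form above is the one that actually minimizes estimator variance.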

Nan Jiang (@nanjiang_cs) 's Twitter Profile Photo

Given the sheer number of people interested in PG methods nowadays, I'm sure innocent "rediscoveries" like this are happening every day. On the other hand, due diligence takes minimal effort today as you can just DeepResearch it. All it takes is the sense/taste to ask "no way this is not done b4"...