Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile
Zhiyuan Zeng

@zhiyuanzeng_

PhD-ing @uwnlp @uwcse | Prev. @Tsinghua_Uni @TsinghuaNLP @princeton_nlp

ID: 1650962310880714753

Link: http://zhiyuan-zeng.github.io · Joined: 25-04-2023 20:37:54

174 Tweets

417 Followers

216 Following

Yike Wang (@yikewang_) 's Twitter Profile Photo

LLMs are helpful for scientific research — but will they continuously be helpful?

Introducing 🔍ScienceMeter: current knowledge update methods enable 86% preservation of prior scientific knowledge, 72% acquisition of new, and 38%+ projection of future (arxiv.org/abs/2505.24302).
Jacqueline He (@jcqln_h) 's Twitter Profile Photo

LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
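The PIC task can be illustrated with a toy grounding checker. This is a hypothetical sketch only: the function name, the example claims, and the naive word-overlap scoring (standing in for a real entailment model) are all assumptions, not the paper's method.

```python
# Toy illustration of the Precise Information Control idea: every sentence
# the model emits should be grounded in one of the provided verifiable
# claims. Naive word overlap stands in for a real entailment model here.
def unsupported_sentences(output: str, claims: list[str], threshold: float = 0.6) -> list[str]:
    """Return output sentences whose content words are not largely covered
    by any single claim (a crude proxy for 'unsupported')."""
    flagged = []
    for sent in filter(None, (s.strip() for s in output.split("."))):
        words = set(sent.lower().split())
        support = max(len(words & set(c.lower().split())) / len(words) for c in claims)
        if support < threshold:
            flagged.append(sent)
    return flagged

claims = ["the model was trained on 2 trillion tokens", "training took 3 weeks"]
output = "The model was trained on 2 trillion tokens. It outperforms GPT-4."
print(unsupported_sentences(output, claims))  # ['It outperforms GPT-4']
```

The second sentence is flagged because no given claim supports it, even though it sounds plausible; this is exactly the intrinsic-hallucination failure mode the tweet describes.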
Diyi Yang (@diyi_yang) 's Twitter Profile Photo

AI agents are transforming the workforce! We mapped how AI agents could #automate vs. #augment jobs across the U.S. workforce, with a worker-first look at the future of work 👇🧵

Ai2 (@allen_ai) 's Twitter Profile Photo

We are #1 on the Hugging Face heatmap - this is what true openness looks like!🥇🎉
750+ models
230+ datasets
And counting... Come build with us huggingface.co/spaces/cfahlgr…

Rulin Shao (@rulinshao) 's Twitter Profile Photo

🎉Our Spurious Rewards is available on ArXiv! We added experiments on
- More prompts/steps/models/analysis...
- Spurious Prompts!
Surprisingly, we obtained 19.4% gains when replacing prompts with LaTeX placeholder text (\lipsum) 😶‍🌫️

Check out our 2nd blog: tinyurl.com/spurious-prompt
Stella Li (@stellalisy) 's Twitter Profile Photo

Spurious Rewards was not all‼️We now present spurious PROMPTS🤔 check out our latest findings and discussion on evaluation: tinyurl.com/spurious-prompt.

Who knew Lorem ipsum can bring 19.4% gains compared to default prompt👀

Also, arXiv is out🤩 arxiv.org/abs/2506.10947📄
Hanna Hajishirzi (@hannahajishirzi) 's Twitter Profile Photo

Yayyy!!! Best paper honorable mention at CVPR goes to our Molmo and PixMo work at Ai2! This is now becoming a trend :) Last year both OLMo and Dolma received best paper awards at ACL.

Sarah Wiegreffe (on faculty job market!) (@sarahwiegreffe) 's Twitter Profile Photo

A bit late to announce, but I’m excited to share that I'll be starting as an assistant professor at the University of Maryland Department of Computer Science this August. I'll be recruiting PhD students this upcoming cycle for fall 2026. (And if you're a UMD grad student, sign up for my fall seminar!)

Hao Xu (@xuhaoxh) 's Twitter Profile Photo

Wanna 🔎 inside Internet-scale LLM training data w/o spending 💰💰💰?
Introducing infini-gram mini, an exact-match search engine with 14x less storage req than the OG infini-gram 😎
We make 45.6 TB of text searchable. Read on to find our Web Interface, API, and more.
(1/n) ⬇️
Zhoujun (Jorge) Cheng (@chengzhoujun) 's Twitter Profile Photo

🤯What we know about RL for reasoning might not hold outside math and code?

We revisit established findings on RL for LLM reasoning on six domains (Math, Code, Science, Logic, Simulation, Tabular) and found that previous conclusions drawn on math and code are surprisingly
Chenghao Yang (@chrome1996) 's Twitter Profile Photo

Have you noticed…
🔍 Aligned LLM generations feel less diverse?
🎯 Base models are decoding-sensitive?
🤔 Generations get more predictable as they progress?
🌲 Tree search fails mid-generation (esp. for reasoning)?
We trace these mysteries to LLM probability concentration, and

Junhao Chen (@cumquaaa) 's Twitter Profile Photo

🚀 Training an image generation model and picking sides between autoregressive (AR) and diffusion? Why not both? Check out MADFormer with half of the model layers for AR and half for diffusion. AR gives a fast guess for the next patch prediction while diffusion helps refine the

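The layer split described in the tweet can be sketched as follows. This is a toy illustration of the idea only, not the actual MADFormer architecture: the class name, dimensions, and stubbed random-matrix "layers" are all assumptions.

```python
import numpy as np

class MADFormerSketch:
    """Hypothetical sketch of the MADFormer split: the first half of the
    layers act autoregressively to produce a fast coarse guess for the next
    patch; the second half refine it with iterative diffusion-style steps.
    Layer internals are stubbed out with random matrices."""
    def __init__(self, n_layers=8, dim=16, denoise_steps=4, seed=0):
        rng = np.random.default_rng(seed)
        self.ar_layers = [rng.normal(0, 0.1, (dim, dim)) for _ in range(n_layers // 2)]
        self.diff_layers = [rng.normal(0, 0.1, (dim, dim)) for _ in range(n_layers // 2)]
        self.denoise_steps = denoise_steps

    def forward(self, patch_context):
        h = patch_context
        for w in self.ar_layers:              # AR half: one cheap pass -> coarse guess
            h = np.tanh(h @ w)
        guess = h
        for _ in range(self.denoise_steps):   # diffusion half: iterative refinement
            for w in self.diff_layers:
                guess = guess + 0.1 * np.tanh(guess @ w)  # residual denoising update
        return guess

model = MADFormerSketch()
out = model.forward(np.ones(16))  # one coarse pass, then several refinement passes
```

The design trade-off the tweet hints at: the AR half is run once per patch (fast), while only the smaller diffusion half pays the cost of multiple refinement iterations.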
Joongwon Kim (@danieljwkim) 's Twitter Profile Photo

Can we improve Llama 3’s reasoning abilities through post-training only? Introducing ASTRO, our new framework that teaches LLMs to perform in-context search and generate long CoT to solve math problems, via SFT and RL. Work done at @aiatmeta. 📄 Paper: arxiv.org/abs/2507.00417

Valentina Pyatkin (@valentina__py) 's Twitter Profile Photo

💡Beyond math/code, instruction following with verifiable constraints is suitable to be learned with RLVR.
But the set of constraints and verifier functions is limited and most models overfit on IFEval.
We introduce IFBench to measure model generalization to unseen constraints.
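"Verifiable constraints" of the kind described above can be checked by deterministic functions that double as binary rewards. The two checkers below are hypothetical examples in that spirit — they are not IFBench's actual verifier functions, and the names and thresholds are made up.

```python
# Hypothetical examples of verifiable instruction constraints: deterministic
# checkers returning a binary reward, in the spirit of RLVR / IFEval-style
# evaluation (not the actual IFBench verifiers).
def verify_word_limit(response: str, max_words: int) -> float:
    """Reward 1.0 iff the response stays within the word budget."""
    return 1.0 if len(response.split()) <= max_words else 0.0

def verify_bullet_count(response: str, n_bullets: int) -> float:
    """Reward 1.0 iff the response has exactly n_bullets '- ' bullet lines."""
    bullets = [ln for ln in response.splitlines() if ln.lstrip().startswith("- ")]
    return 1.0 if len(bullets) == n_bullets else 0.0

resp = "Here are the steps:\n- collect data\n- train\n- evaluate"
print(verify_bullet_count(resp, 3), verify_word_limit(resp, 20))  # 1.0 1.0
```

Because such checkers are exact, a model can overfit to the finite set used in training or in IFEval — which is precisely the generalization gap a benchmark of unseen constraints is meant to measure.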
Victoria Graf (@victoriawgraf) 's Twitter Profile Photo

Worried about overfitting to IFEval? 🤔 Use ✨IFBench✨ our new, challenging instruction-following benchmark! Loved working w/ Valentina Pyatkin! Personal highlight: our multi-turn eval setting makes it possible to isolate constraint-following from the rest of the instruction 🔍

Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

EvalTree accepted to Conference on Language Modeling 2025 - my first PhD work and first COLM paper 🙌! What would you like to see next—extensions, applications, or other directions? Always open to ideas! 🧐

Allen School (@uwcse) 's Twitter Profile Photo

#UWAllen University of Washington & NVIDIA researchers earned a #MLSys2025 Best Paper Award for boosting #LLM performance with FlashInfer—and showed “what’s possible when academia, industry & the open-source community innovate together,” says Zihao Ye. #AI #UWdiscovers news.cs.washington.edu/2025/07/01/all…

Scott Geng (@scottgeng00) 's Twitter Profile Photo

🤔 How do we train AI models that surpass their teachers?

🚨 In #COLM2025: ✨Delta learning ✨makes LLM post-training cheap and easy – with only weak data, we beat open 8B SOTA 🤯

The secret? Learn from the *differences* in weak data pairs!

📜 arxiv.org/abs/2507.06187

🧵 below
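One way to read "learn from the *differences*" is as a pairwise preference objective over weak data. The sketch below is a hypothetical illustration of that reading, not the paper's actual loss: the function name and scalar reward scores are assumptions.

```python
import numpy as np

# Hypothetical sketch of learning from differences in weak data pairs:
# given two weak responses where one is merely *better* than the other,
# train on the preference (the delta), not on either response's absolute
# quality. Shown as a logistic pairwise loss on scalar scores.
def delta_pair_loss(score_better: float, score_worse: float) -> float:
    """-log sigmoid(delta): pushes the better weak response above the
    worse one; absolute score levels cancel out."""
    delta = score_better - score_worse
    return float(-np.log(1.0 / (1.0 + np.exp(-delta))))

# Only the gap matters: a pair of weak responses with gap 0.5 yields the
# same loss as a pair of strong responses with the same gap.
print(delta_pair_loss(0.2, -0.3))
print(delta_pair_loss(5.2, 4.7))
```

This invariance to absolute quality is what makes training on weak pairs plausible: the supervision signal survives even when both responses are below the student's own level.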
Weijia Shi (@weijiashi2) 's Twitter Profile Photo

Can data owners & LM developers collaborate to build a strong shared model while each retaining data control?
Introducing FlexOlmo💪, a mixture-of-experts LM enabling:
• Flexible training on your local data without sharing it
• Flexible inference to opt in/out your data
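A minimal sketch of the opt-in/opt-out inference idea (hypothetical, not FlexOlmo's actual routing): mask out an owner's expert at inference time and renormalize the router weights over the remaining experts, so the masked expert contributes nothing.

```python
import numpy as np

# Hypothetical mixture-of-experts layer where each expert was trained by a
# different data owner; an owner can opt out by masking their expert, and
# the router weights are renormalized over the remaining active experts.
def moe_forward(x, experts, router_logits, active):
    """x: (dim,) input; experts: list of (dim, dim) weight matrices;
    router_logits: (n_experts,) floats; active: boolean opt-in mask."""
    logits = np.where(active, router_logits, -np.inf)  # drop opted-out experts
    w = np.exp(logits - logits.max())
    w = w / w.sum()                                    # softmax over active set only
    return sum(w[i] * np.tanh(x @ experts[i])
               for i in range(len(experts)) if active[i])

rng = np.random.default_rng(0)
experts = [rng.normal(size=(4, 4)) for _ in range(3)]
x, logits = np.ones(4), np.array([0.5, 1.0, -0.2])
full = moe_forward(x, experts, logits, np.array([True, True, True]))
partial = moe_forward(x, experts, logits, np.array([True, False, True]))
```

Because an opted-out expert's weight is exactly zero, its parameters never touch the activation path — which is the property that lets owners retain control over their data's contribution at inference time.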

Akari Asai (@akariasai) 's Twitter Profile Photo

Some updates 🚨
I finished my Ph.D. at the Allen School in June 2025!
After a year at AI2 as a Research Scientist, I am joining the CMU Language Technologies Institute & Machine Learning Dept. (courtesy) as an Assistant Professor in Fall 2026.
The journey, acknowledgments & recruiting in 🧵
