Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile
Zhepei Wei ✈️ ICLR 2025

@weizhepei

Ph.D. Student @CS_UVA | Applied Scientist Intern @AmazonScience. Research interest: ML/NLP/LLM.

ID: 4733421379

Link: https://www.cs.virginia.edu/~tqf5qb/ · Joined: 09-01-2016 12:56:00

51 Tweets

122 Followers

341 Following

Gautam Kamath (@thegautamkamath)'s Twitter Profile Photo


I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.

I'm late for #ICLR2025 #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4
Jiaxin Huang (@jiaxinhuang0229)'s Twitter Profile Photo

🚀🚀Excited to share our new work on Speculative Decoding by Langlin Huang! We tackle a key limitation of draft models, which predict worse tokens at later positions, and present PosS, which generates high-quality drafts!

Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Nice work! In our recent paper WebAgent-R1 (arxiv.org/abs/2505.16421), we also observed a similar finding—test-time scaling via increased interactions! Feels like we’re not far from discovering new scaling laws for agents!🤩

Lex Fridman (@lexfridman)'s Twitter Profile Photo

Here's my conversation with Terence Tao, one of the greatest mathematicians in history. We talk about the hardest problems in mathematics & physics, and how AI might help us humans to solve them. This conversation was a huge honor for me. I can't quite put it into words, but

Sinclair Wang (@sinclairwang1)'s Twitter Profile Photo


What Makes a Base Language Model Suitable for RL?

Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”:

(1) Is the magic only happening on Qwen + Math?
(2) Does the "aha moment" only spark during math reasoning?
(3) Is evaluation hiding some tricky traps?
Yu Meng @ ICLR'25 (@yumeng0818)'s Twitter Profile Photo


Will be at #ICML2025 next week! We'll present the following works:
🛠️ LarPO: Tue 7/15 (Poster Session 1 East)
🚀 AdaDecode: Wed 7/16 (Poster Session 3 East)
🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop)
Happy to chat about latest research in LLMs🤩
Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Thrilled to present three works at #ICML2025!🥳 🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605) 🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop 🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents Feel free to stop by and chat about #LLMs!

Haolin Liu (@haolinliu616)'s Twitter Profile Photo


🚨 LLM-as-a-Judge in RLVR can be easily hacked, even GPT-4o.
Simple sentences can trick top models into false positives, even though the task is simply to compare a given solution to a reference answer.
📊 What we found:
1️⃣ Figure 1: “:” and “Thought process:” fool nearly all models
Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Highlight of my #ICML2025 poster session: “So… did you train your model on the test set?” 😅 Probably the ML community’s new “standard practice” question — sadly necessary, but here we are 🤦‍♂️

Yang Yue (@yangyue_thu)'s Twitter Profile Photo


New paper alert: Unifies insights from Limit-of-RLVR and ProRL — does current RLVR actually expand reasoning?

Turns out: RLVR mostly acts as an efficient sampler with a shrinking reasoning boundary, and very rarely as an explorer with an expanding one.

Exploration is the holy grail for LLMs and may require going beyond 0/1 rewards.
Chujie Zheng (@chujiezheng)'s Twitter Profile Photo


Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀

📄 huggingface.co/papers/2507.18…
Scale AI (@scale_ai)'s Twitter Profile Photo

As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with Ohio State and UC Berkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵

Anthropic (@anthropicai)'s Twitter Profile Photo


We’re running another round of the Anthropic Fellows program. 

If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.
ChengSong Huang (@chengsongh31219)'s Twitter Profile Photo


🚀🚀Excited to share our paper R-Zero: Self-Evolving Reasoning LLM from Zero Data!

How do you train an LLM without data?

R-Zero teaches Large Language Models to reason starting with nothing but a base model. No data required!!!
Paper: arxiv.org/abs/2508.05004
Code:
Jiaxin Huang (@jiaxinhuang0229)'s Twitter Profile Photo

Thrilled to share this exciting work, R-Zero, from my student ChengSong Huang, in which an LLM learns to reason from zero human-curated data! The framework includes the co-evolution of a "Challenger" that proposes difficult tasks and a "Solver" that solves them. Check out more details in the

Prophet Arena (@prophetarena)'s Twitter Profile Photo


🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence.

That is, can AI truly predict the future by connecting today’s dots?

👉 What makes it special?

- It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen