Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile
Zhepei Wei ✈️ ICLR 2025

@weizhepei

Ph.D. Student @CS_UVA | Applied Scientist Intern @AmazonScience. Research interest: ML/NLP/LLM.

ID: 4733421379

Link: https://www.cs.virginia.edu/~tqf5qb/ · Joined: 09-01-2016 12:56:00

51 Tweets

122 Followers

341 Following

Gautam Kamath (@thegautamkamath)'s Twitter Profile Photo


I wrote a post on how to connect with people (i.e., make friends) at CS conferences. These events can be intimidating, so here are some suggestions on how to navigate them.

I'm late for #ICLR2025 #NAACL2025, but just in time for #AISTATS2025 and timely for #ICML2025 acceptances! 1/4
Jiaxin Huang (@jiaxinhuang0229)'s Twitter Profile Photo

🚀🚀Excited to share our new work on Speculative Decoding by Langlin Huang! We tackle a key limitation of draft models, which predict worse tokens at later positions, and present PosS, which generates high-quality drafts!

Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Nice work! In our recent paper WebAgent-R1 (arxiv.org/abs/2505.16421), we also observed a similar finding—test-time scaling via increased interactions! Feels like we’re not far from discovering new scaling laws for agents!🤩

Lex Fridman (@lexfridman)'s Twitter Profile Photo

Here's my conversation with Terence Tao, one of the greatest mathematicians in history. We talk about the hardest problems in mathematics & physics, and how AI might help us humans to solve them. This conversation was a huge honor for me. I can't quite put it into words, but

Sinclair Wang (@sinclairwang1)'s Twitter Profile Photo


What Makes a Base Language Model Suitable for RL?

Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”:

(1) Is the magic only happening on Qwen + Math?
(2) Does the "aha moment" only spark during math reasoning?
(3) Is evaluation hiding some tricky traps?
Yu Meng @ ICLR'25 (@yumeng0818)'s Twitter Profile Photo


Will be at #ICML2025 next week! We'll present the following works:
🛠️ LarPO: Tue 7/15 (Poster Session 1 East)
🚀 AdaDecode: Wed 7/16 (Poster Session 3 East)
🧮 Negative Reinforcement for Reasoning: Fri 7/18 (AI for Math Workshop)
Happy to chat about latest research in LLMs🤩
Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Thrilled to present three works at #ICML2025!🥳 🚀AdaDecode — Wed 7/16, East Exhibition Hall A-B (#E-2605) 🔢Negative Reinforcement for Reasoning — Fri 7/18, AI for Math Workshop 🤖WebAgent-R1 — Sat 7/19, Workshop on Computer Use Agents Feel free to stop by and chat about #LLMs!

Haolin Liu (@haolinliu616)'s Twitter Profile Photo


🚨 LLM-as-a-Judge in RLVR can be easily hacked, even GPT-4o.
Simple sentences can trick top models into false positives, even though the task is simply to compare a given solution to a reference answer.
📊 What we found:
1️⃣ Figure 1: “:” and “Thought process:” fool nearly all models
Zhepei Wei ✈️ ICLR 2025 (@weizhepei)'s Twitter Profile Photo

Highlight of my #ICML2025 poster session: “So… did you train your model on the test set?” 😅 Probably the ML community’s new “standard practice” question — sadly necessary, but here we are 🤦‍♂️

Yang Yue (@yangyue_thu)'s Twitter Profile Photo


New paper alert: Unifies insights from Limit-of-RLVR and ProRL — does current RLVR actually expand reasoning?

Turns out: RLVR mostly acts as an efficient sampler with a shrinking reasoning boundary, and very rarely as an explorer with an expanding one.

Exploration is the holy grail for LLMs and may require going beyond 0/1 rewards.
Chujie Zheng (@chujiezheng)'s Twitter Profile Photo


Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀

📄 huggingface.co/papers/2507.18…
Scale AI (@scale_ai)'s Twitter Profile Photo

As AI agents start taking real actions online, how do we prevent unintended harm? We teamed up with Ohio State and UC Berkeley to create WebGuard: the first dataset for evaluating web agent risks and building real-world safety guardrails for online environments. 🧵

Anthropic (@anthropicai)'s Twitter Profile Photo


We’re running another round of the Anthropic Fellows program. 

If you're an engineer or researcher with a strong coding or technical background, you can apply to receive funding, compute, and mentorship from Anthropic, beginning this October. There'll be around 32 places.
ChengSong Huang (@chengsongh31219)'s Twitter Profile Photo


🚀🚀Excited to share our paper R-Zero: Self-Evolving Reasoning LLM from Zero Data!

How do you train an LLM without data?

R-Zero teaches Large Language Models to reason starting with nothing but a base model. No data required!!!
Paper: arxiv.org/abs/2508.05004
Code:
Jiaxin Huang (@jiaxinhuang0229)'s Twitter Profile Photo

Thrilled to share this exciting work, R-Zero, from my student ChengSong Huang, in which an LLM learns to reason from zero human-curated data! The framework includes the co-evolution of a "Challenger" that proposes difficult tasks and a "Solver" that solves them. Check out more details in the

Prophet Arena (@prophetarena)'s Twitter Profile Photo


🔮 Introducing Prophet Arena — the AI benchmark for general predictive intelligence.

That is, can AI truly predict the future by connecting today’s dots?

👉 What makes it special?

- It can’t be hacked. Most benchmarks saturate over time, but here models face live, unseen