Rui Yang (@ruiyang70669025)'s Twitter Profile
Rui Yang

@ruiyang70669025

PhD student @ UIUC

ID: 1597825781937291265

Link: https://yangrui2015.github.io

Joined: 30-11-2022 05:32:31

97 Tweets

285 Followers

396 Following

Yong Lin (@yong18850571)

🔥Our Goedel-Prover-V2-32B topped the PutnamBench Leaderboard by solving 86 problems —nearly 2× more than the previous SOTA DeepSeek-Prover-V2-671B (solved 47), while using: * 1/20 the model size (32B vs. 671B) * 1/5 the passes (184 vs. 1024) Meanwhile, we also release *

Tianbao Xie (@tianbaox)

🚀 OSWorld gets a major upgrade! OSWorld-Verified: 15 months of community feedback → 300+ fixes (ambiguity, graders…), 50x faster eval through AWS parallelization. More apples-to-apples comparison for reliable CUA evaluation ✨ 👇xlang.ai/blog/osworld-v…

Chenlu Ye (@ye_chenlu)

PROF🌀Right answer, flawed reason?🤔🌀 📄arxiv.org/pdf/2509.03403 Excited to share our work: PROF-PRocess cOnsistency Filter! 🚀 Challenge: ORM is blind to flawed logic, and PRM suffers from reward hacking. Our method harmonizes strengths of PRM & ORM. #LLM #ReinforcementLearning

Manling Li (@manlingli_)

Check out the 1st Behavior Challenge, co-hosted with our Foundation Models for Embodied Agent Challenge at NeurIPS …models-meet-embodied-agents.github.io/behavior_chall… When I first moved my focus from LLMs/VLMs toward embodied agents, I expected the biggest challenges would be around perception, motor

Rui Yang (@ruiyang70669025)

Check out the new benchmark accepted to NeurIPS 2025 DB Track! We evaluate model merging algorithms across instruction following, math, multilingual understanding, coding, and safety.

Yujia Qin@ICLR2025 (@tsingyoga)

The tool/env infra behind UI-TARS-2 is open-sourced. Enjoy the All-in-One Agent Sandbox!🥳 sandbox.agent-infra.com github.com/agent-infra/sa…

Cheng Qian (@qiancheng1231)

🚀 Introducing UserRL: a new framework to train agents that truly assist users through proactive interaction, not just chase static benchmarking scores. 📄 Paper: arxiv.org/pdf/2509.19736 💻 Code: github.com/SalesforceAIRe…

Xin Eric Wang @ ICLR 2025 (@xwang_lk)

🚀 Introducing Agent S3, the most advanced computer-use agent, now approaching human-level performance🧠💻 Just one year ago, Agent S scored ~20% on OSWorld: SOTA then, but far from human 72%. Today, Agent S3 reaches 69.9% (⬆10% over

Shizhe Diao (@shizhediao)

🚀 Introducing BroRL: Scaling Reinforcement Learning via Broadened Exploration When step-scaling hits a plateau, scale rollouts, not steps. BroRL takes reinforcement learning beyond saturation—reviving stalled models by expanding exploration with large-N rollouts. 👇 (1/n)

Hanze Dong @ ICLR 2025 (@hendrydong)

💥Thrilled to share our new work Reinforce-Ada, which fixes signal collapse in GRPO 🥳No more blind oversampling or dead updates. Just sharper gradients, faster convergence, and stronger models. ⚙️ One-line drop-in. Real gains. arxiv.org/html/2510.0499… github.com/RLHFlow/Reinfo…

Zhenhailong Wang (@zhenhailongw)

Multimodal conversational agents struggle to follow complex policies, which also impose a fixed computational cost. We ask: 👉 How can we achieve stronger policy-following behavior without having to include policies in-context? 🌐: mikewangwzhl.github.io/TriMPI/ 🧵1/3

Rui Yang (@ruiyang70669025)

🥳 Excited to share ERA: our training recipe for VLM-based embodied agents with interleaved perception + reasoning, tackling both high-level planning and low-level manipulation. We cover embodied-knowledge data curation and agent RL design. 🔎 Findings 1️⃣ Beyond

Manling Li (@manlingli_)

World Model Reasoning for VLM Agents (NeurIPS 2025, Score 5544) We release VAGEN to teach VLMs to build internal world models via visual state reasoning: - StateEstimation: what is the current state? - TransitionModeling: what is next? MDP → POMDP shift to handle the partial

Manling Li (@manlingli_)

VLAs, VLMs, LLMs, and Vision Foundation Models for Embodied Agents! There are just so many new updates in recent months! We have updated our tutorial, come and join us if you would like to discuss the latest advances! Room: 306B Time: 1pm-5pm Slides: …models-meet-embodied-agents.github.io

Daniel Kang (@daniel_d_kang)

🤖 Feeling excited about the future of household robotic agents (i.e., embodied agents)? You should also consider their safety! 🔪Meet BEAT: the first visual backdoor attack on MLLM-based embodied agents. 🧵 1/7

Han Zhao (@hanzhao_ml)

Glad to share that our paper has been awarded the Outstanding Paper Award at EMNLP '25!! I am not attending the conference, but please find Jingyan Shen and talk to her if you want to know more details!

Rui Yang (@ruiyang70669025)

Thrilled to share our paper (arxiv.org/pdf/2505.24846) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team Jingyan Shen Jiarui Yao Yifan Sun Feng Luo Rui Pan, and big thanks to our advisors Prof. Tong Zhang and Han Zhao!

Qineng Wang (@qineng_wang)

Most VLM benchmarks watch the world; few ask how actions *change* it from a robot's eye. Embodied cognition tells us that intelligence isn't just watching – it's enacted through interaction. 👉We introduce ENACT: A benchmark that tests if VLMs can track the evolution of a