Jiarui Yao (@explainmiracles) 's Twitter Profile
Jiarui Yao

@explainmiracles

UIUC CS PhD, 24

ID: 1658999835910115328

Joined: 18-05-2023 00:56:17

6 Tweets

35 Followers

389 Following

Shizhe Diao (@shizhediao) 's Twitter Profile Photo

Thrilled to share my first project at NVIDIA! ✨

Today’s language models are pre-trained on vast and chaotic Internet texts, but these texts are unstructured and poorly understood. We propose CLIMB — Clustering-based Iterative Data Mixture Bootstrapping — a fully automated
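The thread describes CLIMB only at a high level. A toy sketch of its two ingredients (cluster the corpus, then iteratively search mixture weights against a proxy score) might look like the following — the data, shapes, and the quadratic `proxy_score` are illustrative stand-ins, not the paper's actual method:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 200 "documents" embedded in 8-d space (CLIMB would use
# learned embeddings of real web text).
docs = rng.normal(size=(200, 8))
K = 4  # number of clusters / mixture components

# Step 1: cluster the corpus (plain k-means, a few Lloyd iterations).
centers = docs[rng.choice(len(docs), K, replace=False)]
for _ in range(10):
    assign = np.argmin(((docs[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.stack([docs[assign == k].mean(0) if (assign == k).any()
                        else centers[k] for k in range(K)])

def proxy_score(mixture):
    # Hypothetical proxy evaluation: in CLIMB this would be a small model
    # trained on data sampled with `mixture`; here a fixed quadratic with
    # a known optimum stands in for it.
    target = np.array([0.4, 0.3, 0.2, 0.1])
    return -((mixture - target) ** 2).sum()

# Step 2: iterative bootstrapping over mixture weights (one weight per
# cluster): sample candidates around the current best, keep any winner,
# and shrink the search radius each round.
mix = np.full(K, 1.0 / K)
radius = 0.3
for _ in range(20):
    cands = np.abs(mix + rng.normal(scale=radius, size=(16, K)))
    cands /= cands.sum(1, keepdims=True)
    best = cands[np.argmax([proxy_score(c) for c in cands])]
    if proxy_score(best) > proxy_score(mix):
        mix = best
    radius *= 0.9

print(np.round(mix, 2))  # learned mixture weights over the K clusters
```

The search loop only ever accepts mixtures that improve the proxy score, so the result is guaranteed no worse than the uniform starting mixture.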
Haocheng Xi (@haochengxiucb) 's Twitter Profile Photo

Thrilled to announce that our paper Sparse VideoGen got into #ICML2025! 🎉

Our new approach speeds up video generation by 2×. Details in the thread/paper. Huge thanks to my collaborators!

Blog: svg-project.github.io
Paper: arxiv.org/abs/2502.01776
Code:

Jiarui Yao (@explainmiracles) 's Twitter Profile Photo

We introduce Gradient Variance Minimization (GVM)-RAFT, a principled dynamic sampling strategy that minimizes gradient variance to improve the efficiency of chain-of-thought (CoT) training in LLMs.

– Achieves 2–4× faster convergence than RAFT
– Improves accuracy on math
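The tweet gives only the headline idea. Under the (assumed) simplification that rewards are binary, a dynamic sampling allocation in the spirit of GVM-RAFT can be sketched as follows — the pass rates, budget, and the square-root allocation rule are illustrative, not the paper's exact estimator:

```python
import numpy as np

# Toy per-prompt pass rates (in practice these would be estimated from a
# few pilot rollouts; fixed here for illustration).
pass_rates = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
budget = 100  # total rollouts to distribute across the prompts

# For a binary (correct/incorrect) reward, the per-prompt variance of a
# REINFORCE-style gradient scales like p(1 - p). Allocating samples in
# proportion to the standard deviation sqrt(p(1 - p)) is the classic
# Neyman-allocation heuristic; GVM-RAFT's actual rule also accounts for
# gradient norms.
std = np.sqrt(pass_rates * (1 - pass_rates))
alloc = np.maximum(1, np.round(budget * std / std.sum()).astype(int))

print(dict(zip(pass_rates.tolist(), alloc.tolist())))
```

Note how the allocation concentrates rollouts on medium-difficulty prompts (p ≈ 0.5), where a single sample is most informative, while near-solved and near-impossible prompts receive few.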
Xiusi Chen (@xiusi_chen) 's Twitter Profile Photo

🚀 Can we cast reward modeling as a reasoning task?

📖 Introducing our new paper: 
RM-R1: Reward Modeling as Reasoning

📑 Paper: arxiv.org/pdf/2505.02387
💻 Code: github.com/RM-R1-UIUC/RM-…

Inspired by recent advances of long chain-of-thought (CoT) on reasoning-intensive tasks, we
Hanze Dong @ ICLR 2025 (@hendrydong) 's Twitter Profile Photo

How to improve test-time scalability?
- Separate thinking & solution phases to control performance under a budget constraint
- Budget-Constrained Rollout + GRPO
- Outperforms baselines on math/code
- Cuts token usage by 30% without hurting performance
huggingface.co/papers/2505.05…
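A minimal sketch of the budget-constrained rollout idea from the tweet: the thinking phase is hard-capped, then the decoder is forced to switch to the solution phase so total length stays bounded. `toy_model`, the tags, and the tiny budgets are all hypothetical stand-ins for a real LLM decoding loop:

```python
THINK_BUDGET = 8      # max thinking tokens (tiny, for illustration)
SOLUTION_BUDGET = 4   # max solution tokens

def toy_model(prefix):
    # Stand-in for an LLM next-token call: emits "step" forever while
    # thinking and "ans" once the solution phase has begun.
    return "ans" if "</think>" in prefix else "step"

def rollout(prompt):
    tokens = ["<think>"]
    # Thinking phase: generate until the budget is exhausted.
    while len(tokens) - 1 < THINK_BUDGET:
        tokens.append(toy_model(" ".join(tokens)))
    tokens.append("</think>")  # forced phase switch at the budget
    # Solution phase: a separate, smaller budget.
    for _ in range(SOLUTION_BUDGET):
        tokens.append(toy_model(" ".join(tokens)))
    return tokens

out = rollout("2+2?")
print(len(out), out[-1])
```

Forcing the `</think>` tag at the budget boundary is what makes the total rollout length controllable regardless of how long the model would otherwise deliberate.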

Cheng Qian (@qiancheng1231) 's Twitter Profile Photo

📢 New Paper Drop: From Solving to Modeling!
LLMs can solve math problems — but can they model the real world? 🌍

📄 arXiv: arxiv.org/pdf/2505.15068
💻 Code: github.com/qiancheng0/Mod…

Introducing ModelingAgent, a breakthrough system for real-world mathematical modeling with LLMs.
Peixuan Han (韩沛煊) (@peixuanhakhan) 's Twitter Profile Photo

(1/5) Want to make your LLM a skilled persuader?

Check out our latest paper: "ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind"!

For details:
📄Arxiv: arxiv.org/pdf/2505.22961
🛠️GitHub: github.com/ulab-uiuc/ToMAP
Xiusi Chen (@xiusi_chen) 's Twitter Profile Photo

Can LLMs make rational decisions like human experts?

📖Introducing DecisionFlow: Advancing Large Language Model as Principled Decision Maker

We introduce a novel framework that constructs a semantically grounded decision space to evaluate trade-offs in hard decision-making
Shulin Tian (@shulin_tian) 's Twitter Profile Photo

🎥 Video is already a tough modality for reasoning. Egocentric video? Even tougher! It is longer, messier, and harder.

💡 How do we tackle these extremely long, information-dense sequences without exhausting GPU memory or hitting API limits?

We introduce 👓Ego-R1: A framework

Noam Razin (@noamrazin) 's Twitter Profile Photo

Reward models (RMs) are key to language model post-training and inference pipelines. But, little is known about the relative pros and cons of different RM types.

📰 We investigate why RMs implicitly defined by language models (LMs) often generalize worse than explicit RMs
🧵
1/6
Yong Lin (@yong18850571) 's Twitter Profile Photo

(1/4)🚨 Introducing Goedel-Prover V2 🚨
🔥🔥🔥 The strongest open-source theorem prover to date.
🥇 #1 on PutnamBench: Solves 64 problems—with far less compute.
🧠 New SOTA on MiniF2F:
* 32B model hits 90.4% at Pass@32, beating DeepSeek-Prover-V2-671B’s 82.4%.
* 8B > 671B: Our 8B
Cheng Qian (@qiancheng1231) 's Twitter Profile Photo

🤝 Can LLM agents really understand us?

We introduce UserBench: a user-centric gym environment for benchmarking how well agents align with nuanced human intent, not just follow commands.

📄 arxiv.org/pdf/2507.22034
💻 github.com/SalesforceAIRe…
Peixuan Han (韩沛煊) (@peixuanhakhan) 's Twitter Profile Photo

(1/5) Super excited to release our new paper on Reinforcement Learning: 

"Self-Aligned Reward: Towards Effective and Efficient Reasoners"!

Preprint: arxiv.org/pdf/2509.05489
Jiarui Yao (@explainmiracles) 's Twitter Profile Photo

Glad that our paper has been accepted to NeurIPS 2025! With gradient variance minimization (GVM), we balance the training data by difficulty and by each sample's contribution to the model, achieving improvements on math reasoning. Please check the original post for more details.

Rui Yang (@ruiyang70669025) 's Twitter Profile Photo

Thrilled to share our paper (arxiv.org/pdf/2505.24846) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team Jingyan Shen Jiarui Yao Yifan Sun Feng Luo Rui Pan, and big thanks to our advisors Prof. Tong Zhang and Han Zhao!

Jiarui Yao (@explainmiracles) 's Twitter Profile Photo

Thrilled to share our paper MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning (arxiv.org/pdf/2505.24846) won an EMNLP 2025 Outstanding Paper Award! 🎉🎉 Huge congrats to the team Jingyan Shen Rui Yang Yifan Sun Feng Luo
