Rui Lu (@raylu_thu)'s Twitter Profile
Rui Lu

@raylu_thu

PhD student at @Tsinghua_Uni studying machine learning theory; graduated from the Yao Class. Also a YouTuber @ 漫士沉思录 manshi_math

ID: 1578007763015311369

Joined: 06-10-2022 13:02:44

22 Tweets

119 Followers

116 Following

Rui Lu (@raylu_thu):

Check us out at our #NeurIPS2023 poster! We investigate the Q-value divergence phenomenon in offline RL and find self-excitation to be the main cause. Using LayerNorm in RL models can fundamentally prevent this from happening. arxiv.org/pdf/2310.04411…

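The mechanism named in the tweet can be illustrated with a toy sketch (all layer sizes and names here are hypothetical, not the paper's actual architecture): inserting layer normalization after a Q-network's hidden layer keeps feature magnitudes bounded even for extreme inputs, which is the kind of effect credited with preventing self-excited Q-value divergence.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean, unit variance;
    # learnable gain/bias omitted for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# A tiny 2-layer Q-network, with optional LayerNorm after the hidden layer.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 4))

def q_values(state, use_ln=True):
    h = np.maximum(state @ W1, 0.0)  # ReLU hidden layer
    if use_ln:
        h = layer_norm(h)            # bounds the hidden feature scale
    return h @ W2                    # one Q-value per action

# An exaggerated out-of-distribution state: Q-values scale with the input
# without LayerNorm, but stay moderate with it.
s = rng.normal(size=(1, 8)) * 100.0
print(np.abs(q_values(s, use_ln=False)).max())
print(np.abs(q_values(s, use_ln=True)).max())
```

The point of the sketch is only the scale comparison: the normalized network's outputs cannot grow with the input norm, which caps the feedback loop in bootstrapped Q-learning targets.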
Rui Lu (@raylu_thu):

Thank you for your appreciation! Reducing the communication cost is exactly what we want, since everybody needs to go through thousands of posters in two hours.

Mengdi Wang (@mengdiwang10):

How can we capitalize on #GenerativeAI and #diffusion models for modeling complex data and structured optimization, from images to proteins? Check out my talk "Diffusion models for Generative Optimization" at the Broad Institute, Harvard, and MIT last week. YouTube: youtube.com/watch?v=hDRDx5…

Rui Lu (@raylu_thu):

Still need finetuning for safety alignment of LLMs? Check out our new paper! We simply modify the LLM parameters selected by a linear probe, which greatly reduces jailbreaking behavior without hurting performance! Details in the arxiv link.

Rui Lu (@raylu_thu):

Can video generation models really become world models and understand physical laws? We conduct a systematic study in a synthetic setting. Check out our paper!

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? "we uncover that RL-trained models excel at low k (e.g., pass@1) but are consistently outperformed by base models at high k (e.g., pass@256)." "RLVR enhances sampling efficiency,

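The pass@k numbers quoted above can be computed with the standard unbiased estimator (the specific counts below are made up for illustration): given n sampled solutions of which c are correct, pass@k is the probability that at least one of k drawn samples is correct.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: chance at least one of k samples (from n, c correct) passes."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers matching the tweet's pattern: the RL-tuned model is
# better at pass@1, while the base model's many samples pay off at large k.
print(pass_at_k(256, 40, 1))   # base model, pass@1
print(pass_at_k(256, 96, 1))   # RL model, pass@1 (higher)
print(pass_at_k(256, 40, 256)) # base model, pass@256
```

Note that pass@1 reduces to c/n, so the estimator matches simple accuracy at k=1 and only diverges from it as k grows.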
Rui Lu (@raylu_thu):

🚨Reasoning models learn different abilities in RL! Understand our paper in 1⃣️ video, which also covers frequently asked questions. Check it out!

Andrew Zhao (@andrewz45732491):

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/

Jiahao Qiu (@jiahaoqiu99):

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI

Rui Lu (@raylu_thu):

How do reasoning models actually reason? Our recent study shows that only the 20% of tokens with the highest entropy play a critical role in deciding the reasoning trajectory! Check us out.
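A hedged sketch of how per-token entropy can be measured (toy random logits, not the paper's actual setup): compute the Shannon entropy of the model's next-token distribution at each position, then keep the top 20% of positions.

```python
import numpy as np

def token_entropy(logits):
    # Shannon entropy of the next-token distribution at each position.
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(50, 1000))  # toy: 50 positions, 1000-token vocab
H = token_entropy(logits)

# Indices of the top-20% highest-entropy positions -- the "forking" tokens.
top20 = np.argsort(H)[-int(0.2 * len(H)):]
print(len(top20))
```

High-entropy positions are where the distribution is close to a tie between continuations, so they are the natural candidates for where the trajectory actually branches.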

Shenzhi Wang🌟 (@shenzhiwang_thu):

🧐Two papers, opposite opinions. Ours: High-entropy tokens drive all performance gains in LLM RL. Another: Don’t let low-prob (often high-entropy) tokens over-dominate. Both are valid. Why? 💡Model size matters. Larger LLMs support our view; smaller LLMs support theirs. 🧵⬇️
