Rui Lu (@raylu_thu)'s Twitter Profile
Rui Lu

@raylu_thu

PhD student at @Tsinghua_Uni studying machine learning theory; graduated from the Yao Class. Also a YouTuber @ 漫士沉思录 manshi_math

ID: 1578007763015311369

Joined: 06-10-2022 13:02:44

22 Tweets

119 Followers

116 Following

Rui Lu (@raylu_thu):

Check us out at our #NeurIPS2023 poster! We investigate the Q-value divergence phenomenon in offline RL and find self-excitation to be the main cause. Using LayerNorm in RL models can fundamentally prevent this from happening. arxiv.org/pdf/2310.04411…

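The mechanism named in the tweet can be illustrated with a toy sketch (all layer sizes and names here are hypothetical, not the paper's actual architecture): inserting layer normalization after a Q-network's hidden layer keeps feature magnitudes bounded even for extreme inputs, which is the kind of effect credited with preventing self-excited Q-value divergence.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each feature vector to zero mean, unit variance;
    # learnable gain/bias omitted for brevity.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

# A tiny 2-layer Q-network, with optional LayerNorm after the hidden layer.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 32))
W2 = rng.normal(size=(32, 4))

def q_values(state, use_ln=True):
    h = np.maximum(state @ W1, 0.0)  # ReLU hidden layer
    if use_ln:
        h = layer_norm(h)            # bounds the hidden feature scale
    return h @ W2                    # one Q-value per action

# An exaggerated out-of-distribution state: Q-values scale with the input
# without LayerNorm, but stay moderate with it.
s = rng.normal(size=(1, 8)) * 100.0
print(np.abs(q_values(s, use_ln=False)).max())
print(np.abs(q_values(s, use_ln=True)).max())
```

The point of the sketch is only the scale comparison: the normalized network's outputs cannot grow with the input norm, which caps the feedback loop in bootstrapped Q-learning targets.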
Rui Lu (@raylu_thu):

Thank you for your appreciation! Reducing the communication cost is exactly what we want, since everybody needs to go through thousands of posters in two hours.

Mengdi Wang (@mengdiwang10):

How can we capitalize on #GenerativeAI and #diffusion models for modeling complex data and structured optimization, from images to proteins? Check out my talk "Diffusion models for Generative Optimization" at the Broad Institute, Harvard, and MIT last week. YouTube: youtube.com/watch?v=hDRDx5…

Rui Lu (@raylu_thu):

Still need finetuning for safety alignment of LLMs? Check out our new paper! We simply modify the LLM parameters selected by a linear probe, which greatly reduces jailbreaking behavior without hurting performance! Details in the arxiv link.

Rui Lu (@raylu_thu):

Can video generation models really become world models and understand physical laws? We conduct a systematic study in a synthetic setting. Check out our paper!

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr):

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? "we uncover that RL-trained models excel at low k (e.g., pass@1) but are consistently outperformed by base models at high k (e.g., pass@256)." "RLVR enhances sampling efficiency,

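The pass@k numbers quoted above can be computed with the standard unbiased estimator (the specific counts below are made up for illustration): given n sampled solutions of which c are correct, pass@k is the probability that at least one of k drawn samples is correct.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: chance at least one of k samples (from n, c correct) passes."""
    if n - c < k:
        return 1.0  # too few incorrect samples to fill k slots: guaranteed pass
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers matching the tweet's pattern: the RL-tuned model is
# better at pass@1, while the base model's many samples pay off at large k.
print(pass_at_k(256, 40, 1))   # base model, pass@1
print(pass_at_k(256, 96, 1))   # RL model, pass@1 (higher)
print(pass_at_k(256, 40, 256)) # base model, pass@256
```

Note that pass@1 reduces to c/n, so the estimator matches simple accuracy at k=1 and only diverges from it as k grows.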
Rui Lu (@raylu_thu):

🚨Reasoning models learn different abilities in RL! Understand our paper in 1⃣️ video, which also covers frequently asked questions. Check it out!

Andrew Zhao (@andrewz45732491):

❄️Introducing Absolute Zero Reasoner: Our reasoner learns to both propose tasks that maximize learnability and improve reasoning by solving them, entirely through self-play—with no external data! It overall outperforms other "zero" models in math & coding domains. 🧵 1/

Jiahao Qiu (@jiahaoqiu99):

The GAIA game is over, and Alita is the final answer. Alita takes the top spot in GAIA, outperforming OpenAI Deep Research and Manus. Many general-purpose agents rely heavily on large-scale, manually predefined tools and workflows. However, we believe that for general AI

Rui Lu (@raylu_thu):

How do reasoning models actually reason? Our recent study shows that only the 20% of tokens with the highest entropy play a critical role in deciding the reasoning trajectory! Check us out.
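A hedged sketch of how per-token entropy can be measured (toy random logits, not the paper's actual setup): compute the Shannon entropy of the model's next-token distribution at each position, then keep the top 20% of positions.

```python
import numpy as np

def token_entropy(logits):
    # Shannon entropy of the next-token distribution at each position.
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

rng = np.random.default_rng(0)
logits = rng.normal(size=(50, 1000))  # toy: 50 positions, 1000-token vocab
H = token_entropy(logits)

# Indices of the top-20% highest-entropy positions -- the "forking" tokens.
top20 = np.argsort(H)[-int(0.2 * len(H)):]
print(len(top20))
```

High-entropy positions are where the distribution is close to a tie between continuations, so they are the natural candidates for where the trajectory actually branches.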

Shenzhi Wang🌟 (@shenzhiwang_thu):

🧐Two papers, opposite opinions. Ours: High-entropy tokens drive all performance gains in LLM RL. Another: Don’t let low-prob (often high-entropy) tokens over-dominate. Both are valid. Why? 💡Model size matters. Larger LLMs support our view; smaller LLMs support theirs. 🧵⬇️
