Qiao Rui (@ray_qiaorui) 's Twitter Profile
Qiao Rui

@ray_qiaorui

PhD student in ML/LLM @NUSingapore. Previously visiting scholar @uwcse, undergrad @sutdsg.

ID: 2917237274

Link: https://qiaoruiyt.github.io · Joined: 03-12-2014 05:16:02

11 Tweets

52 Followers

115 Following

Ruochen Zhang not @ ICLR (@ruochenz_) 's Twitter Profile Photo

When R1 came, I was thinking we should have a model trained to “reason” not only in English 🤔 Guess what, we show that with only English finetuning, the reasoning generalizes to other languages too! Models can also be “forced” to reason in other langs 🤯 However, more work…

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo


LLMs naturally memorize some verbatim of pre-training data. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized…
Qiao Rui (@ray_qiaorui) 's Twitter Profile Photo

Great work by Antoine Chaffin! It’s exciting to learn that training with the hard queries and hard negatives from our ReasonIR dataset can also significantly boost the performance of small models 😆 Looking forward to more advances in training efficient retrieval models!
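The training recipe referenced above — contrastive training of a retriever with hard queries and hard negatives — can be sketched minimally as an InfoNCE-style loss over embeddings. This is a hypothetical NumPy illustration of the general technique, not the actual ReasonIR training code; the vectors and temperature value are made up for the example.

```python
import numpy as np

def info_nce_loss(query, positive, hard_negatives, temperature=0.05):
    """InfoNCE-style contrastive loss with hard negatives.

    query: (d,) embedding; positive: (d,); hard_negatives: (k, d).
    Minimal sketch of contrastive retrieval training, not ReasonIR's code.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Positive goes first; the loss pushes its similarity above all negatives.
    sims = np.array([cos(query, positive)] + [cos(query, n) for n in hard_negatives])
    logits = sims / temperature
    logits -= logits.max()  # numerical stability for softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy with the positive at index 0

# Toy embeddings: a query close to its positive and far from hard negatives.
q = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])
negs = np.array([[0.0, 1.0], [-1.0, 0.2]])
loss = info_nce_loss(q, pos, negs)
```

The "hard" part of hard negatives is a data property rather than a loss change: negatives are mined to be semantically close to the query, so the model must learn fine-grained distinctions instead of separating trivially unrelated pairs.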

Stella Li (@stellalisy) 's Twitter Profile Photo


🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
Jacqueline He (@jcqln_h) 's Twitter Profile Photo


LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
Thao Nguyen (@thao_nguyen26) 's Twitter Profile Photo


Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔
We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats!

arxiv.org/abs/2506.04689