Qiao Rui (@ray_qiaorui) 's Twitter Profile
Qiao Rui

@ray_qiaorui

PhD student in ML/LLM @NUSingapore. Previously visiting scholar @uwcse, undergrad @sutdsg.

ID: 2917237274

Link: https://qiaoruiyt.github.io · Joined: 03-12-2014 05:16:02

11 Tweets

52 Followers

115 Following

Ruochen Zhang not @ ICLR (@ruochenz_) 's Twitter Profile Photo

When R1 came, I was thinking we should have a model trained to “reason” not only in English 🤔 Guess what, we show that with only English finetuning, the reasoning generalizes to other languages too! Models can also be “forced” to reason in other langs 🤯 However, more work…

Tong Chen @ ICLR (@tomchen0) 's Twitter Profile Photo


LLMs naturally memorize some verbatim of pre-training data. We study whether post-training can be an effective way to mitigate unintentional reproduction of pre-training data.
🛠️ No changes to pre-training or decoding
🔥 Training models to latently distinguish between memorized…
Qiao Rui (@ray_qiaorui) 's Twitter Profile Photo

Great work by Antoine Chaffin! It’s exciting to learn that training with the hard queries and hard negatives from our ReasonIR dataset can also significantly boost the performance of small models 😆 Looking forward to more advances in training efficient retrieval models!
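The training recipe referenced above — contrastive training of a retriever with hard queries and hard negatives — can be sketched minimally as an InfoNCE-style loss over embeddings. This is a hypothetical NumPy illustration of the general technique, not the actual ReasonIR training code; the vectors and temperature value are made up for the example.

```python
import numpy as np

def info_nce_loss(query, positive, hard_negatives, temperature=0.05):
    """InfoNCE-style contrastive loss with hard negatives.

    query: (d,) embedding; positive: (d,); hard_negatives: (k, d).
    Minimal sketch of contrastive retrieval training, not ReasonIR's code.
    """
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

    # Positive goes first; the loss pushes its similarity above all negatives.
    sims = np.array([cos(query, positive)] + [cos(query, n) for n in hard_negatives])
    logits = sims / temperature
    logits -= logits.max()  # numerical stability for softmax
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # cross-entropy with the positive at index 0

# Toy embeddings: a query close to its positive and far from hard negatives.
q = np.array([1.0, 0.0])
pos = np.array([0.9, 0.1])
negs = np.array([[0.0, 1.0], [-1.0, 0.2]])
loss = info_nce_loss(q, pos, negs)
```

The "hard" part of hard negatives is a data property rather than a loss change: negatives are mined to be semantically close to the query, so the model must learn fine-grained distinctions instead of separating trivially unrelated pairs.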

Stella Li (@stellalisy) 's Twitter Profile Photo


🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
Jacqueline He (@jcqln_h) 's Twitter Profile Photo


LMs often output answers that sound right but aren’t supported by input context. This is intrinsic hallucination: the generation of plausible, but unsupported content.

We propose Precise Information Control (PIC): a task requiring LMs to ground only on given verifiable claims.
Thao Nguyen (@thao_nguyen26) 's Twitter Profile Photo


Web data, the “fossil fuel of AI”, is being exhausted. What’s next?🤔
We propose Recycling the Web to break the data wall of pretraining via grounded synthetic data. It is more effective than standard data filtering methods, even with multi-epoch repeats!

arxiv.org/abs/2506.04689