Yong Zheng-Xin (Yong) (@yong_zhengxin) 's Twitter Profile
Yong Zheng-Xin (Yong)

@yong_zhengxin

🎯 reasoning and alignment
🌎 making LLMs safe and helpful for everyone
📍 phd @BrownCSDept + research @AIatMeta @Cohere_Labs

ID: 955679485273153537

Link: https://yongzx.github.io/ · Joined: 23-01-2018 05:51:59

553 Tweets

1.1K Followers

1.1K Following

AK (@_akhaliq) 's Twitter Profile Photo

Crosslingual Reasoning through Test-Time Scaling TL;DR: scaling up the thinking tokens of English-centric reasoning language models, such as the s1 models, can improve multilingual math reasoning performance. The paper also analyzes language-mixing patterns, effects of different

Stephen Bach (@stevebach) 's Twitter Profile Photo

Really interesting findings from Yong and many great collaborators. Test-time scaling generalizes cross-lingually, but maybe not in the way you’d hope. S1 tends to quote in the original language and then think in English.

Ruochen Zhang not @ ICLR (@ruochenz_) 's Twitter Profile Photo

When R1 came out, I was thinking we should have a model trained to “reason” not only in English 🤔 Guess what: we show that with only English finetuning, the reasoning generalizes to other languages too! Models can also be “forced” to reason in other langs 🤯 However, more work

MKI (@mki028) 's Twitter Profile Photo

Exploring why such a mechanism occurs, both from the model's inner workings and from the data itself (using data attribution methods), seems intriguing to look into 👀

Yong Zheng-Xin (Yong) (@yong_zhengxin) 's Twitter Profile Photo

It has been such a great experience collaborating with Julia from Cohere Labs! Come check out our new work on how test-time scaling of English-centric models improves crosslingual reasoning 🔥 📜 arxiv.org/abs/2505.05408

Niklas Muennighoff (@muennighoff) 's Twitter Profile Photo

In 2022, with Yong Zheng-Xin (Yong) & team, we showed that models trained to follow instructions in English can follow instructions in other languages. Our new work below shows that models trained to reason in English can also reason in other languages!

Genta Winata (@gentaiscool) 's Twitter Profile Photo

⭐️Reasoning LLMs trained on English data can think in other languages. Read our paper to learn more! Thank you Yong Zheng-Xin (Yong) for leading the project and team! It was an exciting collab! farid Jonibek Mansurov Ruochen Zhang Niklas Muennighoff Carsten Eickhoff Julia Kreutzer

Alham Fikri Aji (@alhamfikri) 's Twitter Profile Photo

🚨Multilingual LLMs, finetuned only on English reasoning data, can still reason when asked non-English questions, showing reasoning traces that go back & forth between languages. I had so much fun working on this project. Please give our paper a read! arxiv.org/abs/2505.05408

farid (@faridlazuarda) 's Twitter Profile Photo

Can English-finetuned LLMs reason in other languages? Short Answer: Yes, thanks to “quote-and-think” + test-time scaling. You can even force them to reason in a target language! But: 🌐 Low-resource langs & non-STEM topics still tough. New paper: arxiv.org/abs/2505.05408

Shan Chen (@shan23chen) 's Twitter Profile Photo

Designing a hard but useful benchmark has always been a passion of mine. Here we present MedBrowseComp, a deep research + computer use benchmark that is easy to verify (like BrowseComp from OpenAI) but still very expandable 💊! Project page: moreirap12.github.io/mbc-browse-app/ 1/n

Brown CS (@browncsdept) 's Twitter Profile Photo

Congratulations to Brown CS faculty members Stephen Bach, Ugur Çetintemel, Ellie Pavlick, and Nikos Vasilakis, who have received Brown University's OVPR Seed Award and Salomon Faculty Research Award honors! Learn more about their work at Brown CS News: cs.brown.edu/news/2025/05/2…
