Runlong Zhou (@vectorzhou)'s Twitter Profile
Runlong Zhou

@vectorzhou

PhD student @uwcse; Prev: Undergrad at IIIS, @Tsinghua_Uni

ID: 3437500454

Link: http://vectorzhou.com · Joined: 03-09-2015 13:58:28

86 Tweets

184 Followers

298 Following

AniPlaylist (@aniplaylist)'s Twitter Profile Photo

🚨 FINAL FANTASY XVI Original Soundtracks on music streaming platforms  

At midnight on September 18, the FINAL FANTASY XVI & DLC OSTs will finally be released on music streaming platforms!

🔥 Links for Spotify & Apple Music below
Runlong Zhou (@vectorzhou)'s Twitter Profile Photo

Feel free to stop by our poster if you are interested in online DPO and RLHF. Our proposed method -- a mixture of a uniform sampler and a policy-dependent sampler -- provably achieves a quadratic convergence rate!
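
For intuition, here is a minimal sketch of what such a mixture sampler could look like, assuming a fixed candidate pool for the uniform component and a `policy.generate` method for the policy-dependent one; the names and mixing scheme below are illustrative assumptions, not the paper's actual algorithm.

```python
import random

def sample_response_pair(prompt, policy, candidate_pool, eps=0.5):
    """Hedged sketch of a uniform/policy mixture sampler for online DPO.

    With probability eps, draw a response uniformly from a fixed pool;
    otherwise, draw from the current policy. `policy.generate` and
    `candidate_pool` are illustrative assumptions, not the paper's API.
    """
    def draw():
        if random.random() < eps:
            return random.choice(candidate_pool)  # uniform component
        return policy.generate(prompt)            # policy-dependent component
    # Two independent draws give the response pair to be preference-labeled
    # and fed into the online DPO update.
    return draw(), draw()
```

The rough intuition: the uniform component keeps exploration alive over the whole response space, while the policy-dependent component concentrates samples where the current policy puts mass.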

Avinandan Bose (@avibose22)'s Twitter Profile Photo

🧠 Your LLM should model how you think, not reduce you to preassigned traits
📢 Introducing LoRe: a low-rank reward modeling framework for personalized RLHF
❌ Demographic grouping/handcrafted traits
✅ Infers implicit preferences
✅ Few-shot adaptation
📄 arxiv.org/abs/2504.14439
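
A minimal sketch of the low-rank idea as I read it from the abstract: user-specific rewards are mixtures of a small shared basis of reward functions, so personalizing to a new user means fitting only a low-dimensional weight vector. The class and names below are hypothetical, not LoRe's actual code.

```python
import torch
import torch.nn as nn

class LowRankReward(nn.Module):
    """Hypothetical sketch: K shared basis reward heads over response
    features, combined per user by a K-dimensional weight vector."""
    def __init__(self, feat_dim: int, num_basis: int):
        super().__init__()
        self.basis = nn.Linear(feat_dim, num_basis, bias=False)

    def forward(self, features: torch.Tensor, user_w: torch.Tensor):
        # Reward for one user = user-weighted mixture of basis rewards.
        return (self.basis(features) * user_w).sum(dim=-1)

# Few-shot adaptation would freeze the shared basis and fit only the
# K user weights on a handful of that user's preference comparisons.
```
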
Ruizhe Shi (@smellycat_zzz)'s Twitter Profile Photo

Two-stage RLHF or one-stage DPO: Which one is better for learning from preferences?

Equal under strong assumptions, but representation differences break the tie. Our paper reveals their fine-grained performance gaps under various conditions.

paper: arxiv.org/abs/2505.19770
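
For reference, these are the standard objectives the two pipelines optimize (textbook background, not results from the paper): two-stage RLHF first fits a reward model $\hat{r}$ from preference data and then solves a KL-regularized policy optimization, while one-stage DPO trains the policy directly on preference pairs.

```latex
% Two-stage RLHF: reward modeling, then KL-regularized policy optimization
\max_{\pi_\theta}\;
  \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
  \bigl[\hat{r}(x, y)\bigr]
  - \beta\, \mathrm{KL}\!\bigl(\pi_\theta(\cdot \mid x) \,\|\, \pi_{\mathrm{ref}}(\cdot \mid x)\bigr)

% One-stage DPO: direct optimization on preference pairs (y_w \succ y_l)
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}
    \log \sigma\!\Bigl(
      \beta \log \tfrac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \tfrac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \Bigr)
```

Under a Bradley-Terry preference model and exact optimization the two coincide, which is presumably the "equal under strong assumptions" baseline that the paper's representation-gap results depart from.
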
elvis (@omarsar0)'s Twitter Profile Photo

The Illusion of Thinking in LLMs

Apple researchers discuss the strengths and limitations of reasoning models.

Apparently, reasoning models "collapse" beyond certain task complexities.

Lots of important insights on this one. (bookmark it!)

Here are my notes:
Runlong Zhou (@vectorzhou)'s Twitter Profile Photo

Thrilled to announce that CASCADE is accepted to #COLM2025! It is one of the most interesting projects I've done thus far -- so I'm grateful to all the reviewers and ACs for helping me improve the paper. Most importantly: thank you, Yi Zhang, for your mentorship and support!

Ruoming Pang (@ruomingpang)'s Twitter Profile Photo

In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. machinelearning.apple.com/research/apple…

Yi Wu (@jxwuyi)'s Twitter Profile Photo

Tired of intricate system code for RL training? 🤯
We release AReaL-lite – a lightweight AReaL version for AI researchers! 🚀 #opensource
✨ Algorithm-first design & APIs 🎉
✨ 80% less code with 90% of AReaL's full efficiency 🎉
✨ Customizable agentic RL 🎉
🔗 github.com/inclusionAI/AR…