Zirui "Colin" Wang (@zwcolin) 's Twitter Profile
Zirui "Colin" Wang

@zwcolin

Incoming CS PhD @Berkeley_EECS; MSCS @princeton_nlp; '25 @siebelscholars; prev @HDSIUCSD; I work on multimodal foundation models; He/Him.

ID: 2986434572

Link: http://ziruiw.net · Joined: 17-01-2015 04:18:40

122 Tweets

1.1K Followers

528 Following

Xi Ye (@xiye_nlp) 's Twitter Profile Photo

🔔 I'm recruiting multiple fully funded MSc/PhD students at the University of Alberta for Fall 2025! Join my lab working on NLP, especially reasoning and interpretability (see my website for more details about my research). Apply by December 15!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

🚨 I'll be presenting CharXiv this Friday morning at #neurips and Sunday at the MAR workshop.

I'm 🤗 to connect with new friends and chat about developing/enhancing multimodal models (text-to-image, VLMs, etc) and their evaluations! Let's meet up at the conference :)
Danqi Chen (@danqi_chen) 's Twitter Profile Photo

I’ve just arrived in Vancouver and am excited to join the final stretch of #NeurIPS2024!

This morning, we are presenting 3 papers 11am-2pm:
- Edge pruning for finding Transformer circuits (#3111, spotlight) Adithya Bhaskar
- SimPO (#3410) Yu Meng @ ICLR'25 Mengzhou Xia
- CharXiv (#5303)
Jing-Jing Li (@drjingjing2026) 's Twitter Profile Photo

1/3 Today, an anecdote shared by an invited speaker at #NeurIPS2024 left many Chinese scholars, myself included, feeling uncomfortable. As a community, I believe we should take a moment to reflect on why such remarks in public discourse can be offensive and harmful.
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

I'll present CharXiv at tmr's Multimodal Algorithmic Reasoning workshop for a spotlight talk at 11:45am followed by a poster session at 2:15pm in West Building Exhibit Hall A.

If you are interested in or working on developing/evaluating multimodal models, let's connect there!
Kaiqu Liang (@kaiqu_liang) 's Twitter Profile Photo

Think your RLHF-trained AI is aligned with your goals?

⚠️ We found that RLHF can induce significant misalignment when humans provide feedback by predicting future outcomes 🤔, creating incentives for LLM deception 😱

Introducing ✨RLHS (Hindsight Simulation)✨: By simulating
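The failure mode described above can be illustrated with a toy contrast between foresight feedback (the rater judges the model's claim before the outcome is known) and hindsight feedback (the rater judges a simulated outcome). Everything below is a hypothetical sketch of the idea, not the RLHS paper's implementation; all names and numbers are made up.

```python
# Toy contrast: foresight feedback rewards what the model *claims*,
# hindsight feedback rewards the (simulated) realized outcome.
import random

random.seed(0)

def simulate_outcome(true_quality):
    """Realized utility of following the model's advice, with noise."""
    return true_quality + random.gauss(0, 0.1)

def foresight_feedback(stated_claim):
    # The rater only sees the model's (possibly inflated) claim.
    return stated_claim

def hindsight_feedback(true_quality):
    # The rater sees a simulated outcome instead of the claim.
    return simulate_outcome(true_quality)

# A deceptive policy overstates quality: true quality 0.2, claims 0.9.
true_q, claim = 0.2, 0.9
print(foresight_feedback(claim))   # rewards the inflated claim
print(hindsight_feedback(true_q))  # rewards the realized outcome
```

Under foresight feedback, the inflated claim earns the higher reward, so deception is incentivized; under hindsight feedback, reward tracks the realized outcome instead.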
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

While DeepSeek R1 has been flexing 💪🏻, how are VLMs progressing in 𝐫𝐞𝐚𝐬𝐨𝐧𝐢𝐧𝐠?

⚠️ Major Shift: the latest 𝐨𝐩𝐞𝐧-𝐰𝐞𝐢𝐠𝐡𝐭 Qwen2.5-VL has beaten the first GPT-4o and is now on par with the latest ChatGPT-4o! 😲

But what about o1-like models? Can they enhance
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Six years ago I was a high school senior, and my dream was to get into Berkeley for CS. I got rejected. I appealed. Still no.

But that setback only made me stronger. I never let that dream go. And now? I made it.

Finally, time to visit the campus and get to know everyone!
Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

It seems that models can figure out the correct rules with RL. I created a synthetic game to run GRPO on VLMs over the weekend, and I didn't realize I had written the wrong rule in the instruction 🤦🏻‍♂️. Within ~200 steps the model learns the corner cases where the wrong rule can
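The group-relative trick at the heart of GRPO can be sketched in a few lines: sample several rollouts per prompt, score each with a rule-based checker, and normalize each reward against its group's mean and standard deviation to get advantages. This is a minimal illustrative sketch of that computation, not the actual training code from the experiment above; the reward values are made up.

```python
# Minimal sketch of GRPO-style group-relative advantage normalization.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its sampling group."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Example: 4 rollouts of the same prompt, scored 1.0 if a rule-based
# checker accepts the answer, 0.0 otherwise.
rewards = [1.0, 0.0, 1.0, 0.0]
advs = group_relative_advantages(rewards)
print(advs)  # rollouts above the group mean get positive advantage
```

Because advantages are computed relative to the group rather than a learned value baseline, rollouts that satisfy the (possibly wrong!) reward rule get pushed up and the rest get pushed down, which is exactly how a mis-specified rule gets learned.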

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

Life update: I'll be joining Berkeley EECS as a PhD student starting in fall 2025, playing around with multimodal models and llms, being part of Sky Lab & BAIR, and enjoying the unreal™️ weather 🏖️ CA has to offer!
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

News: Search Arena is now LIVE! 🌐🔍
✅ Test web-augmented LLM systems on real-time, real-world tasks: retrieval, writing, debugging & more.
✅ Perplexity, Gemini, OpenAI go head-to-head.
✅ Crowd-powered evals. Leaderboard 🏆 coming soon…
⚡ Try it now at lmarena.ai!

Zirui "Colin" Wang (@zwcolin) 's Twitter Profile Photo

i've been working on my master's thesis and finally got something worth mentioning for the broader impact of the research work i did last year -- it's not another benchmark but an eval that people and devs care about, and i'm ready to build more of them :p

Alex Zhang (@a1zhang) 's Twitter Profile Photo

Claude can play Pokemon, but can it play DOOM?

With a simple agent, we let VLMs play it and found Sonnet 3.7 got the furthest, finding the blue room!

Our VideoGameBench (twenty games from the 90s) and agent are open source so you can try it yourself now --> 🧵

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

We're excited to invite everyone to a new Beta version of LMArena! 🎉

For months, we've been poring through community feedback to improve the site: fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this

Tianle (Tim) Li (@litianleli) 's Twitter Profile Photo

🚨 Arena-Hard-v2.0 is here! 🚨

Major Improvements:
- Better Automatic Judges (Gemini-2.5 & GPT-4.1) 🦾
- 500 Fresh Prompts from LMArena🗿
- Tougher Baselines 🏋️
- Multilingual (30+ Langs) 🌎
- Plus Eval for Creative Writing ✍️

Test your model on the hardest prompts from LMArena!
Xindi Wu (@cindy_x_wu) 's Twitter Profile Photo

Introducing COMPACT: COMPositional Atomic-to-complex Visual Capability Tuning, a data-efficient approach to improve multimodal models on complex visual tasks without scaling data volume. 📦

arxiv.org/abs/2504.21850

1/10
MLPC Group (@mlpcucsd) 's Twitter Profile Photo

We’re thrilled that our lab’s work on “Deeply-Supervised Nets” has received the Test-of-Time Award at AISTATS 2025! 🏆 This prestigious award honors papers published 10 years ago that have had a lasting and significant impact on the field of artificial intelligence and