Haoqin Tu (@haoqint) 's Twitter Profile
Haoqin Tu

@haoqint

Passionate researcher in #NLProc, #Multimodality, and #AISafety; Ph.D. Student @UCSC, @BaskinEng; Prev @UCAS1978.

ID: 1377915615055282180

Link: http://haqtu.me | Joined: 02-04-2021 09:27:47

118 Tweets

168 Followers

230 Following

Hardy Chen (@hardychen266091) 's Twitter Profile Photo

🚨New Paper: “SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models”
Can multimodal reasoning be taught through imitation, or must it emerge through interaction?
Paper: github.com/UCSC-VLAA/VLAA…
Project Page: ucsc-vlaa.github.io/VLAA-Thinking/
Haoqin Tu (@haoqint) 's Twitter Profile Photo

Nice collab with Hardy Chen; solid insights that challenge the traditional path of “SFT, then RL” for training reasoning MLLMs. What we really need are reliable rewards in RL and diverse, high-quality data!
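
For readers skimming the thread, here is a minimal, generic sketch of the two training signals being contrasted (imitation vs. reward-driven interaction). This is not the VLAA-Thinking recipe; it assumes a HuggingFace-style causal LM whose forward pass returns `.logits`, and it omits details such as token shifting and KL regularization.

```python
# Generic illustration only, NOT the paper's training code.
# Assumes a HuggingFace-style causal LM whose forward() returns .logits.
import torch.nn.functional as F

def sft_step(model, input_ids, demo_ids):
    """Imitation: cross-entropy against a curated reasoning demonstration."""
    logits = model(input_ids=input_ids).logits          # (batch, seq, vocab)
    return F.cross_entropy(logits.view(-1, logits.size(-1)), demo_ids.view(-1))

def rl_step(model, input_ids, sampled_ids, reward):
    """Interaction: REINFORCE-style update, weighting the log-likelihood of a
    sampled response by a scalar (e.g., rule-based / verifiable) reward."""
    logits = model(input_ids=input_ids).logits
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    return -(reward * token_logp.sum(dim=-1)).mean()
```

The contrast is only meant to make the "imitation vs. interaction" framing concrete; the actual training setup and reward design are described on the linked project page.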

Cihang Xie (@cihangxie) 's Twitter Profile Photo

In this earlier post, we believed SFT would be crucial for multimodal reasoning models, thus releasing the VL-Thinking dataset to facilitate research in this direction.  

However, our recent findings show a surprising shift: SFT can hinder learning, often inducing
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

SFT blocks deep reasoning, but RL sparks genuine insights. 
They found removing SFT constraints ignites sharper multimodal thinking.

This research warns that SFT hurts RL-based multimodal reasoning by up to 47%, unveils VLAA-Thinking with 150K+ examples, and secures #1 on the
Nathan Lambert (@natolambert) 's Twitter Profile Photo

The first draft of the online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL-specific chapter.
Nataniel Ruiz (@natanielruizg) 's Twitter Profile Photo

I'm sharing the product of an exciting collaboration between UC Santa Cruz, Google and others. The first of its kind: a new complexity-controllable image editing benchmark suite called Complex-Edit to systematically evaluate image editing across varying complexity levels 🧵
Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.
Percy Liang (@percyliang) 's Twitter Profile Photo

Announcing VHELM v2.1.2 for VLMs: We added the latest Gemini models, Qwen2.5-VL Instruct models, GPT-4.5 preview, o3, o4-mini, and Llama 4 Scout/Maverick. Prompts and predictions can be found on our website:
crfm.stanford.edu/helm/vhelm/v2.…
Cihang Xie (@cihangxie) 's Twitter Profile Photo

Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧

We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and
Percy Liang (@percyliang) 's Twitter Profile Photo

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
elvis (@omarsar0) 's Twitter Profile Photo

Knowledge or Reasoning?

Evaluation matters, and even more so when using reasoning LLMs. 

Look at final response accuracy, but also pay attention to thinking trajectories. 

Lots of good findings on this one. 

Here are my notes:
yuyin zhou@ICLR'25 (@yuyinzhou_cs) 's Twitter Profile Photo

Thanks elvis for the great summary of our work "KNOWLEDGE or REASONING? A Close Look at How LLMs Think Across Domains"! How can we build more reliable LLMs? 🤔 We focus on ensuring not just accurate final answers, but also high-quality reasoning 💡 & knowledge

Cihang Xie (@cihangxie) 's Twitter Profile Photo

Reasoning LLMs are now able to tackle much tougher questions than ever—but what really drives their success? Is it Knowledge 📖 or Reasoning 🤔?

🔎 We present a new step-by-step framework to evaluate how LLMs think.

🧵 Thread:
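
As a rough illustration of what separating final-answer accuracy from trajectory-level quality can look like (an assumption-laden sketch, not the paper's actual framework), the snippet below checks the final answer and also scores each reasoning step with a hypothetical `judge_step` callable.

```python
# Hypothetical sketch of step-level evaluation; the paper's actual framework
# and judging criteria are described in the linked work.
from dataclasses import dataclass

@dataclass
class EvalResult:
    answer_correct: bool       # final-response accuracy
    step_scores: list[float]   # per-step quality of the thinking trajectory

def evaluate(question, reasoning_steps, final_answer, reference, judge_step):
    """Score the final answer AND every intermediate reasoning step.
    `judge_step` is an assumed callable (e.g., an LLM judge or a rule check)
    that returns a score in [0, 1] for one step."""
    answer_correct = final_answer.strip() == reference.strip()
    step_scores = [judge_step(question, step) for step in reasoning_steps]
    return EvalResult(answer_correct, step_scores)
```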
Chen Wei (@_chen_wei_) 's Twitter Profile Photo

đź’ˇ New work: You might not need math data to teach models math reasoning.

Recent 🔥 RLVR works challenge the need of *labels* of math questions.

We find that just playing video games, e.g. Snake, can boost multimodal reasoning. No math *questions* needed.

arxiv.org/abs/2506.08011🧵👇
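
One way to read "no math *questions* needed" is that the verifiable signal comes from the game itself. Below is a hypothetical sketch (the Gym-style environment and action format are illustrative assumptions, not the paper's setup) where the accumulated game score serves directly as the reward.

```python
# Hypothetical sketch; the environment here is a stand-in Gym-style game
# (e.g., Snake), not the environment used in the paper.
def game_reward(env, actions):
    """Return the total game score earned by a sequence of actions.
    The score itself is the verifiable reward: no math labels and
    no math questions, just whether the play was good."""
    env.reset()
    total = 0.0
    for action in actions:
        _obs, step_reward, done, _info = env.step(action)  # classic 4-tuple API
        total += step_reward
        if done:
            break
    return total  # fed back as the scalar reward for training the policy
```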
Pan Lu (@lupantech) 's Twitter Profile Photo

Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs.

➡️ ineqmath.github.io

To tackle
Haoqin Tu (@haoqint) 's Twitter Profile Photo

I’ll be giving a talk on our VLAA-Thinker🤔 at the Arize AI Observe event at SHACK15 next Wednesday, swing by to chat about visual-language reasoning models! Always happy to discuss broader ideas around multimodal reasoning and generative models too arize.com/observe-2025/a…

Cihang Xie (@cihangxie) 's Twitter Profile Photo

OpenVision has been accepted to #ICCV2025 🥳🥳 Additionally, stay tuned for v2, arriving very soon with even greater efficiency and capability.