Haoqin Tu (@haoqint) 's Twitter Profile
Haoqin Tu

@haoqint

Passionate researcher in #NLProc, #Multimodality, and #AISafety; Ph.D. Student @UCSC, @BaskinEng; Prev @UCAS1978.

ID: 1377915615055282180

Link: http://haqtu.me | Joined: 02-04-2021 09:27:47

118 Tweets

168 Followers

230 Following

Hardy Chen (@hardychen266091) 's Twitter Profile Photo

🚨New Paper: “SFT or RL? An Early Investigation into Training R1-Like Reasoning Large Vision-Language Models”
Can multimodal reasoning be taught through imitation, or must it emerge through interaction?
Paper: github.com/UCSC-VLAA/VLAA…
Project Page: ucsc-vlaa.github.io/VLAA-Thinking/
Haoqin Tu (@haoqint) 's Twitter Profile Photo

Nice collab with Hardy Chen; solid insights that challenge the traditional path of “SFT, then RL” for training reasoning MLLMs. What we really need are reliable rewards in RL and diverse, high-quality data!
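
For readers skimming the thread, here is a minimal, generic sketch of the two training signals being contrasted (imitation vs. reward-driven interaction). This is not the VLAA-Thinking recipe; it assumes a HuggingFace-style causal LM whose forward pass returns `.logits`, and it omits details such as token shifting and KL regularization.

```python
# Generic illustration only, NOT the paper's training code.
# Assumes a HuggingFace-style causal LM whose forward() returns .logits.
import torch.nn.functional as F

def sft_step(model, input_ids, demo_ids):
    """Imitation: cross-entropy against a curated reasoning demonstration."""
    logits = model(input_ids=input_ids).logits          # (batch, seq, vocab)
    return F.cross_entropy(logits.view(-1, logits.size(-1)), demo_ids.view(-1))

def rl_step(model, input_ids, sampled_ids, reward):
    """Interaction: REINFORCE-style update, weighting the log-likelihood of a
    sampled response by a scalar (e.g., rule-based / verifiable) reward."""
    logits = model(input_ids=input_ids).logits
    logp = F.log_softmax(logits, dim=-1)
    token_logp = logp.gather(-1, sampled_ids.unsqueeze(-1)).squeeze(-1)
    return -(reward * token_logp.sum(dim=-1)).mean()
```

The contrast is only meant to make the "imitation vs. interaction" framing concrete; the actual training setup and reward design are described on the linked project page.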

Cihang Xie (@cihangxie) 's Twitter Profile Photo

In this earlier post, we believed SFT would be crucial for multimodal reasoning models, thus releasing the VL-Thinking dataset to facilitate research in this direction.  

However, our recent findings show a surprising shift: SFT can hinder learning, often inducing
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

SFT blocks deep reasoning, but RL sparks genuine insights. 
They found removing SFT constraints ignites sharper multimodal thinking.

This research warns that SFT hurts RL-based multimodal reasoning by up to 47%, unveils VLAA-Thinking with 150K+ examples, and secures #1 on the
Nathan Lambert (@natolambert) 's Twitter Profile Photo

The first draft of the online version of The RLHF Book is DONE. Recently I've been creating the advanced discussion chapters on everything from Constitutional AI to evaluation and character training, but I also sneak in consistent improvements to the RL-specific chapter.
Nataniel Ruiz (@natanielruizg) 's Twitter Profile Photo

I'm sharing the product of an exciting collaboration between UC Santa Cruz, Google and others. The first of its kind: a new complexity-controllable image editing benchmark suite called Complex-Edit to systematically evaluate image editing across varying complexity levels 🧵
Dimitris Papailiopoulos (@dimitrispapail) 's Twitter Profile Photo

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.

We’ve been cooking... a new open weights 14B Phi-4 reasoning model, SFT’d on ~1.4M carefully curated reasoning demonstrations from o3-mini and RL’d for a tiny bit. This model is a little beast.
Percy Liang (@percyliang) 's Twitter Profile Photo

Announcing VHELM v2.1.2 for VLMs: We added the latest Gemini models, Qwen2.5-VL Instruct models, GPT-4.5 preview, o3, o4-mini, and Llama 4 Scout/Maverick. Prompts and predictions can be found on our website:
crfm.stanford.edu/helm/vhelm/v2.…
Cihang Xie (@cihangxie) 's Twitter Profile Photo

Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧

We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and
Percy Liang (@percyliang) 's Twitter Profile Photo

What would truly open-source AI look like? Not just open weights, open code/data, but *open development*, where the entire research and development process is public *and* anyone can contribute. We built Marin, an open lab, to fulfill this vision:
elvis (@omarsar0) 's Twitter Profile Photo

Knowledge or Reasoning?

Evaluation matters, and even more so when using reasoning LLMs. 

Look at final response accuracy, but also pay attention to thinking trajectories. 

Lots of good findings on this one. 

Here are my notes:
yuyin zhou@ICLR'25 (@yuyinzhou_cs) 's Twitter Profile Photo

Thanks elvis for the great summary of our work "KNOWLEDGE or REASONING? A Close Look at How LLMs Think Across Domains"! How can we build more reliable LLMs? 🤔 We focus on ensuring not just accurate final answers, but also high-quality reasoning 💡 & knowledge

Cihang Xie (@cihangxie) 's Twitter Profile Photo

Reasoning LLMs are now able to tackle much tougher questions than ever—but what really drives their success? Is it Knowledge 📖 or Reasoning 🤔?

🔎 We present a new step-by-step framework to evaluate how LLMs think.

🧵 Thread:
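
As a rough illustration of what separating final-answer accuracy from trajectory-level quality can look like (an assumption-laden sketch, not the paper's actual framework), the snippet below checks the final answer and also scores each reasoning step with a hypothetical `judge_step` callable.

```python
# Hypothetical sketch of step-level evaluation; the paper's actual framework
# and judging criteria are described in the linked work.
from dataclasses import dataclass

@dataclass
class EvalResult:
    answer_correct: bool       # final-response accuracy
    step_scores: list[float]   # per-step quality of the thinking trajectory

def evaluate(question, reasoning_steps, final_answer, reference, judge_step):
    """Score the final answer AND every intermediate reasoning step.
    `judge_step` is an assumed callable (e.g., an LLM judge or a rule check)
    that returns a score in [0, 1] for one step."""
    answer_correct = final_answer.strip() == reference.strip()
    step_scores = [judge_step(question, step) for step in reasoning_steps]
    return EvalResult(answer_correct, step_scores)
```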
Chen Wei (@_chen_wei_) 's Twitter Profile Photo

đź’ˇ New work: You might not need math data to teach models math reasoning.

Recent 🔥 RLVR works challenge the need of *labels* of math questions.

We find that just playing video games, e.g. Snake, can boost multimodal reasoning. No math *questions* needed.

arxiv.org/abs/2506.08011🧵👇
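
One way to read "no math *questions* needed" is that the verifiable signal comes from the game itself. Below is a hypothetical sketch (the Gym-style environment and action format are illustrative assumptions, not the paper's setup) where the accumulated game score serves directly as the reward.

```python
# Hypothetical sketch; the environment here is a stand-in Gym-style game
# (e.g., Snake), not the environment used in the paper.
def game_reward(env, actions):
    """Return the total game score earned by a sequence of actions.
    The score itself is the verifiable reward: no math labels and
    no math questions, just whether the play was good."""
    env.reset()
    total = 0.0
    for action in actions:
        _obs, step_reward, done, _info = env.step(action)  # classic 4-tuple API
        total += step_reward
        if done:
            break
    return total  # fed back as the scalar reward for training the policy
```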
Pan Lu (@lupantech) 's Twitter Profile Photo

Do LLMs truly understand math proofs, or just guess? 🤔Our new study on #IneqMath dives deep into Olympiad-level inequality proofs & reveals a critical gap: LLMs are often good at finding answers, but struggle with rigorous, sound proofs.

➡️ ineqmath.github.io

To tackle
Haoqin Tu (@haoqint) 's Twitter Profile Photo

I’ll be giving a talk on our VLAA-Thinker🤔 at the Arize AI Observe event at SHACK15 next Wednesday, swing by to chat about visual-language reasoning models! Always happy to discuss broader ideas around multimodal reasoning and generative models too arize.com/observe-2025/a…

Cihang Xie (@cihangxie) 's Twitter Profile Photo

OpenVision has been accepted to #ICCV2025 🥳🥳 Additionally, stay tuned for v2, arriving very soon with even greater efficiency and capability.