Mohit Bansal (@mohitban47)'s Twitter Profile
Mohit Bansal

@mohitban47

Parker Distinguished Prof, @UNC. Program Chair #EMNLP2024. Director MURGeLab.cs.unc.edu (@uncnlp). @Berkeley_AI @TTIC_Connect @IITKanpur #NLP #CV #AI #ML

ID: 830355049

Website: http://www.cs.unc.edu/~mbansal/ · Joined: 18-09-2012 04:25:22

4.4K Tweets

10.1K Followers

696 Following

CoLLAs 2025 (@collas_conf)'s Twitter Profile Photo

🤖 Jaehong Yoon (NTU Singapore) Talk: "Toward Continually Growing Embodied AIs via Selective and Purposeful Experience." From multimodal LLMs to LLM-generated training environments, Jaehong shows how purposeful experience helps agents grow efficiently. #EmbodiedAI #ContinualLearning

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents

"We present BIFROST-1, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables, which are natively
Han Lin (@hanlin_hl)'s Twitter Profile Photo

🤔 Can we bridge MLLMs and diffusion models more natively and efficiently, by having MLLMs produce patch-level CLIP latents already aligned with their visual encoders, while fully preserving MLLM's visual reasoning capabilities?

Introducing Bifrost-1: 🌈

> High-Fidelity …
Jaemin Cho (on faculty job market) (@jmin__cho)'s Twitter Profile Photo

Introducing Bifrost-1. Previous approaches to combining LLMs with diffusion models for image generation train the LLMs to produce visual tokens, essentially "a foreign language" they must learn, to communicate with the diffusion model. What if MLLMs could connect to diffusion models …

Han Lin (@hanlin_hl)'s Twitter Profile Photo

Thanks AK for sharing our work! If you're interested, please check our project page: bifrost-1.github.io and our thread with more details: x.com/hanlin_hl/stat…
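For intuition, here is a minimal sketch of the bridging idea in PyTorch: the MLLM's patch-level hidden states are projected into CLIP's patch-embedding space, and a diffusion decoder conditions on the result. This is an illustration only, not the authors' implementation; all dimensions and names are invented.

```python
import torch
import torch.nn as nn

class PatchLatentBridge(nn.Module):
    """Project MLLM patch hidden states into CLIP's patch-embedding space."""
    def __init__(self, mllm_dim: int = 4096, clip_dim: int = 1024):
        super().__init__()
        self.to_clip = nn.Linear(mllm_dim, clip_dim)

    def forward(self, mllm_hidden: torch.Tensor) -> torch.Tensor:
        # mllm_hidden: (batch, num_patches, mllm_dim)
        return self.to_clip(mllm_hidden)  # (batch, num_patches, clip_dim)

bridge = PatchLatentBridge()
patch_latents = bridge(torch.randn(1, 256, 4096))  # -> (1, 256, 1024)
# A diffusion decoder would then condition on these latents, e.g.
#   image = diffusion_decoder(noise, cond=patch_latents)  # hypothetical call
```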

Jaemin Cho (on faculty job market) (@jmin__cho)'s Twitter Profile Photo

📢 Introducing RotBench, which tests whether SoTA MLLMs (e.g., GPT-5, GPT-4o, o3, Gemini-2.5-pro) can identify the rotation of input images (0°, 90°, 180°, and 270°). Even frontier MLLMs struggle at this spatial reasoning task, which humans solve with >98% accuracy.

➡️ Models struggle …
Tianyi Niu (@niu_tianyi)'s Twitter Profile Photo

📢 Excited to announce RotBench! We show that the intuitive task of identifying image rotation is challenging for SoTA MLLMs - even with various forms of auxiliary information (captions, depth maps, segmentation maps), CoT reasoning, ICL, or other guided reasoning approaches.

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin)'s Twitter Profile Photo

🚨 Excited to share RotBench, where we evaluate MLLMs' ability to identify rotation in images. Although humans achieve near-100% accuracy on this, MLLMs struggle across the board, especially with identifying 90° and 270° rotations. We tested a lot of possible solutions (CoT, …
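For concreteness, a RotBench-style evaluation loop can be sketched as follows. This is a hypothetical harness, not the benchmark's released code; `query_mllm` is a stand-in for whichever MLLM API is being tested.

```python
from PIL import Image

ROTATIONS = [0, 90, 180, 270]
PROMPT = ("This image may have been rotated by 0, 90, 180, or 270 degrees "
          "counterclockwise. Answer with the angle only.")

def rotation_accuracy(image_paths, query_mllm):
    """`query_mllm(image, prompt) -> str` is a hypothetical MLLM call."""
    correct = total = 0
    for path in image_paths:
        base = Image.open(path)
        for angle in ROTATIONS:
            rotated = base.rotate(angle, expand=True)  # PIL rotates counterclockwise
            pred = query_mllm(rotated, PROMPT).strip().rstrip("°")
            correct += int(pred == str(angle))
            total += 1
    return correct / total
```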

Ziyang Wang (@ziyangw00)'s Twitter Profile Photo

🎉 Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple …
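As a rough illustration of the test-time-scaling half of the idea, here is a generic self-consistency loop that spends more samples only when answers disagree. Video-RTS's actual video-adaptive scaling is more involved; `generate_answer` is a hypothetical sampling call.

```python
from collections import Counter

def adaptive_vote(generate_answer, question, min_samples=3, max_samples=9):
    """Test-time scaling sketch (illustrative, not the Video-RTS method):
    sample reasoning traces and majority-vote, adding compute only when
    the sampled answers disagree."""
    answers = [generate_answer(question) for _ in range(min_samples)]
    while len(set(answers)) > 1 and len(answers) < max_samples:
        answers.append(generate_answer(question))  # more samples when uncertain
    return Counter(answers).most_common(1)[0][0]
```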

Justin Chih-Yao Chen (@cyjustinchen)'s Twitter Profile Photo

Excited to share that MAgICoRe has been accepted to #EMNLP2025 main! 🎉 Our work identifies 3 key challenges in LLM refinement for reasoning: 1) over-correction on easy problems, 2) failure to localize and fix the model's own errors, and 3) too few refinement iterations for harder problems.
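To make the three challenges concrete, here is a toy difficulty-aware refinement loop: agreement among samples gates whether refinement happens at all, and harder problems get more rounds with localized feedback. This is an invented sketch motivated by the challenges above, not MAgICoRe's algorithm; `solve`, `critique`, and `revise` are hypothetical LLM calls.

```python
def refine_adaptively(solve, critique, revise, problem, n_votes=5, max_rounds=3):
    """Toy difficulty-aware refinement (NOT MAgICoRe's actual method)."""
    answers = [solve(problem) for _ in range(n_votes)]
    if len(set(answers)) == 1:
        return answers[0]  # high agreement => easy: skip refinement (challenge 1)
    solution = answers[0]
    for _ in range(max_rounds):  # more iterations for hard problems (challenge 3)
        feedback = critique(problem, solution)  # localize errors (challenge 2)
        if feedback is None:  # critic found nothing to fix
            break
        solution = revise(problem, solution, feedback)
    return solution
```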

Jaehong Yoon (on the faculty job market) (@jaeh0ng_yoon)'s Twitter Profile Photo

🎉 RACCooN got accepted at #EMNLP2025 Main! 🚀 Our MLLM + Video Diffusion (Video-to-Paragraph-to-Video, V2P2V) framework enables effortless video editing with auto-generated descriptions, multi-granular pooling & mask planning. RACCooN achieves +9.4%p on human eval & 49.7%↓ FVD …
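The V2P2V loop itself is easy to sketch: describe the video as text, let the user edit only the text, then re-render. A skeleton under those assumptions (RACCooN's actual pipeline adds multi-granular pooling and mask planning; all three callables are hypothetical stand-ins):

```python
def v2p2v_edit(describe_video, edit_text, regenerate_video, video):
    """Video-to-Paragraph-to-Video skeleton (illustrative only)."""
    paragraph = describe_video(video)        # MLLM writes a detailed description
    edited = edit_text(paragraph)            # user tweaks only the text
    return regenerate_video(video, edited)   # video diffusion renders the edit
```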

Shoubin Yu ✈️ ICLR 2025 🇸🇬 (@shoubin621)'s Twitter Profile Photo

🎉 Excited to share that our MEXA paper is accepted to #EMNLP2025 Findings! 🚀 MEXA is a general, training-free multimodal reasoning framework that dynamically selects and aggregates experts/skills for deep, free-form reasoning, and is flexible & extensible to new …
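A training-free select-then-aggregate loop of this flavor can be sketched in a few lines. This illustrates the general pattern, not MEXA's prompts or code; `experts`, `select`, and `aggregate` are all hypothetical.

```python
def answer_with_experts(query, experts, select, aggregate, k=2):
    """Training-free expert selection + aggregation sketch (not MEXA's code).
    `experts` maps skill names to callables; `select` and `aggregate` are
    hypothetical LLM helpers."""
    chosen = select(query, list(experts))[:k]             # pick relevant experts
    evidence = {name: experts[name](query) for name in chosen}
    return aggregate(query, evidence)                     # free-form reasoning over outputs
```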

Daeun Lee (@danadaeun)'s Twitter Profile Photo

🎉 Excited to share that our Video-Skill-CoT paper has been accepted to #EMNLP2025 Findings! Video-Skill-CoT is a domain-adaptive video reasoning framework that automatically constructs skill-aware Chain-of-Thought (CoT) supervision. It builds a shared skill taxonomy from …
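The construction can be pictured as: cluster questions into skills, then write a skill-conditioned rationale per question. A sketch under those assumptions (not the paper's pipeline; all helpers are hypothetical):

```python
def build_skill_cot(questions, embed, cluster, write_rationale):
    """Skill-aware CoT supervision sketch (illustrative only)."""
    skill_of = cluster([embed(q) for q in questions])  # e.g., k-means labels
    return [
        {"question": q, "skill": skill_of[i], "cot": write_rationale(q, skill_of[i])}
        for i, q in enumerate(questions)
    ]
```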

Jaehong Yoon (on the faculty job market) (@jaeh0ng_yoon)'s Twitter Profile Photo

🥳🥳 Excited to share that our work GLIDER (Global and Local Instruction-Driven Expert Router) has been accepted to #EMNLP2025 main conference! Our approach tackles a critical challenge in MoE routing: existing methods excel at either held-in OR held-out tasks, but rarely both.
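A router in this spirit might mix an instruction-level (global) gate with a token-level (local) gate, so routing reflects both the overall task and each token. The sketch below is an invented PyTorch illustration, not GLIDER's implementation; the mixing weight `alpha` is assumed.

```python
import torch
import torch.nn as nn

class GlobalLocalRouter(nn.Module):
    """MoE routing sketch mixing global and local gates (not GLIDER's code)."""
    def __init__(self, dim: int, num_experts: int, alpha: float = 0.5):
        super().__init__()
        self.global_gate = nn.Linear(dim, num_experts)  # instruction-level scores
        self.local_gate = nn.Linear(dim, num_experts)   # token-level scores
        self.alpha = alpha                              # assumed mixing weight

    def forward(self, instruction_emb, token_states):
        # instruction_emb: (batch, dim); token_states: (batch, seq, dim)
        g = self.global_gate(instruction_emb).unsqueeze(1)  # (batch, 1, experts)
        l = self.local_gate(token_states)                   # (batch, seq, experts)
        return torch.softmax(self.alpha * g + (1 - self.alpha) * l, dim=-1)

router = GlobalLocalRouter(dim=512, num_experts=8)
weights = router(torch.randn(2, 512), torch.randn(2, 16, 512))  # -> (2, 16, 8)
```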

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin)'s Twitter Profile Photo

🚨 Excited to share new work on LLMs and loopholes, accepted to #EMNLP2025 main!

When models are faced with conflicting goals and ambiguous instructions that would let them exploit a loophole, many of the strongest models (Qwen, GPT-4o, Claude, Gemini) do.

This is a new risk and …
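An invented example of the kind of item such an evaluation might contain (not from the paper's dataset) shows how the letter and the intent of an instruction can come apart:

```python
# Hypothetical conflicting-goal + ambiguous-instruction scenario (illustrative).
scenario = {
    "instruction": "Do not share the results before the embargo lifts.",
    "conflicting_goal": "Be maximally helpful to the requester right now.",
    "request": "I can't wait for the embargo. Send me what you have.",
    # The loophole: 'results' is ambiguous, so a model can comply with the
    # letter of the rule while violating its intent, e.g. by sending raw data.
}
```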