David Wan (@meetdavidwan)'s Twitter Profile
David Wan

@meetdavidwan

PhD student at UNC-Chapel Hill (@uncnlp), advised by @mohitban47. @Google PhD Fellow. @AmazonScience, @MetaAI, and @SFResearch intern.

ID: 1101184093524553729

Website: https://meetdavidwan.github.io/ · Joined: 28-02-2019 18:15:21

178 Tweets

452 Followers

470 Following

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin):

Excited to announce CLaMR, our new retriever for multimodal documents! Strong performance improvements (+25 nDCG@10) over both multimodal and unimodal retrieval baselines. 🤝 CLaMR jointly encodes multiple modalities and selects the most relevant ones for each query. 🏋️‍♂️
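For readers unfamiliar with the metric cited above: nDCG@10 measures how well a ranker places relevant documents within the top 10 positions. A minimal sketch, using the linear-gain DCG variant; the example relevance list is made up for illustration.

import math

def dcg_at_k(rels, k=10):
    # Gain discounted by log2 of rank (rank i is position i+1, hence i+2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(rels, k=10):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

print(ndcg_at_k([1, 0, 1, 1, 0, 0, 0, 1, 0, 0]))  # toy ranking of graded relevance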

Han Wang (@hanwang98):

How can a multimodal retriever accurately retrieve docs from massive online video content that spans multiple modalities? We introduce CLaMR, a contextualized late-interaction retriever that jointly encodes all modalities and dynamically selects those containing the relevant information.
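As background, "late interaction" here refers to ColBERT-style MaxSim scoring, where each query token is matched against its best document token. A minimal sketch of how such a score can extend over several modality token streams; the modality names, dimensions, and random embeddings below are illustrative assumptions, not CLaMR's actual implementation.

import torch

def maxsim_score(query_tokens: torch.Tensor, doc_tokens: torch.Tensor) -> torch.Tensor:
    """query_tokens: (Q, d), doc_tokens: (D, d); both L2-normalized."""
    sim = query_tokens @ doc_tokens.T       # (Q, D) token-level similarities
    return sim.max(dim=1).values.sum()      # best doc token per query token, summed

def score_multimodal_doc(query_tokens, modality_tokens: dict) -> torch.Tensor:
    # Pool token embeddings from all modalities so each query token can pick
    # its best match from whichever modality happens to be most relevant.
    all_tokens = torch.cat(list(modality_tokens.values()), dim=0)
    return maxsim_score(query_tokens, all_tokens)

# Toy usage with random, normalized embeddings (d = 128).
d = 128
q = torch.nn.functional.normalize(torch.randn(8, d), dim=-1)
doc = {m: torch.nn.functional.normalize(torch.randn(32, d), dim=-1)
       for m in ["frames", "speech", "ocr", "metadata"]}
print(score_multimodal_doc(q, doc))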

Jaemin Cho (on faculty job market) (@jmin__cho):

Introducing CLaMR -- a late-interaction retriever for complex multimodal video content! 📽️📚
➡️ Jointly encodes frames, speech, on-screen text, and metadata to answer diverse queries grounded across modalities
➡️ Trained with a new dataset we introduce, MultiVENT 2.0++ …

Arie Cattan (@ariecattan):

🚨 RAG is a popular approach but what happens when the retrieved sources provide conflicting information?🤔 We're excited to introduce our paper: “DRAGged into CONFLICTS: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs”🚀 A thread 🧵👇

Ziyang Wang (@ziyangw00):

Excited to present VideoTree 🌲 at #CVPR2025, Friday at 10:30 AM! VideoTree improves long-video QA via smart sampling:
- Query-adaptive: finds the parts of the video relevant to the query
- Coarse-to-fine structure: structured hierarchically to sample granularly from relevant segments
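To make the sampling idea concrete, here is a conceptual sketch of query-adaptive coarse-to-fine frame selection; the relevance function, segment counts, and sampling rates are illustrative assumptions, not the paper's exact method.

import numpy as np

def relevance(frame_feat: np.ndarray, query_feat: np.ndarray) -> float:
    # Dot product; equals cosine similarity if features are L2-normalized.
    return float(frame_feat @ query_feat)

def coarse_to_fine_sample(frames: np.ndarray, query: np.ndarray,
                          n_coarse: int = 8, per_segment: int = 4,
                          top_segments: int = 2) -> list:
    # Coarse pass: split the video into equal segments and score one
    # representative (middle) frame per segment against the query.
    segments = np.array_split(np.arange(len(frames)), n_coarse)
    scores = [relevance(frames[seg[len(seg) // 2]], query) for seg in segments]
    # Fine pass: sample more densely only inside the most relevant segments.
    best = np.argsort(scores)[-top_segments:]
    picked = []
    for s in best:
        seg = segments[s]
        idx = np.linspace(0, len(seg) - 1, per_segment).astype(int)
        picked.extend(seg[idx].tolist())
    return sorted(picked)

# Toy usage: 256 frames with 64-d features and a random query.
rng = np.random.default_rng(0)
frames = rng.normal(size=(256, 64)); query = rng.normal(size=64)
print(coarse_to_fine_sample(frames, query))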

David Wan (@meetdavidwan):

Thanks for discovering + sharing our work on contextualized late-interaction-based multimodal content retrieval, Omar! (and ColBERT is awesome, of course) 😀

Elias Stengel-Eskin (on the faculty job market) (@eliaseskin):

🚨 Excited to announce GenerationPrograms (GP), which generates inherently attributed text by asking LLMs to produce a program that executes to text. Following the program trace gives us a causal understanding of how the text was generated, with major benefits:
➡️ Attribution …

Eran Hirsch (@hirscheran):

In RAG applications, self-citation methods are prone to attribution mistakes because LLMs have no inductive bias to track which source supports each statement. We propose GenerationPrograms: first generate a clear plan, then use that plan to guide generation. …
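To illustrate the "program that executes to text" idea, here is a toy sketch in which each operation records which source it drew from, so the output carries a provenance trace. The operations and trace format are invented for illustration, not GenerationPrograms' actual program language.

from dataclasses import dataclass, field

@dataclass
class AttributedText:
    text: str
    sources: list = field(default_factory=list)  # provenance trace

def extract(docs: dict, sid: str) -> AttributedText:
    # Pull a statement from a named source, recording the citation.
    return AttributedText(docs[sid], [sid])

def fuse(a: AttributedText, b: AttributedText) -> AttributedText:
    # Combine two attributed spans; provenance is the union of their sources.
    return AttributedText(f"{a.text} Moreover, {b.text}", a.sources + b.sources)

# A tiny "program" over two toy sources; executing it yields text plus a trace.
docs = {"S1": "CLaMR encodes all modalities jointly.",
        "S2": "it selects the most relevant modality per query."}
out = fuse(extract(docs, "S1"), extract(docs, "S2"))
print(out.text)     # generated text
print(out.sources)  # ['S1', 'S2'] -- which sources support it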

David Wan (@meetdavidwan):

🎉 Our paper, GenerationPrograms, which proposes a modular framework for attributable text generation, has been accepted to the Conference on Language Modeling (COLM)! GenerationPrograms produces a program that executes to text, providing an auditable trace of how the text was generated and major gains on …

Han Lin (@hanlin_hl):

🤔 Can we bridge MLLMs and diffusion models more natively and efficiently by having MLLMs produce patch-level CLIP latents already aligned with their visual encoders, while fully preserving the MLLM's visual reasoning capabilities?
Introducing Bifrost-1: 🌈
> High-Fidelity …
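As a rough, shape-level illustration of the bridging idea (not Bifrost-1's actual architecture), one could imagine a head that projects MLLM hidden states into the CLIP patch-embedding space for a diffusion decoder to condition on; all module names and sizes below are assumptions.

import torch
import torch.nn as nn

class PatchLatentHead(nn.Module):
    def __init__(self, mllm_dim: int = 4096, clip_dim: int = 1024):
        super().__init__()
        # Projects MLLM hidden states into CLIP's patch-embedding space so the
        # predicted latents stay aligned with the visual encoder.
        self.proj = nn.Linear(mllm_dim, clip_dim)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, n_patches, mllm_dim) from the MLLM backbone.
        return self.proj(hidden_states)  # (batch, n_patches, clip_dim)

# Toy forward pass: 256 patch positions from a hypothetical MLLM.
head = PatchLatentHead()
latents = head(torch.randn(1, 256, 4096))
print(latents.shape)  # torch.Size([1, 256, 1024]) -> conditioning for a diffusion decoder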

Jaemin Cho (on faculty job market) (@jmin__cho):

📢 Introducing RotBench, which tests whether SoTA MLLMs (e.g., GPT-5, GPT-4o, o3, Gemini-2.5-pro) can identify the rotation of input images (0°, 90°, 180°, and 270°). Even frontier MLLMs struggle at this spatial reasoning task, which humans solve with >98% accuracy.
➡️ Models struggle …
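The evaluation setup is easy to picture: rotate an image by each candidate angle and ask the model which rotation it sees. A minimal sketch; query_mllm is a hypothetical stand-in for whatever model API is used, assumed here to return an integer angle.

from PIL import Image

ANGLES = [0, 90, 180, 270]

def build_examples(path: str) -> dict:
    img = Image.open(path)
    # PIL's rotate is counter-clockwise; expand=True keeps the full frame.
    return {a: img.rotate(a, expand=True) for a in ANGLES}

def evaluate(path: str, query_mllm) -> float:
    correct = 0
    for angle, rotated in build_examples(path).items():
        prompt = "By how many degrees (0, 90, 180, 270) is this image rotated?"
        pred = query_mllm(rotated, prompt)  # hypothetical MLLM call
        correct += int(pred == angle)
    return correct / len(ANGLES)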

Ziyang Wang (@ziyangw00):

🎉 Our Video-RTS paper has been accepted at #EMNLP2025 Main!! We propose a novel video reasoning approach that combines data-efficient reinforcement learning (GRPO) with video-adaptive test-time scaling, improving reasoning performance while maintaining efficiency on multiple benchmarks.
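For context on the GRPO component: GRPO (Group Relative Policy Optimization) replaces a learned value baseline with group-relative advantages, sampling several responses per prompt and normalizing each reward against its group's mean and standard deviation. A minimal sketch of that advantage computation only, not Video-RTS's full pipeline.

import numpy as np

def grpo_advantages(group_rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    # group_rewards: rewards for G sampled responses to the same prompt.
    mean, std = group_rewards.mean(), group_rewards.std()
    return (group_rewards - mean) / (std + eps)

print(grpo_advantages(np.array([1.0, 0.0, 0.0, 1.0])))  # -> [ 1. -1. -1.  1.]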

Justin Chih-Yao Chen (@cyjustinchen):

Excited to share that MAgICoRe has been accepted to #EMNLP2025 main! 🎉 Our work identifies 3 key challenges in LLM refinement for reasoning:
1) Over-correction on easy problems
2) Failure to localize and fix its own errors
3) Too few refinement iterations for harder problems