Shubham Agarwal (@shubhamag1992) 's Twitter Profile
Shubham Agarwal

@shubhamag1992

Human | He/Him
Staff Research Scientist (PhD). Works on frontier LLMs & VLMs.
Prev. @Mila_Quebec | @ServiceNowRSRCH | @UMontreal
| @AdobeResearch | @naverlabseurope | @iitdaa

ID: 524192316

Link: https://shubhamagarwal92.github.io | Joined: 14-03-2012 09:39:23

285 Tweets

319 Followers

1.1K Following

Gaurav Sahu (@dem_fier) 's Twitter Profile Photo

🚀 #LitLLM demo is live! Try it here: litllm.onrender.com
New features:
• Export BibTeX citations
• Add paper via URL
Feedback welcome!
GitHub: github.com/litllm/litllm
Google Form: forms.gle/8WkULP6eho6zTq…
Email: [email protected]
Paper: arxiv.org/abs/2412.15249

Juan A. Rodríguez 💫 (@joanrod_ai) 's Twitter Profile Photo

StarVector poster happening now at CVPR! Come by poster #31 if you want to chat about vector graphics, image-to-code generation, or just say hi!

Sai Rajeswar (@rajeswarsai) 's Twitter Profile Photo

If you are at #CVPR, make sure to catch Joan Rodriguez and hear about scalable SVG generation straight from the source! 🚩Exhibition Hall D – Poster #31

Ai2 (@allen_ai) 's Twitter Profile Photo

New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:

1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
Amir Zamir (@zamir_ar) 's Twitter Profile Photo

We open-sourced the codebase of Flextok. Flextok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed coarse-to-fine way.

Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd …
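The PCA analogy in the tweet can be illustrated with a toy NumPy sketch: with principal components sorted by explained variance, each additional component refines the reconstruction coarse-to-fine, just as each additional Flextok token is said to add finer image detail. This is only the analogy on synthetic data, not Flextok's actual tokenizer or code.

```python
import numpy as np

# Toy data: 100 correlated 8-dim "images", centered before PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))
X = X - X.mean(axis=0)

# Principal directions via SVD; rows of Vt are ordered by variance explained.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_error(k: int) -> float:
    """Reconstruct X from its first k principal components and return the residual norm."""
    Xk = (X @ Vt[:k].T) @ Vt[:k]
    return float(np.linalg.norm(X - Xk))

# Error shrinks monotonically as components ("tokens") are added:
# the first component is the coarsest summary, later ones add detail.
errors = [reconstruction_error(k) for k in range(1, 9)]
```

The flexible-length aspect maps onto truncating this sequence at any `k`: a shorter prefix gives a coarser but still valid reconstruction.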
Ziwei Liu (@liuziwei7) 's Twitter Profile Photo

📽️Expert-Level Cinematic Understanding in VLM📽️
#ShotBench: benchmark covering 8 core cinematography dimensions
#ShotQA: 70k training dataset
#ShotVL: 3B and 7B models surpassing GPT-4o on cinematic understanding
- Project: vchitect.github.io/ShotBench-proj…
- Code: github.com/Vchitect/ShotB…

Zhaochen Su (@suzhaochen0110) 's Twitter Profile Photo

Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️
Our work offers a roadmap for more powerful & aligned AI. 🚀
📜 Paper: arxiv.org/pdf/2506.23918
⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
Wenhu Chen (@wenhuchen) 's Twitter Profile Photo

#NewPaperAlert

Since we released VLM2Vec, we have received a surprising amount of attention from the community. The most common request is to expand VLM2Vec to more modalities like docs, videos, and screenshots.

Today, we are so excited to introduce VLM2Vec & MMEB-v2! 🚀 We're …
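The setting that embedding benchmarks like MMEB score can be sketched generically: queries and candidates are embedded into one shared vector space and candidates are ranked by cosine similarity. The vectors below are random toy stand-ins, not outputs of VLM2Vec.

```python
import numpy as np

rng = np.random.default_rng(42)
candidates = rng.normal(size=(5, 16))                # 5 candidate embeddings
query = candidates[2] + 0.01 * rng.normal(size=16)   # query near candidate 2

def cosine_rank(q: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted by cosine similarity to q, best first."""
    qn = q / np.linalg.norm(q)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return np.argsort(-(Cn @ qn))

# The slightly perturbed source vector should be retrieved first.
ranking = cosine_rank(query, candidates)
```

Extending such a benchmark to docs, videos, or screenshots changes what gets embedded, not this ranking step.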
Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Oh wow, this VLM benchmark is pure evil, and I love it!

"Vision Language Models are Biased" by An Vo, taesiri, Anh Totti Nguyen, et al.

Also really good idea to have one-click copy-paste of images and prompts, makes trying it super easy.
Sasha Rush (@srush_nlp) 's Twitter Profile Photo

I think about this talk a lot. There was a time when people were bullish on "feed all the modalities to the LLM," but it didn't really pan out as I would have expected. The discrete/continuous divide remains an interesting challenge in deep learning.

Shubham Agarwal (@shubhamag1992) 's Twitter Profile Photo

We’ve released VoiceAgentBench to evaluate whether voice assistants can function as true agents (not just ASR/QA). Multilingual (English + Indic), tool-driven workflows & safety. Proud of our team. See thread 👇 #VoiceAgents #AgenticAI #SpeechAI #MultilingualAI

Gaurav Sahu (@dem_fier) 's Twitter Profile Photo

ever been here?
open overleaf → write a paragraph → "hmm...this needs a citation" → open 15 different tabs → skim 8 abstracts → find the 1 actually relevant paper → format bibtex → paste it back on overleaf

if so, i built a plugin just for you. meet openleaf:
→ reads …