Shubham Agarwal (@shubhamag1992) 's Twitter Profile
Shubham Agarwal

@shubhamag1992

Human | He/Him
Staff Research Scientist (PhD). Works on frontier LLMs & VLMs.
Prev. @Mila_Quebec | @ServiceNowRSRCH | @UMontreal
| @AdobeResearch | @naverlabseurope | @iitdaa

ID: 524192316

Link: https://shubhamagarwal92.github.io | Joined: 14-03-2012 09:39:23

285 Tweets

319 Followers

1.1K Following

Gaurav Sahu (@dem_fier) 's Twitter Profile Photo

🚀 #LitLLM demo is live! Try it here: litllm.onrender.com
New features:
• Export BibTeX citations
• Add paper via URL
Feedback welcome!
GitHub: github.com/litllm/litllm
Google Form: forms.gle/8WkULP6eho6zTq…
Email: [email protected]
Paper: arxiv.org/abs/2412.15249

Juan A. Rodríguez 💫 (@joanrod_ai) 's Twitter Profile Photo

StarVector poster happening now at CVPR! Come by poster #31 if you want to chat about vector graphics, image-to-code generation, or just say hi!

Sai Rajeswar (@rajeswarsai) 's Twitter Profile Photo

If you are at #CVPR, make sure to catch Joan Rodriguez and hear about scalable SVG generation straight from the source! 🚩Exhibition Hall D – Poster #31

Ai2 (@allen_ai) 's Twitter Profile Photo

New updates for olmOCR, our fully open toolkit for transforming documents (PDFs & images) into clean markdown. We released:

1️⃣ New benchmark for fair comparison of OCR engines and APIs
2️⃣ Improved inference that is faster and cheaper to run
3️⃣ Docker image for easy deployment
Amir Zamir (@zamir_ar) 's Twitter Profile Photo

We open-sourced the codebase of Flextok. Flextok is an image tokenizer that produces flexible-length token sequences and represents image content in a compressed coarse-to-fine way.

Like in PCA: the 1st token captures the most compressed representation of the image, the 2nd …
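The PCA analogy in the tweet can be illustrated with a toy NumPy sketch: with principal components sorted by explained variance, each additional component refines the reconstruction coarse-to-fine, just as each additional Flextok token is said to add finer image detail. This is only the analogy on synthetic data, not Flextok's actual tokenizer or code.

```python
import numpy as np

# Toy data: 100 correlated 8-dim "images", centered before PCA.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))
X = X - X.mean(axis=0)

# Principal directions via SVD; rows of Vt are ordered by variance explained.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def reconstruction_error(k: int) -> float:
    """Reconstruct X from its first k principal components and return the residual norm."""
    Xk = (X @ Vt[:k].T) @ Vt[:k]
    return float(np.linalg.norm(X - Xk))

# Error shrinks monotonically as components ("tokens") are added:
# the first component is the coarsest summary, later ones add detail.
errors = [reconstruction_error(k) for k in range(1, 9)]
```

The flexible-length aspect maps onto truncating this sequence at any `k`: a shorter prefix gives a coarser but still valid reconstruction.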
Ziwei Liu (@liuziwei7) 's Twitter Profile Photo

📽️Expert-Level Cinematic Understanding in VLM📽️
#ShotBench: benchmark covering 8 core cinematography dimensions
#ShotQA: 70k training dataset
#ShotVL: 3B and 7B models surpassing GPT-4o on cinematic understanding
- Project: vchitect.github.io/ShotBench-proj…
- Code: github.com/Vchitect/ShotB…

Zhaochen Su (@suzhaochen0110) 's Twitter Profile Photo

Excited to share our new survey on the reasoning paradigm shift from "Think with Text" to "Think with Image"! 🧠🖼️
Our work offers a roadmap for more powerful & aligned AI. 🚀
📜 Paper: arxiv.org/pdf/2506.23918
⭐ GitHub (400+🌟): github.com/zhaochen0110/A…
Wenhu Chen (@wenhuchen) 's Twitter Profile Photo

#NewPaperAlert

Since we released VLM2Vec, we have received a surprising amount of attention from the community. The most common request is to expand VLM2Vec to more modalities like docs, videos, and screenshots.

Today, we are so excited to introduce VLM2Vec & MMEB-v2! 🚀 We're …
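The setting that embedding benchmarks like MMEB score can be sketched generically: queries and candidates are embedded into one shared vector space and candidates are ranked by cosine similarity. The vectors below are random toy stand-ins, not outputs of VLM2Vec.

```python
import numpy as np

rng = np.random.default_rng(42)
candidates = rng.normal(size=(5, 16))                # 5 candidate embeddings
query = candidates[2] + 0.01 * rng.normal(size=16)   # query near candidate 2

def cosine_rank(q: np.ndarray, C: np.ndarray) -> np.ndarray:
    """Return candidate indices sorted by cosine similarity to q, best first."""
    qn = q / np.linalg.norm(q)
    Cn = C / np.linalg.norm(C, axis=1, keepdims=True)
    return np.argsort(-(Cn @ qn))

# The slightly perturbed source vector should be retrieved first.
ranking = cosine_rank(query, candidates)
```

Extending such a benchmark to docs, videos, or screenshots changes what gets embedded, not this ranking step.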
Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Oh wow, this VLM benchmark is pure evil, and I love it!

"Vision Language Models are Biased" by An Vo, taesiri, Anh Totti Nguyen, et al.

Also really good idea to have one-click copy-paste of images and prompts, makes trying it super easy.
Sasha Rush (@srush_nlp) 's Twitter Profile Photo

I think about this talk a lot. There was a time when people were bullish on "feed all the modalities to the LLM," but it didn't really pan out as I would have expected. The discrete/continuous divide remains an interesting challenge in deep learning.

Shubham Agarwal (@shubhamag1992) 's Twitter Profile Photo

We’ve released VoiceAgentBench to evaluate whether voice assistants can function as true agents (not just ASR/QA). Multilingual (English + Indic), tool-driven workflows & safety. Proud of our team. See thread 👇 #VoiceAgents #AgenticAI #SpeechAI #MultilingualAI

Gaurav Sahu (@dem_fier) 's Twitter Profile Photo

ever been here?
open overleaf → write a paragraph → "hmm...this needs a citation" → open 15 different tabs → skim 8 abstracts → find the 1 actually relevant paper → format bibtex → paste it back on overleaf

if so, i built a plugin just for you. meet openleaf:
→ reads …