Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile
Pavankumar Vasu

@pavankumarvasu

ID: 1586729245

Joined: 11-07-2013 20:12:38

24 Tweets

136 Followers

117 Following

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

Introducing Dataset Decomposition!
🚀🚀🚀
In recent work from the Apple Machine Learning Research team, we introduce a sequence length-aware method to efficiently & accurately train LLMs with minimal changes to existing pipelines.

Paper: arxiv.org/abs/2405.13226

🧵(1/n)
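The core idea the thread describes — decomposing documents by sequence length so batches can be built from uniform-length chunks — can be sketched as follows. This is a minimal illustration based only on the tweet's summary; the actual bucketing and packing details are in the paper (arxiv.org/abs/2405.13226), and the power-of-two chunk sizes here are an assumption.

```python
# Sketch: split one tokenized document into power-of-two-length chunks,
# grouped into buckets by length (chunk-size scheme is an assumption).
def decompose(tokens, max_log2=13):
    buckets = {}
    i, n = 0, len(tokens)
    while i < n:
        remaining = n - i
        # largest power of two not exceeding the remaining length (capped)
        k = min(remaining.bit_length() - 1, max_log2)
        size = 1 << k
        buckets.setdefault(size, []).append(tokens[i:i + size])
        i += size
    return buckets

chunks = decompose(list(range(300)))  # 300 = 256 + 32 + 8 + 4
```

Batches drawn from a single bucket then share one sequence length, avoiding padding or cross-document attention.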
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

Want to build extremely fast zero-shot classifiers or lean retrieval engines? MobileCLIP models are available now on HF.

Checkpoints + datasets: huggingface.co/apple
Training + inference code: github.com/apple/ml-mobil…

Live on-device demo at CVPR2024. #Apple #MLR #CVPR2024
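Zero-shot classification with a CLIP-style model like MobileCLIP boils down to comparing a normalized image embedding against normalized text embeddings of the class prompts. The sketch below uses random vectors as stand-ins for the encoder outputs (an assumption for illustration; the real encoders come from the repo linked above):

```python
import numpy as np

# CLIP-style zero-shot classification sketch. Random vectors stand in
# for encode_image(...) / encode_text(...) outputs of a real model.
rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class_names = ["cat", "dog", "car"]
text_emb = normalize(rng.normal(size=(len(class_names), 512)))  # encode_text(prompts)
image_emb = normalize(rng.normal(size=(1, 512)))                # encode_image(img)

logits = image_emb @ text_emb.T            # cosine similarities
pred = class_names[int(logits.argmax())]   # highest-similarity class wins
```

Retrieval is the same computation transposed: embed a text query once and rank a bank of image embeddings by similarity.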

Fabio Guzman (@fguzmanai)'s Twitter Profile Photo

🚀 Excited to launch CLIP-Finder! 🎉 CLIP-Finder enables semantic searches of images using natural language descriptions and camera input. Built on Apple's MobileCLIP-S0 architecture. Check it out on GitHub: github.com/fguzman82/CLIP… #ComputerVision #CoreML #AppleSilicon

Jiatao Gu (@thoma_gu)'s Twitter Profile Photo

Finally! We are excited to release our MDM code from the paper at github.com/apple/ml-mdm. We hope this will advance research in this field! With this code, you can easily train text-to-image diffusion models on datasets like CC12M. Due to licensing constraints, we cannot

Jason Ramapuram (@jramapuram)'s Twitter Profile Photo

Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length.

Paper: arxiv.org/abs/2409.04431
Code: github.com/apple/ml-sigmo…

This was
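The replacement the tweet describes — an elementwise sigmoid plus a fixed scalar bias instead of softmax — can be sketched in a few lines. The bias value b = -log(n) is the sequence-length-dependent choice from the paper (arxiv.org/abs/2409.04431); treat the exact constant as an assumption here.

```python
import numpy as np

# Sigmoid attention sketch: softmax over keys is replaced by a per-entry
# sigmoid with a constant (not learned) bias depending on sequence length.
def sigmoid_attention(q, k, v):
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d) - np.log(n)  # fixed bias b = -log(n)
    weights = 1.0 / (1.0 + np.exp(-scores))    # sigmoid instead of softmax
    return weights @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16)) for _ in range(3))
out = sigmoid_attention(q, k, v)
```

Because each attention weight is computed independently, there is no row-wise normalization step, which is where the claimed speedup comes from.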
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

📢 Presenting our app for real-time zero-shot image classification using MobileCLIP!

Fully open-source—code & models available for everyone to explore. Check it out here: github.com/apple/ml-mobil… 

with - David Koski, Travis Trotto, Megan Maher Welsh & Hugues Thomas
Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

🚨 New research from my team at Apple - real-time augmented reality robot feedback with just your hands + Vision Pro! Paper: arxiv.org/abs/2412.10631 Short thread below -

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

What matters for runtime optimization in Vision Language Models (VLMs)? Vision encoder latency 🤔? Image resolution 🤔? Number of visual tokens 🤔? LLM size 🤔?

In this thread, we break it all down and introduce FastVLM — a family of fast and accurate VLMs.

(1/n 🧵)
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

📢📢📢 We released the code for dataset-decomposition [NeurIPS 2024]: a simple method to speed up LLM pre-training using sequence length aware training. Paper: arxiv.org/abs/2405.13226 Code: github.com/apple/ml-datas…

Samira Abnar (@samira_abnar)'s Twitter Profile Photo

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute? 

We explored this through the lens of MoEs:
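A toy top-1 mixture-of-experts layer makes the capacity distinction concrete: adding experts grows the parameter count, while per-token compute stays roughly constant because only one expert runs per token. All shapes and sizes below are made up for illustration.

```python
import numpy as np

# Toy top-1 MoE layer: parameters scale with n_experts, but each token
# is processed by exactly one expert, so per-token FLOPs stay constant.
rng = np.random.default_rng(0)
d, n_experts = 16, 4
experts = rng.normal(size=(n_experts, d, d))  # one weight matrix per expert
router = rng.normal(size=(d, n_experts))

def moe_forward(x):                           # x: (tokens, d)
    choice = (x @ router).argmax(axis=-1)     # top-1 routing decision
    out = np.empty_like(x)
    for e in range(n_experts):
        mask = choice == e
        out[mask] = x[mask] @ experts[e]      # each token uses one expert
    return out

x = rng.normal(size=(10, d))
y = moe_forward(x)
```

Sequential compute, the third axis the tweet mentions, would correspond to stacking more such layers (depth) rather than widening them.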
Yizhe Zhang @ ICLR 2025 🇸🇬 (@yizhezhangnlp)'s Twitter Profile Photo

Excited to share our new paper on "Reversal Blessing" - where thinking BACKWARDS makes language models smarter on some multiple-choice questions! We found that right-to-left (R2L) models consistently outperform traditional left-to-right (L2R) models on certain reasoning tasks.🧵

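The scoring contrast the tweet describes can be sketched with a toy model: a right-to-left (R2L) model scores a candidate answer by reversing the token order before computing its likelihood. The toy bigram table below stands in for real L2R/R2L language models (an assumption for illustration only).

```python
import math

# Toy bigram "LM" illustrating L2R vs R2L scoring of the same sequence.
# Probabilities are made up; a real R2L model is trained on reversed text.
bigram = {("the", "cat"): 0.5, ("cat", "sat"): 0.5,
          ("sat", "cat"): 0.4, ("cat", "the"): 0.1}

def logprob(tokens, reverse=False):
    seq = tokens[::-1] if reverse else tokens
    return sum(math.log(bigram.get(pair, 1e-6))
               for pair in zip(seq, seq[1:]))

l2r = logprob(["the", "cat", "sat"])                 # left-to-right score
r2l = logprob(["the", "cat", "sat"], reverse=True)   # right-to-left score
```

For a multiple-choice question, each option would be scored this way and the highest-likelihood option picked; the paper's finding is that the R2L ranking wins on certain reasoning tasks.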
Martin Klissarov (@martinklissarov)'s Twitter Profile Photo

Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / transition functions? How do you fine-tune them? Can LLMs explore / exploit? 🧵 Join us down this rabbit hole... (ICLR 2025 paper, done at ML Research)

Cheng-Yu Hsieh (@cydhsieh)'s Twitter Profile Photo

Excited to introduce FocalLens: an instruction tuning framework that turns existing VLMs/MLLMs into text-conditioned vision encoders that produce visual embeddings focusing on relevant visual information given natural language instructions!

📢: Hadi Pouransari will be presenting
Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

Imitation learning has a data scarcity problem. Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date — 829 hours of egocentric video + paired 3D hand poses across 194 tasks. Now on arxiv: arxiv.org/abs/2505.11709 (1/4)