Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile
Pavankumar Vasu

@pavankumarvasu

ID: 1586729245

Joined: 11-07-2013 20:12:38

24 Tweets

136 Followers

117 Following

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

Introducing Dataset Decomposition! 🚀🚀🚀

In recent work from the Apple Machine Learning Research team, we introduce a sequence-length-aware method to efficiently & accurately train LLMs with minimal changes to existing pipelines.

Paper: arxiv.org/abs/2405.13226

🧵(1/n)
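
The decomposition step itself is easy to picture. Below is a minimal sketch, assuming the power-of-two chunking described in the paper (arxiv.org/abs/2405.13226); the function name and max length are illustrative:

```python
def decompose(doc_tokens, max_len=8192):
    # Split one tokenized document into chunks whose lengths are powers of
    # two, so every chunk holds only contiguous text from a single document.
    # Equal-length chunks are then grouped into buckets, and each training
    # batch is drawn from one bucket: variable-sequence-length training
    # without cross-document attention.
    chunks, i, n = [], 0, len(doc_tokens)
    while i < n:
        size = min(1 << ((n - i).bit_length() - 1), max_len)
        chunks.append(doc_tokens[i:i + size])
        i += size
    return chunks

print([len(c) for c in decompose(list(range(1300)))])  # [1024, 256, 16, 4]
```
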
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

Want to build extremely fast zero-shot classifiers or lean retrieval engines? MobileCLIP models are available now on Hugging Face (checkpoints + datasets): huggingface.co/apple

Training + inference code: github.com/apple/ml-mobil…

Live on-device demo at CVPR 2024. #Apple #MLR #CVPR2024
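
A minimal zero-shot classification sketch with these models, assuming the `mobileclip` API shown in the repo's README; the checkpoint path, image file, and label set are illustrative:

```python
import torch
from PIL import Image
import mobileclip  # from github.com/apple/ml-mobileclip

# Checkpoint path is illustrative; download per the repo instructions.
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s0', pretrained='checkpoints/mobileclip_s0.pt')
tokenizer = mobileclip.get_tokenizer('mobileclip_s0')

image = preprocess(Image.open('photo.jpg')).unsqueeze(0)
labels = ['a cat', 'a dog', 'a car']
text = tokenizer([f'a photo of {l}' for l in labels])

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.T).softmax(dim=-1)  # cosine sims -> class probs

print(dict(zip(labels, probs[0].tolist())))
```
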

Fabio Guzman (@fguzmanai)'s Twitter Profile Photo

🚀 Excited to launch CLIP-Finder! 🎉 CLIP-Finder enables semantic searches of images using natural language descriptions and camera input. Built on Apple's MobileCLIP-S0 architecture. Check it out on GitHub: github.com/fguzman82/CLIP… #ComputerVision #CoreML #AppleSilicon

Jiatao Gu (@thoma_gu)'s Twitter Profile Photo

Finally! We are excited to release our MDM code from the paper at github.com/apple/ml-mdm. We hope this will advance research in this field! With this code, you can easily train text-to-image diffusion models on datasets like CC12M. Due to licensing constraints, we cannot …

Jason Ramapuram (@jramapuram)'s Twitter Profile Photo

Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length.

Paper: arxiv.org/abs/2409.04431
Code: github.com/apple/ml-sigmo…

This was …
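
A minimal sketch of the substitution the tweet describes, with the constant bias set to -log(n) for sequence length n; the released repo provides the optimized FlashAttention-style kernels, so this is only for intuition:

```python
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (..., n, d). Replace the softmax over attention scores with an
    # elementwise sigmoid plus a constant, non-learned bias b = -log(n),
    # which keeps each row's total attention mass comparable to softmax.
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    probs = torch.sigmoid(scores - math.log(n))  # no row-wise normalization
    return probs @ v

q = k = v = torch.randn(2, 8, 128, 64)  # (batch, heads, seq, head_dim)
print(sigmoid_attention(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```

Because the sigmoid is elementwise, each row no longer needs a normalizing sum over the full sequence, which is what enables the kernel-level speedup.
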
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

📢 Presenting our app for real-time zero-shot image classification using MobileCLIP!

Fully open-source: code & models available for everyone to explore. Check it out here: github.com/apple/ml-mobil…

with David Koski, Travis Trotto, Megan Maher Welsh & Hugues Thomas
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

If you're at #NeurIPS2024, stop by the #Apple booth to try the MobileCLIP on-device demo: it's extremely fast!

Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

🚨 New research from my team at Apple: real-time augmented reality robot feedback with just your hands + Vision Pro!

Paper: arxiv.org/abs/2412.10631

Short thread below -

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

What matters for runtime optimization in Vision Language Models (VLMs)? Vision encoder latency 🤔? Image resolution 🤔? Number of visual tokens 🤔? LLM size 🤔?

In this thread, we break it all down and introduce FastVLM, a family of fast and accurate VLMs.

(1/n 🧵)
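
A hypothetical back-of-envelope model of how these factors combine; every number below is illustrative, not a FastVLM measurement. Time-to-first-token is roughly the vision encoder's latency plus the LLM's prefill over visual + text tokens, so image resolution (via token count) and LLM size (via per-token cost) both enter:

```python
def ttft_ms(vision_encoder_ms, n_visual_tokens, n_text_tokens, prefill_ms_per_token):
    # Toy model: TTFT = vision encoding + LLM prefill over all input tokens.
    return vision_encoder_ms + (n_visual_tokens + n_text_tokens) * prefill_ms_per_token

print(ttft_ms(100.0, 576, 64, 0.5))  # 420.0 -> many visual tokens dominate
print(ttft_ms(100.0, 144, 64, 0.5))  # 204.0 -> fewer tokens, same encoder cost
```
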
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

📢📢📢 We released the code for dataset decomposition [NeurIPS 2024]: a simple method to speed up LLM pre-training using sequence-length-aware training.

Paper: arxiv.org/abs/2405.13226
Code: github.com/apple/ml-datas…

Samira Abnar (@samira_abnar)'s Twitter Profile Photo

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
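
A toy accounting sketch of why MoEs are a good lens here: routing decouples parameter count from per-token compute, so the two axes can be varied independently. All names and numbers are illustrative:

```python
def moe_budget(d_model, d_ff, n_experts, top_k):
    # One MoE FFN layer: parameters scale with the number of experts,
    # while per-token FLOPs scale only with the top_k experts actually used.
    params_per_expert = 2 * d_model * d_ff           # up- and down-projection
    total_params = n_experts * params_per_expert
    flops_per_token = 2 * top_k * params_per_expert  # ~2 FLOPs per active weight
    return total_params, flops_per_token

params, flops = moe_budget(d_model=1024, d_ff=4096, n_experts=64, top_k=2)
print(f"{params/1e6:.0f}M params, {flops/1e6:.0f}M FLOPs/token")  # 537M params, 34M FLOPs/token
```
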
Yizhe Zhang @ ICLR 2025 🇸🇬 (@yizhezhangnlp)'s Twitter Profile Photo

Excited to share our new paper on "Reversal Blessing" - where thinking BACKWARDS makes language models smarter on some multiple-choice questions! We found that right-to-left (R2L) models consistently outperform traditional left-to-right (L2R) models on certain reasoning tasks. 🧵
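
A sketch of how "thinking backwards" can score multiple choice, assuming an R2L model trained on reversed token streams behind a Hugging Face-style causal-LM interface; the helper and prompt format are hypothetical:

```python
import torch

def pick_answer_r2l(model, tokenizer, question, choices):
    # An R2L model reads reversed text, so scoring the reversed sequence
    # effectively evaluates the question given the answer, rather than the
    # answer given the question as an L2R model would.
    scores = []
    for choice in choices:
        ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        rev = ids.flip(dims=[1])  # reverse token order for the R2L model
        with torch.no_grad():
            loss = model(rev, labels=rev).loss  # mean negative log-likelihood
        scores.append(-loss.item())
    return max(zip(scores, choices))[1]  # highest log-likelihood wins
```
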
Martin Klissarov (@martinklissarov)'s Twitter Profile Photo

Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / transition functions? How do you fine-tune them? Can LLMs explore / exploit? 🧵 Join us down this rabbit hole... (ICLR 2025 paper, done at Apple ML Research)

Cheng-Yu Hsieh (@cydhsieh)'s Twitter Profile Photo

Excited to introduce FocalLens: an instruction tuning framework that turns existing VLMs/MLLMs into text-conditioned vision encoders that produce visual embeddings focusing on relevant visual information given natural language instructions!

📢: Hadi Pouransari will be presenting …
Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

Imitation learning has a data scarcity problem. Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date: 829 hours of egocentric video + paired 3D hand poses across 194 tasks. Now on arXiv: arxiv.org/abs/2505.11709 (1/4)