Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile
Pavankumar Vasu

@pavankumarvasu

ID: 1586729245

Joined: 11-07-2013 20:12:38

24 Tweets

136 Followers

117 Following

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

Introducing Dataset Decomposition! 🚀🚀🚀

In recent work from the Apple Machine Learning Research team, we introduce a sequence-length-aware method to efficiently & accurately train LLMs with minimal changes to existing pipelines.

Paper: arxiv.org/abs/2405.13226

🧵(1/n)
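
The decomposition step itself is easy to picture. Below is a minimal sketch, assuming the power-of-two chunking described in the paper (arxiv.org/abs/2405.13226); the function name and max length are illustrative:

```python
def decompose(doc_tokens, max_len=8192):
    # Split one tokenized document into chunks whose lengths are powers of
    # two, so every chunk holds only contiguous text from a single document.
    # Equal-length chunks are then grouped into buckets, and each training
    # batch is drawn from one bucket: variable-sequence-length training
    # without cross-document attention.
    chunks, i, n = [], 0, len(doc_tokens)
    while i < n:
        size = min(1 << ((n - i).bit_length() - 1), max_len)
        chunks.append(doc_tokens[i:i + size])
        i += size
    return chunks

print([len(c) for c in decompose(list(range(1300)))])  # [1024, 256, 16, 4]
```
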
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

Want to build extremely fast zero-shot classifiers or lean retrieval engines? MobileCLIP models are available now on Hugging Face (checkpoints + datasets): huggingface.co/apple

Training + inference code: github.com/apple/ml-mobil…

Live on-device demo at CVPR 2024. #Apple #MLR #CVPR2024
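
A minimal zero-shot classification sketch with these models, assuming the `mobileclip` API shown in the repo's README; the checkpoint path, image file, and label set are illustrative:

```python
import torch
from PIL import Image
import mobileclip  # from github.com/apple/ml-mobileclip

# Checkpoint path is illustrative; download per the repo instructions.
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s0', pretrained='checkpoints/mobileclip_s0.pt')
tokenizer = mobileclip.get_tokenizer('mobileclip_s0')

image = preprocess(Image.open('photo.jpg')).unsqueeze(0)
labels = ['a cat', 'a dog', 'a car']
text = tokenizer([f'a photo of {l}' for l in labels])

with torch.no_grad():
    img = model.encode_image(image)
    txt = model.encode_text(text)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    probs = (100.0 * img @ txt.T).softmax(dim=-1)  # cosine sims -> class probs

print(dict(zip(labels, probs[0].tolist())))
```
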

Fabio Guzman (@fguzmanai)'s Twitter Profile Photo

🚀 Excited to launch CLIP-Finder! 🎉 CLIP-Finder enables semantic searches of images using natural language descriptions and camera input. Built on Apple's MobileCLIP-S0 architecture. Check it out on GitHub: github.com/fguzman82/CLIP… #ComputerVision #CoreML #AppleSilicon

Jiatao Gu (@thoma_gu)'s Twitter Profile Photo

Finally! We are excited to release our MDM code from the paper at github.com/apple/ml-mdm. We hope this will advance research in this field! With this code, you can easily train text-to-image diffusion models on datasets like CC12M. Due to licensing constraints, we cannot …

Jason Ramapuram (@jramapuram)'s Twitter Profile Photo

Enjoy attention? Want to make it ~18% faster? Try out Sigmoid Attention. We replace the traditional softmax in attention with a sigmoid and a constant (not learned) scalar bias based on the sequence length.

Paper: arxiv.org/abs/2409.04431
Code: github.com/apple/ml-sigmo…

This was …
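
A minimal sketch of the substitution the tweet describes, with the constant bias set to -log(n) for sequence length n; the released repo provides the optimized FlashAttention-style kernels, so this is only for intuition:

```python
import math
import torch

def sigmoid_attention(q, k, v):
    # q, k, v: (..., n, d). Replace the softmax over attention scores with an
    # elementwise sigmoid plus a constant, non-learned bias b = -log(n),
    # which keeps each row's total attention mass comparable to softmax.
    n, d = q.shape[-2], q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    probs = torch.sigmoid(scores - math.log(n))  # no row-wise normalization
    return probs @ v

q = k = v = torch.randn(2, 8, 128, 64)  # (batch, heads, seq, head_dim)
print(sigmoid_attention(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```

Because the sigmoid is elementwise, each row no longer needs a normalizing sum over the full sequence, which is what enables the kernel-level speedup.
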
Pavankumar Vasu (@pavankumarvasu)'s Twitter Profile Photo

📢 Presenting our app for real-time zero-shot image classification using MobileCLIP!

Fully open-source: code & models available for everyone to explore. Check it out here: github.com/apple/ml-mobil…

with David Koski, Travis Trotto, Megan Maher Welsh & Hugues Thomas
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

If you're at #NeurIPS2024, stop by the #Apple booth to try the MobileCLIP on-device demo: it's extremely fast!

Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

🚨 New research from my team at Apple: real-time augmented reality robot feedback with just your hands + Vision Pro!

Paper: arxiv.org/abs/2412.10631

Short thread below -

Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

What matters for runtime optimization in Vision Language Models (VLMs)? Vision encoder latency 🤔? Image resolution 🤔? Number of visual tokens 🤔? LLM size 🤔?

In this thread, we break it all down and introduce FastVLM, a family of fast and accurate VLMs.

(1/n 🧵)
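
A hypothetical back-of-envelope model of how these factors combine; every number below is illustrative, not a FastVLM measurement. Time-to-first-token is roughly the vision encoder's latency plus the LLM's prefill over visual + text tokens, so image resolution (via token count) and LLM size (via per-token cost) both enter:

```python
def ttft_ms(vision_encoder_ms, n_visual_tokens, n_text_tokens, prefill_ms_per_token):
    # Toy model: TTFT = vision encoding + LLM prefill over all input tokens.
    return vision_encoder_ms + (n_visual_tokens + n_text_tokens) * prefill_ms_per_token

print(ttft_ms(100.0, 576, 64, 0.5))  # 420.0 -> many visual tokens dominate
print(ttft_ms(100.0, 144, 64, 0.5))  # 204.0 -> fewer tokens, same encoder cost
```
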
Hadi Pouransari (@hpouransari)'s Twitter Profile Photo

📢📢📢 We released the code for dataset decomposition [NeurIPS 2024]: a simple method to speed up LLM pre-training using sequence-length-aware training.

Paper: arxiv.org/abs/2405.13226
Code: github.com/apple/ml-datas…

Samira Abnar (@samira_abnar)'s Twitter Profile Photo

🚨 One question that has always intrigued me is the role of different ways to increase a model's capacity: parameters, parallelizable compute, or sequential compute?

We explored this through the lens of MoEs:
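
A toy accounting sketch of why MoEs are a good lens here: routing decouples parameter count from per-token compute, so the two axes can be varied independently. All names and numbers are illustrative:

```python
def moe_budget(d_model, d_ff, n_experts, top_k):
    # One MoE FFN layer: parameters scale with the number of experts,
    # while per-token FLOPs scale only with the top_k experts actually used.
    params_per_expert = 2 * d_model * d_ff           # up- and down-projection
    total_params = n_experts * params_per_expert
    flops_per_token = 2 * top_k * params_per_expert  # ~2 FLOPs per active weight
    return total_params, flops_per_token

params, flops = moe_budget(d_model=1024, d_ff=4096, n_experts=64, top_k=2)
print(f"{params/1e6:.0f}M params, {flops/1e6:.0f}M FLOPs/token")  # 537M params, 34M FLOPs/token
```
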
Yizhe Zhang @ ICLR 2025 🇸🇬 (@yizhezhangnlp)'s Twitter Profile Photo

Excited to share our new paper on "Reversal Blessing" - where thinking BACKWARDS makes language models smarter on some multiple-choice questions! We found that right-to-left (R2L) models consistently outperform traditional left-to-right (L2R) models on certain reasoning tasks. 🧵
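
A sketch of how "thinking backwards" can score multiple choice, assuming an R2L model trained on reversed token streams behind a Hugging Face-style causal-LM interface; the helper and prompt format are hypothetical:

```python
import torch

def pick_answer_r2l(model, tokenizer, question, choices):
    # An R2L model reads reversed text, so scoring the reversed sequence
    # effectively evaluates the question given the answer, rather than the
    # answer given the question as an L2R model would.
    scores = []
    for choice in choices:
        ids = tokenizer(question + " " + choice, return_tensors="pt").input_ids
        rev = ids.flip(dims=[1])  # reverse token order for the R2L model
        with torch.no_grad():
            loss = model(rev, labels=rev).loss  # mean negative log-likelihood
        scores.append(-loss.item())
    return max(zip(scores, choices))[1]  # highest log-likelihood wins
```
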
Martin Klissarov (@martinklissarov)'s Twitter Profile Photo

Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / transition functions? How do you fine-tune them? Can LLMs explore / exploit? 🧵 Join us down this rabbit hole... (ICLR 2025 paper, done at Apple ML Research)

Cheng-Yu Hsieh (@cydhsieh)'s Twitter Profile Photo

Excited to introduce FocalLens: an instruction tuning framework that turns existing VLMs/MLLMs into text-conditioned vision encoders that produce visual embeddings focusing on relevant visual information given natural language instructions!

📢: Hadi Pouransari will be presenting …
Ryan Hoque (@ryan_hoque)'s Twitter Profile Photo

Imitation learning has a data scarcity problem. Introducing EgoDex from Apple, the largest and most diverse dataset of dexterous human manipulation to date: 829 hours of egocentric video + paired 3D hand poses across 194 tasks. Now on arXiv: arxiv.org/abs/2505.11709 (1/4)