Ferjad Naeem (@ferjadnaeem)'s Twitter Profile
Ferjad Naeem

@ferjadnaeem

Research Scientist @Google

ID: 145732891

Website: https://ferjad.github.io/ | Joined: 19-05-2010 18:50:12

311 Tweets

902 Followers

357 Following

Haiyang Wang (@haiyang73756134)'s Twitter Profile Photo

Excited to see our paper "Tokenformer: Rethinking transformer scaling with tokenized model parameters" accepted as a spotlight at #ICLR2025!

Hope our idea of tokenizing everything can inspire the future of AI.

Paper: arxiv.org/abs/2410.23168
Code: github.com/Haiyang-W/Toke…
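The core idea, as I read the abstract: each dense projection is replaced by cross-attention from input tokens (queries) to a set of learnable parameter tokens (keys/values), so scaling the model means appending more parameter tokens rather than widening weight matrices. A minimal PyTorch sketch of that pattern follows; it uses plain softmax attention (the paper uses its own normalization) and all names are illustrative, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParameterTokenLayer(nn.Module):
    """Cross-attention from input tokens to learnable parameter tokens,
    standing in for a dense projection (illustrative sketch only)."""
    def __init__(self, dim: int, num_param_tokens: int):
        super().__init__()
        self.key_tokens = nn.Parameter(torch.randn(num_param_tokens, dim) * 0.02)
        self.value_tokens = nn.Parameter(torch.randn(num_param_tokens, dim) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim). Each input token attends over the parameter tokens.
        scores = x @ self.key_tokens.t() / x.shape[-1] ** 0.5
        return F.softmax(scores, dim=-1) @ self.value_tokens

layer = ParameterTokenLayer(dim=64, num_param_tokens=256)
out = layer(torch.randn(2, 10, 64))   # -> (2, 10, 64)
```

Growing `num_param_tokens` adds capacity without touching the token dimension, which (as I understand it) is the scaling knob the paper proposes.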
Ferjad Naeem (@ferjadnaeem)'s Twitter Profile Photo

Excited to share what we have been up to in image-text embedding models. SigLIP 2 is the most powerful encoder for most open-vocabulary computer vision and MLLM tasks. Checkpoints are open-sourced, and we look forward to what the community achieves with them.

Michael Tschannen (@mtschannen)'s Twitter Profile Photo

📢2⃣ Yesterday we released SigLIP 2! 

TL;DR: Improved high-level semantics, localization, dense features, and multilingual capabilities as a drop-in replacement for v1.

Bonus: Variants supporting native aspect ratio and variable sequence length.

A thread with interesting resources👇
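For anyone wanting to try the drop-in swap, here is a minimal sketch using the Hugging Face `transformers` integration. The checkpoint id below is an assumption for illustration; check the Hub for the exact SigLIP 2 model names.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

ckpt = "google/siglip2-base-patch16-224"   # assumed checkpoint id; verify on the Hub
model = AutoModel.from_pretrained(ckpt)
processor = AutoProcessor.from_pretrained(ckpt)

image = Image.open("example.jpg")
texts = ["a photo of a cat", "a photo of a dog"]

inputs = processor(text=texts, images=image, padding="max_length", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# SigLIP is trained with a sigmoid (pairwise) loss rather than a softmax over
# the batch, so each image-text score is read off independently.
probs = torch.sigmoid(outputs.logits_per_image)
print(probs)
```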
Ferjad Naeem (@ferjadnaeem)'s Twitter Profile Photo

Delighted to share that ACED has been accepted at CVPR 2025! Check out our work to learn how to distill the strongest smol-sized image-text contrastive models.

Ferjad Naeem (@ferjadnaeem)'s Twitter Profile Photo

Fully supportive of this. The Machine Learning / Computer Vision review process is broken, with irresponsible reviewers. Glad to see there is some accountability.

Rudy Gilman (@rgilman33)'s Twitter Profile Photo

The majority of features in this layer of SigLIP 2 are multimodal. I'd expected some multimodality, but was surprised that two-thirds of the neurons I tested bind together their visual and linguistic features. This neuron fires for images of mustaches and for the word "mustache".

André Araujo (@andrefaraujo)'s Twitter Profile Photo

Google's global PhD Fellowship program will open for applications this week (on Apr 10th)! It supports PhD students in computer science and related fields and connects each fellow with a Google mentor. Learn more and apply at: goo.gle/phdfellowship (deadline: May 15th, 2025)

Michael Tschannen (@mtschannen)'s Twitter Profile Photo

We are presenting JetFormer at ICLR this morning, poster #190. Stop by if you’re interested in unified multimodal architectures!

Sundar Pichai (@sundarpichai)'s Twitter Profile Photo

At #GoogleIO, we shared how decades of AI research have now become reality. 

From a total reimagining of Search to Agent Mode, Veo 3 and more, Gemini season will be the most exciting era of AI yet. 

Some highlights 🧵
Ross Wightman (@wightmanr)'s Twitter Profile Photo

timm's got a new vision transformer (NaFlexVit), and it's flexible! I've been plugging away at this for a bit, integrating ideas from FlexiViT, NaViT, and NaFlex and finally ready to merge for initial exploration. The model supports:
* variable aspect/size images of NaFlex (see
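(The tweet is truncated above.) A quick way to poke at the new models once they land in a timm release is to query the model registry; the name filter below is a guess at the naming scheme, so listing first is the safe path.

```python
import timm

# Find whatever NaFlexVit variants the installed timm version actually ships.
candidates = timm.list_models("*naflex*", pretrained=True)
print(candidates)

if candidates:
    model = timm.create_model(candidates[0], pretrained=True)
    model.eval()
    print(sum(p.numel() for p in model.parameters()) / 1e6, "M params")
```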
Jack (in SF) Langerman (@jacklangerman)'s Twitter Profile Photo

Active Data Curation Effectively Distills Large-Scale Multimodal Models

- compute per sample loss with large batch
- only backprop (probabilistically) through samples with high loss

intuition: these are the samples where there is “something to learn” - if both teacher and
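(The tweet is truncated above.) The selection rule in the first two bullets is easy to prototype; here is a hedged PyTorch sketch of it, where the loss, the selection fraction, and all names are illustrative rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def curated_step(student, teacher, batch, optimizer, keep_fraction=0.25, tau=1.0):
    """Compute per-sample loss on a large batch, then backprop only through a
    probabilistically chosen subset of high-loss samples."""
    with torch.no_grad():
        teacher_logits = teacher(batch)                 # reference model scores
    student_logits = student(batch)

    # Per-sample distillation loss, no reduction over the batch.
    per_sample = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="none",
    ).sum(dim=-1)

    # Sample examples with probability increasing in their loss: "something
    # to learn" lives where the student still disagrees with the teacher.
    k = max(1, int(keep_fraction * per_sample.numel()))
    idx = torch.multinomial(torch.softmax(per_sample.detach(), dim=0), k)

    loss = per_sample[idx].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```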
Ferjad Naeem (@ferjadnaeem)'s Twitter Profile Photo

A big congratulations to the whole Gemini team on pushing this amazing family of models out 😄

Our tech report is out now: storage.googleapis.com/deepmind-media…

Feels a bit unreal to share the contributors list with all the amazing colleagues.

Prune Truong (@prunetruong)'s Twitter Profile Photo

🎺Meet VIST3A: Text-to-3D by Stitching a Multi-view Reconstruction Network to a Video Generator.
➡️ Paper: arxiv.org/abs/2510.13454
➡️ Website: gohyojun15.github.io/VIST3A/
Collaboration between ETH & Google with Hyojun Go, Dominik Narnhofer, Goutam Bhat, Federico Tombari, and Konrad Schindler.

Enis Simsar (@enisimsar)'s Twitter Profile Photo

🚀 Excited to share our new work RefAM: Attention Magnets for Zero-Shot Referral Segmentation, a training-free approach that turns diffusion model attentions into segmentations.

By Anna Kukleva, me, Alessio Tonioni, Ferjad Naeem, Federico Tombari, Jan Eric Lenssen, and Bernt Schiele.

Jitendra MALIK (@jitendramalikcv)'s Twitter Profile Photo

(3/5) 2016-2021 was a wonderful period for AI research precisely because the leading labs at the time – FAIR, DeepMind, Google, OpenAI – were all publishing freely and building off each other's results. If the transformer paper in 2017 had been held as a secret inside Google