Saksham Suri (@_sakshams_)'s Twitter Profile
Saksham Suri

@_sakshams_

Research Scientist @AiatMeta. Previously PhD @UMDCS, @MetaAI, @AmazonScience, @USCViterbi, @IIITDelhi, @IBMResearch.
#computervision #deeplearning

ID: 2977040274

Website: http://www.cs.umd.edu/~sakshams/
Joined: 12-01-2015 16:09:41

129 Tweets

760 Followers

638 Following

Matt Shumer (@mattshumer_)

Wild tech you have to try: groq.com

They are serving Mixtral at nearly 500 tok/s. Answers are pretty much instantaneous. Opens up new use cases, and completely changes the UX possibilities of existing ones.

Stability AI (@stabilityai)

Announcing Stable Diffusion 3, our most capable text-to-image model, utilizing a diffusion transformer architecture for greatly improved performance in multi-subject prompts, image quality, and spelling abilities.

Today, we are opening the waitlist for early preview. This phase…
Anthropic (@anthropicai)

Today, we're announcing Claude 3, our next generation of AI models. 

The three state-of-the-art models—Claude 3 Opus, Claude 3 Sonnet, and Claude 3 Haiku—set new industry benchmarks across reasoning, math, coding, multilingual understanding, and vision.
Abhinav Shrivastava (@abhi2610)

Call for Papers: #INRV2024 Workshop on Implicit Neural Representation for Vision @ #CVPR2024!
Topics: compression, representation using INRs for images, audio, video & more!
Deadline: 3/31. Submit now!
Website: inrv.github.io
Submission link: shorturl.at/vzBR8

Saksham Suri (@_sakshams_)

That's a wrap! Happy to share that I have defended my thesis. 

Thankful for the insightful questions and feedback from my committee members Abhinav Shrivastava, Tianyi Zhou, David Jacobs, Prof. Espy-Wilson, and Prof. Andrew Zisserman.
Saksham Suri (@_sakshams_)

Excited to announce that I have joined AI at Meta as a Research Scientist, where I will be working on model optimization.

I will also be at ECCV to present my work and am excited to meet and learn from everyone. Reach out if you are attending and would like to chat. Ciao 🇮🇹

Yunyang Xiong (@youngxiong1)

🚨VideoLLM from Meta!🚨
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

📝Paper: huggingface.co/papers/2410.17…
🧑🏻‍💻Code: github.com/Vision-CAIR/Lo…
🚀Project (Demo): vision-cair.github.io/LongVU

We propose LongVU, a video LLM with a spatiotemporal adaptive…
Saksham Suri (@_sakshams_)

We are happy to release our LiFT code and pretrained models! 📢

Code: github.com/saksham-s/lift
Project Page: cs.umd.edu/~sakshams/LiFT

Here are some super spooky super resolved feature visualizations to make the season scarier 🎃

Coauthors: Matthew Walmer, Kamal Gupta, Abhinav Shrivastava
Saksham Suri (@_sakshams_)

Check out LARP, our work on a video tokenizer trained with an autoregressive generative prior. Code and models are open-sourced!

Saksham Suri (@_sakshams_)

Check out Efficient Track Anything from our team.

2x faster than SAM2 on A100
>10 FPS on iPhone 15 Pro Max

Paper: arxiv.org/pdf/2411.18933
Demo: yformer.github.io/efficient-trac…

Forrest Iandola (@fiandola)

[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from Meta: interactive video segmentation and tracking on an iPhone!

Saksham Suri (@_sakshams_)

📢 Excited to announce LARP has been accepted to #ICLR2025! 🇸🇬
Code and models are publicly available.
Project page: hywang66.github.io/larp/index.html

AI at Meta (@aiatmeta)

Today is the start of a new era of natively multimodal AI innovation.

Today, we’re introducing the first Llama 4 models: Llama 4 Scout and Llama 4 Maverick —  our most advanced models yet and the best in their class for multimodality.

Llama 4 Scout
• 17B-active-parameter model
Saksham Suri (@_sakshams_)

Drop by our oral presentation and poster session to chat and learn about our video tokenizer with a learned autoregressive prior. #ICLR2025