Yunyang Xiong (@youngxiong1)'s Twitter Profile
Yunyang Xiong

@youngxiong1

@Meta, University of Wisconsin-Madison

ID: 1359295331012329474

Joined: 10-02-2021 00:17:52

90 Tweets

467 Followers

141 Following

Yunyang Xiong (@youngxiong1):

EfficientSAM, small but mighty! With 20x fewer parameters and 20x faster runtime, EfficientSAM is within 2 points of the original SAM model, outperforming MobileSAM/FastSAM by a large margin.

Paper: arxiv.org/pdf/2312.00863…
Project details and demo: yformer.github.io/efficient-sam/

Yann LeCun (@ylecun):

MobileLLM: nice paper from AI at Meta about running sub-billion LLMs on smartphones and other edge devices.
TL;DR: more depth, not width; shared matrices for token->embedding and embedding->token; shared weights between multiple transformer blocks.

Paper: arxiv.org/abs/2402.14905

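The two weight-sharing tricks in the TL;DR can be sketched in PyTorch. This is a minimal toy illustration, not the paper's actual code: all module names and sizes are made up, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class TinyDeepLM(nn.Module):
    """Toy decoder illustrating two MobileLLM-style tricks:
    (1) one shared matrix for token->embedding and embedding->token,
    (2) reusing (sharing) transformer block weights to gain depth cheaply."""

    def __init__(self, vocab=32000, dim=256, n_unique_blocks=4, repeats=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        # A few unique blocks, each applied `repeats` times in sequence.
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=4 * dim,
                                       batch_first=True)
            for _ in range(n_unique_blocks)
        )
        self.repeats = repeats
        self.norm = nn.LayerNorm(dim)

    def forward(self, ids):
        x = self.embed(ids)
        for block in self.blocks:          # effective depth = n_unique_blocks * repeats,
            for _ in range(self.repeats):  # but parameters only for n_unique_blocks
                x = block(x)
        x = self.norm(x)
        # Shared embedding->token matrix: reuse the input embedding weight
        # as the output projection instead of a separate nn.Linear head.
        return x @ self.embed.weight.t()

model = TinyDeepLM()
logits = model(torch.randint(0, 32000, (1, 8)))
print(logits.shape)  # torch.Size([1, 8, 32000])
```

Both tricks cut parameter count without reducing depth, which is the direction the paper argues matters most at sub-billion scale.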
Yunyang Xiong (@youngxiong1):

🚨VideoLLM from Meta!🚨
LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding

📝Paper: huggingface.co/papers/2410.17…
🧑🏻‍💻Code: github.com/Vision-CAIR/Lo…
🚀Project (Demo): vision-cair.github.io/LongVU

We propose LongVU, a video LLM with a spatiotemporal adaptive

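One ingredient of temporal compression for long video, dropping frames that are nearly identical to the last kept frame, might look like the sketch below. This is purely illustrative and is not LongVU's actual method: the function name, the cosine-similarity criterion, and the threshold are all assumptions.

```python
import torch
import torch.nn.functional as F

def prune_redundant_frames(features: torch.Tensor, sim_threshold: float = 0.95):
    """Keep a frame only if it differs enough from the last kept frame
    (cosine similarity below `sim_threshold`).

    features: (num_frames, feat_dim) per-frame embeddings, e.g. from a
    vision encoder (placeholder here)."""
    keep = [0]  # always keep the first frame
    for i in range(1, features.shape[0]):
        sim = F.cosine_similarity(features[i], features[keep[-1]], dim=0)
        if sim < sim_threshold:
            keep.append(i)
    return keep

# Toy example: frames 0-3 are near-identical, frame 4 is a scene change.
frames = torch.ones(5, 8)
frames[4] = -torch.ones(8)
print(prune_redundant_frames(frames))  # [0, 4]
```

Pruning redundant frames before the LLM sees them shrinks the token budget roughly in proportion to how static the video is, which is what makes hour-long inputs tractable.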
Zechun Liu (@zechunliu):

🚀We're thrilled to announce the MobileLLM weights are available on HuggingFace: huggingface.co/collections/fa…

📱MobileLLM is a state-of-the-art language model designed for mobile devices: arxiv.org/abs/2402.14905

🔥Explore the pretraining code on GitHub: github.com/facebookresear…

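Once the weights are on HuggingFace, loading them should follow the standard `transformers` pattern. A sketch, assuming a checkpoint id like `facebook/MobileLLM-125M` (check the collection page for the actual names) and that the repo ships custom model code (hence `trust_remote_code`):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; the real names are listed in the HF collection.
model_id = "facebook/MobileLLM-125M"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Small models can", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

At sub-billion scale the whole model fits comfortably in a few hundred MB, which is the point of running it on-device.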
Forrest Iandola (@fiandola):

[1/n] 𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗧𝗿𝗮𝗰𝗸 𝗔𝗻𝘆𝘁𝗵𝗶𝗻𝗴 from Meta: interactive video segmentation and tracking on an iPhone!

Yunyang Xiong (@youngxiong1):

Efficient Track Anything for segment everything 🔥

Gradio demo (built upon SkalskiP's sam2 demo): 5239f8e221db7ee8a0.gradio.live

#supermario #UniversalStudios #sanfranisco

Jiao Sun (@sunjiao123sun_):

Mitigating racial bias from LLMs is a lot easier than removing it from humans!

Can’t believe this happened at the best AI conference, NeurIPS Conference.

We have ethical reviews for authors, but missed it for invited speakers? 😡

Peter Tong (@tongpetersb):

This project really changed how I think about multimodal models and LLMs. I used to believe that multimodal (visual) prediction required significant changes to the model and heavy pretraining, like Chameleon. But surprisingly, the opposite is true! In large autoregressive models,

Yunyang Xiong (@youngxiong1):

Excited to see Peter Tong's internship work (MetaMorph) exploring unified multimodal understanding and generation, with many interesting findings. Check out the paper and project page below.

Yunyang Xiong (@youngxiong1):

Glad to see the Efficient Track Anything model being used for near real-time multi-object segmentation and tracking on a MacBook for SlapFX. Looking forward to your next-gen CapCut release, Hart Woolery!