Xinlei Chen (@endernewton)'s Twitter Profile
Xinlei Chen

@endernewton

Research Scientist at FAIR

ID: 334448097

Website: http://xinleic.xyz/ · Joined: 13-07-2011 03:26:58

48 Tweets

2.2K Followers

827 Following

Xinlei Chen (@endernewton):

Very happy to see the TTT-series reaching yet another milestone! This time it serves as an inspiration for next-generation architecture post-Transformer, and by connecting TTT to Transformer, it can explain why (autoregressive) Transformers are so good at in-context learning!
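
The tweet refers to the test-time-training (TTT) idea of treating a layer's hidden state as a small model that is updated by gradient steps as tokens arrive. Below is a minimal toy sketch of that idea; the function name, the corruption scheme, and the update rule are illustrative choices for this sketch, not the TTT papers' actual layer.

```python
# Toy illustration (not the papers' implementation): a "TTT-style" layer whose
# hidden state is the weight matrix W of a tiny linear model. For every token,
# W takes one gradient step on a self-supervised reconstruction loss, then the
# updated model produces the output.
import torch

def ttt_linear_toy(tokens: torch.Tensor, lr: float = 0.1) -> torch.Tensor:
    """tokens: (seq_len, dim). Returns outputs of the continually updated model."""
    seq_len, dim = tokens.shape
    W = torch.zeros(dim, dim)                 # hidden state = weights of f(x) = W @ x
    outputs = []
    for x in tokens:
        # Inner self-supervised task: reconstruct the token from a corrupted view.
        x_corrupt = x + 0.1 * torch.randn_like(x)
        err = W @ x_corrupt - x               # reconstruction residual
        grad_W = torch.outer(err, x_corrupt)  # d/dW of 0.5 * ||W x_c - x||^2
        W = W - lr * grad_W                   # one "test-time training" step per token
        outputs.append(W @ x)                 # output of the freshly updated model
    return torch.stack(outputs)

if __name__ == "__main__":
    out = ttt_linear_toy(torch.randn(16, 8))
    print(out.shape)  # torch.Size([16, 8])
```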

Leroy Wang (@liruiwang1):

Excited to share Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers (HPT)! We explore the challenging problem of data heterogeneity across embodiments in robotics and investigate the scaling behavior of HPT. Accepted to #NeurIPS2024 as a Spotlight.

Quinn McIntyre (@qamcintyre):

So excited to share what I have been working on at Etched. It was a great honor to work with Julian Quevedo, Spruce, Xinlei Chen, and Robert Wachen, and to have the chance to collaborate with Decart. Interactive video models will be the most impactful interface in the next decade.

Julian Quevedo (@julianhquevedo):

oasis is here! it's an interactive diffusion transformer that predicts the next frame autoregressively. here, we used it to create one of the first immersive, generative worlds. and the future possibilities for interactive video are so, so exciting.
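
For readers unfamiliar with autoregressive frame prediction, here is a hedged sketch of what such an inference loop can look like: each new frame starts from noise and is iteratively denoised conditioned on the previous frame and the user's action. The denoiser and the crude fixed-step sampler below are stand-ins invented for this sketch; Oasis's actual architecture and sampler are not described in the tweet.

```python
# Hedged sketch of "predicting the next frame autoregressively" with a diffusion
# model at inference time. DummyFrameDenoiser is a placeholder invented here.
import torch
from torch import nn

class DummyFrameDenoiser(nn.Module):
    """Placeholder: predicts a denoised frame from (noisy frame, context frame, action)."""
    def __init__(self, frame_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Linear(frame_dim * 2 + action_dim, frame_dim)

    def forward(self, noisy, context, action):
        return self.net(torch.cat([noisy, context, action], dim=-1))

def rollout(denoiser, first_frame, actions, denoise_steps: int = 4):
    """Generate one frame per user action, each conditioned on the previous frame."""
    frames = [first_frame]
    for action in actions:
        x = torch.randn_like(first_frame)   # every new frame starts from noise
        for _ in range(denoise_steps):      # crude fixed-step "denoising" loop
            x = denoiser(x, frames[-1], action)
        frames.append(x)                    # the frame becomes context for the next one
    return torch.stack(frames)

if __name__ == "__main__":
    d = DummyFrameDenoiser(frame_dim=32, action_dim=4)
    video = rollout(d, torch.zeros(32), [torch.zeros(4) for _ in range(5)])
    print(video.shape)  # torch.Size([6, 32])
```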

Yossi Gandelsman (@ygandelsman):

Current video representation models (e.g. VideoMAE) are inefficient learners. How inefficient? We show that reprs with similar quality can be learned without training on *any* real videos, by using synthetic datasets that were created from very simple generative processes!
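
To make "very simple generative processes" concrete, here is an illustrative toy generator that produces synthetic clips of rectangles drifting over noise. The specific processes used in the paper may differ; this only shows the kind of training data that involves no real video at all.

```python
# Illustrative only: a toy "simple generative process" for synthetic video clips
# (a few moving rectangles over a noise background).
import numpy as np

def synthetic_clip(num_frames=16, size=64, num_boxes=3, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    pos = rng.integers(0, size - 8, size=(num_boxes, 2)).astype(float)
    vel = rng.uniform(-2, 2, size=(num_boxes, 2))
    clip = rng.normal(0.0, 0.1, size=(num_frames, size, size)).astype(np.float32)
    for t in range(num_frames):
        pos = (pos + vel) % (size - 8)           # boxes drift with constant velocity
        for (y, x) in pos.astype(int):
            clip[t, y:y + 8, x:x + 8] = 1.0      # paint each 8x8 box
    return clip                                   # shape: (num_frames, size, size)

if __name__ == "__main__":
    print(synthetic_clip().shape)  # (16, 64, 64)
```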

Xinlei Chen (@endernewton):

I am looking for an intern to do research together next summer. Possible topics: representation learning, network architecture, and in general understanding what's going on :P. Please apply (metacareers.com/jobs/532549086…) and email me ([email protected]) if interested.

Leroy Wang (@liruiwang1):

HPT will be presented at NeurIPS in Vancouver, East Exhibit Hall A-C #4210, on Thursday at 11 next week! Unfortunately, I cannot make it in person, but Xinlei Chen will be there! Thanks for the constructive feedback from the reviewers. Check out the poster and come talk to us!

Alex Li (@alexlioralexli):

I'm presenting our #NeurIPS2024 work on Attention Transfer today! Key finding: Pretrained representations aren't essential - just using attention patterns from pretrained models to guide token interactions is enough for models to learn high-quality features from scratch and

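A minimal sketch of the idea described above: the student block computes its own value vectors but mixes tokens with attention weights copied from a frozen pretrained teacher, so only the pattern of token interactions is transferred. Names and shapes below are assumptions for illustration, not the paper's reference implementation.

```python
# Minimal "attention copy" sketch: token mixing follows the teacher's attention
# pattern while all features are learned from scratch by the student.
import torch
from torch import nn

class AttentionCopyBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.value = nn.Linear(dim, dim)   # student-owned value projection
        self.proj = nn.Linear(dim, dim)    # student-owned output projection

    def forward(self, x: torch.Tensor, teacher_attn: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); teacher_attn: (batch, tokens, tokens),
        # rows already softmax-normalized by the frozen teacher.
        v = self.value(x)
        mixed = teacher_attn @ v           # interactions guided by the teacher's pattern
        return x + self.proj(mixed)        # residual connection

if __name__ == "__main__":
    blk = AttentionCopyBlock(dim=64)
    x = torch.randn(2, 197, 64)                          # e.g. ViT tokens (CLS + 14x14 patches)
    attn = torch.softmax(torch.randn(2, 197, 197), -1)   # stand-in for a teacher attention map
    print(blk(x, attn).shape)  # torch.Size([2, 197, 64])
```
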
Zhuang Liu (@liuzhuang1234):

How far is an LLM from not only understanding but also generating visually? Not very far! Introducing MetaMorph---a multimodal understanding and generation model. In MetaMorph, understanding and generation benefit each other. Very moderate generation data is needed to elicit

Shiry Ginosar (@shiryginosar):

New paper! An SSL object-centric 2.1D image representation using 3D Gaussians, extending MAE with a Gaussian bottleneck. While Gaussian splatting has been used for single-scene reconstruction, we're the first to apply it to image representation learning! brjathu.github.io/gmae/

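A rough sketch of what a "Gaussian bottleneck" head can look like: decoder tokens are mapped to per-Gaussian parameters that a differentiable splatting renderer (not shown) would turn back into an image for the MAE-style reconstruction loss. The parameter layout below is a guess for illustration, not the paper's exact design.

```python
# Sketch of a head that maps decoder tokens to 3D Gaussian parameters; rendering
# with a differentiable splatting rasterizer is assumed and omitted here.
import torch
from torch import nn

class GaussianHead(nn.Module):
    """Maps each decoder token to the parameters of one 3D Gaussian."""
    def __init__(self, dim: int):
        super().__init__()
        # 3 (mean xyz) + 3 (log scale) + 4 (rotation quaternion) + 3 (color) + 1 (opacity)
        self.head = nn.Linear(dim, 14)

    def forward(self, tokens: torch.Tensor) -> dict:
        p = self.head(tokens)
        return {
            "mean": p[..., 0:3],
            "scale": p[..., 3:6].exp(),
            "rotation": nn.functional.normalize(p[..., 6:10], dim=-1),
            "color": p[..., 10:13].sigmoid(),
            "opacity": p[..., 13:14].sigmoid(),
        }

if __name__ == "__main__":
    gauss = GaussianHead(dim=512)(torch.randn(2, 196, 512))
    print({k: v.shape for k, v in gauss.items()})
```
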
Simone Scardapane (@s_scardapane):

*On the Surprising Effectiveness of Attention Transfer for Vision Transformers* by Yuandong Tian, Beidi Chen, Deepak Pathak, Xinlei Chen, and Alex Li. Shows that distilling attention patterns in ViTs is competitive with standard fine-tuning. arxiv.org/abs/2411.09702

Rohit Girdhar (@_rohitgirdhar_):

Super excited to share some recent work that shows that pure, text-only LLMs can see and hear without any training! Our approach, called "MILS", uses LLMs with off-the-shelf multimodal models to caption images/videos/audio, improve image generation, do style transfer, and more!

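The tweet does not spell out the procedure, so the following is only a hedged sketch of the general training-free recipe it gestures at: a text-only LLM proposes candidate captions, an off-the-shelf multimodal scorer ranks them against the input, and the scores are fed back to the LLM for the next round. `propose_captions` and `clip_score` are placeholder callables invented for this sketch, not a real API.

```python
# Sketch of a training-free generate-and-score captioning loop that pairs a
# text-only LLM with an off-the-shelf multimodal scorer (e.g. CLIP similarity).
from typing import Callable, List, Tuple

def caption_by_generate_and_score(
    image,
    propose_captions: Callable[[List[Tuple[str, float]]], List[str]],  # wraps the LLM
    clip_score: Callable[[object, str], float],                        # wraps the scorer
    rounds: int = 5,
    keep: int = 8,
) -> str:
    """No gradients and no training: only generate -> score -> feed back."""
    scored: List[Tuple[str, float]] = []
    for _ in range(rounds):
        candidates = propose_captions(scored)             # LLM sees the best captions so far
        scored += [(c, clip_score(image, c)) for c in candidates]
        scored = sorted(scored, key=lambda cs: cs[1], reverse=True)[:keep]
    return scored[0][0]                                   # highest-scoring caption wins
```
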
David Fan (@davidjfan):

Can visual SSL match CLIP on VQA? Yes! We show with controlled experiments that visual SSL can be competitive even on OCR/Chart VQA, as demonstrated by our new Web-SSL model family (1B-7B params) which is trained purely on web images – without any language supervision.
