Xianhang Li (@xianhangli) 's Twitter Profile
Xianhang Li

@xianhangli

Ph.D. student at @UCSC

ID: 1744526528942395392

Link: https://xhl-video.github.io/xianhangli/ · Joined: 09-01-2024 01:08:43

23 Tweets

154 Followers

272 Following

Vimal Thilak🦉🐒 (@aggieinca) 's Twitter Profile Photo

Xianhang Li has a thread on work conducted during his internship. I'm very happy to see this project out in the open! Please check it out. We love video-based learning ;)

Joshua Susskind (@jmsusskind) 's Twitter Profile Photo

Here's another fun @apple research project continuing the theme of simplifying ML methods to make representation learning more efficient and scalable. Maybe we should have called it SimpleJEPA 😂. Great work Xianhang Li on your internship!

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

New Apple paper rethinks how video pretraining is done and hugely cuts needed compute.

shows a frozen teacher can replace V-JEPA’s moving teacher for video pretraining while improving compute efficiency.

V-JEPA, or Video Joint Embedding Predictive Architecture, is the baseline.

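The contrast between the two teacher strategies can be sketched as below. This is a hypothetical toy illustration — plain floats stand in for network weights, and `ema_update` / `frozen_teacher` are made-up names — not the paper's or V-JEPA's actual code:

```python
# Toy sketch of the two teacher strategies in self-distillation pretraining.
# "Weights" here are single floats, not real networks.

def ema_update(teacher_w: float, student_w: float, momentum: float = 0.99) -> float:
    """V-JEPA-style moving teacher: an exponential moving average of the
    student's weights, recomputed at every training step (extra compute
    and memory for a second evolving network)."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def frozen_teacher(teacher_w: float, student_w: float) -> float:
    """Frozen-teacher variant: the teacher never changes, so its targets
    can even be precomputed once for the whole dataset."""
    return teacher_w

student, teacher = 1.0, 0.0
for _ in range(3):
    student += 0.1                       # stand-in for a gradient step
    teacher = ema_update(teacher, student)  # teacher drifts toward student
```

With a frozen teacher the update loop above disappears entirely, which is where the claimed compute savings come from.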
Huangjie Zheng (@undergroundjeg) 's Twitter Profile Photo

We’re excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD) — a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] 

Paper: arxiv.org/abs/2510.01329
Eran Malach (@eranmalach) 's Twitter Profile Photo

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
Joshua Susskind (@jmsusskind) 's Twitter Profile Photo

Check out RepTok, which represents each image as a single continuous latent token, and leverages pre-trained SSL encoders for highly efficient generative model training. This work was led by our excellent LMU collaborators with a couple of us from Apple research!

Cihang Xie (@cihangxie) 's Twitter Profile Photo

🚀 Introducing OpenVision 3 — the 3rd generation of OpenVision and a step forward in unified visual modeling for both 🧠 understanding and 🎨 generation.

👇🧵 Thread
Cihang Xie (@cihangxie) 's Twitter Profile Photo

Introducing Skillbolt ⚡✨ — an open-source tool to make AI agents more powerful than ever 🤖💥 Check it out: github.com/TacoSkill/Skil…

Cihang Xie (@cihangxie) 's Twitter Profile Photo

If you’re still on a ViT-style visual backbone… consider switching to ViT-5 🚀 (Also, QK-Norm is my favorite piece 🫶 — fixes a lot of training instability headaches 🤯)
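For reference, QK-Norm normalizes queries and keys before their dot product, so attention logits are bounded cosine similarities and cannot blow up as weights grow during training. A minimal NumPy sketch, assuming L2 normalization and a fixed logit scale (the `qk_norm_attention` helper and the `scale` constant are illustrative assumptions, not any library's API):

```python
import numpy as np

def qk_norm_attention(q, k, v, scale=10.0):
    """Single-head attention with QK-Norm: L2-normalize queries and keys
    along the feature dimension before the dot product. Logits are then
    bounded in [-scale, scale] regardless of parameter magnitude."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (q @ k.T)
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
out = qk_norm_attention(q, k, v)
```

Because the queries are normalized, scaling them by any positive constant leaves the output unchanged — which is exactly the instability-taming property mentioned above.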

Cihang Xie (@cihangxie) 's Twitter Profile Photo

SkillRL is here! 🤖✨

This is a new learning paradigm for evolving LLM agents through recursive skill discovery: by organizing knowledge into a hierarchical SkillBank, it boosts reasoning utility while cutting token usage by ~20%.

Check it out 👉 github.com/aiming-lab/Ski…
Michael Kirchhof (@mkirchhof_) 's Twitter Profile Photo

New paper 🥳 RL relies a lot on an agent’s capability to explore. Our strategy-guided exploration makes the agent find new solutions more efficiently. It learns faster, and in some environments its Pass@1 surpasses the base model’s Pass@128. 🧵1/6

📄 arxiv.org/abs/2603.02045
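Pass@k is commonly computed with the unbiased estimator from the HumanEval line of work; assuming that convention (the paper may use a different one), drawing k of n samples where c are correct gives:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: with n samples per task and c of them
    correct, the probability that at least one of k drawn samples is
    correct is 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 this reduces to c / n, so "Pass@1 surpasses the base model's Pass@128" means a single sample from the tuned agent beats the base model's best-of-128.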
Cihang Xie (@cihangxie) 's Twitter Profile Photo

While Google's Veo has mastered visual realism, capturing the causal logic of the physical world—like the state transition from 'whole' to 'sliced'—remains a major challenge. 🍅🔪

Excited to share our latest work, CAST, which improves Veo to generate more coherent storylines!

Yanqing Liu (@yanqingliu83931) 's Twitter Profile Photo

Excited to share my internship project at Google (with Yan Jiao and Yingcheng Liu)! In CAST, we explore modeling visual state transitions in representation space. While the paper studies this through video retrieval, I’m especially excited about its broader potential for video…

Anshul Shah (@anshul__shah) 's Twitter Profile Photo

Excited to share our latest research on limitations of RL-finetuned VLMs! We investigate the robustness of model responses and consistency of CoT to textual perturbations. Work led by Rosie Zhao during her internship with the Multimodal Machine Intelligence team at Apple.