Xianhang Li (@xianhangli) 's Twitter Profile
Xianhang Li

@xianhangli

Ph.D. student at @UCSC

ID: 1744526528942395392

Link: https://xhl-video.github.io/xianhangli/ · Joined: 09-01-2024 01:08:43

23 Tweets

154 Followers

272 Following

Vimal Thilak🦉🐒 (@aggieinca) 's Twitter Profile Photo

Xianhang Li has a thread on work conducted during his internship. I'm very happy to see this project out in the open! Please check it out. We love video-based learning ;)

Joshua Susskind (@jmsusskind) 's Twitter Profile Photo

Here's another fun @apple research project continuing the theme of simplifying ML methods to make representation learning more efficient and scalable. Maybe we should have called it SimpleJEPA 😂. Great work Xianhang Li on your internship!

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

New Apple paper rethinks how video pretraining is done and hugely cuts needed compute.

shows a frozen teacher can replace V-JEPA’s moving teacher for video pretraining while improving compute efficiency.

V-JEPA, or Video Joint Embedding Predictive Architecture, is the baseline.

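The contrast between the two teacher strategies can be sketched as below. This is a hypothetical toy illustration — plain floats stand in for network weights, and `ema_update` / `frozen_teacher` are made-up names — not the paper's or V-JEPA's actual code:

```python
# Toy sketch of the two teacher strategies in self-distillation pretraining.
# "Weights" here are single floats, not real networks.

def ema_update(teacher_w: float, student_w: float, momentum: float = 0.99) -> float:
    """V-JEPA-style moving teacher: an exponential moving average of the
    student's weights, recomputed at every training step (extra compute
    and memory for a second evolving network)."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def frozen_teacher(teacher_w: float, student_w: float) -> float:
    """Frozen-teacher variant: the teacher never changes, so its targets
    can even be precomputed once for the whole dataset."""
    return teacher_w

student, teacher = 1.0, 0.0
for _ in range(3):
    student += 0.1                       # stand-in for a gradient step
    teacher = ema_update(teacher, student)  # teacher drifts toward student
```

With a frozen teacher the update loop above disappears entirely, which is where the claimed compute savings come from.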
Huangjie Zheng (@undergroundjeg) 's Twitter Profile Photo

We’re excited to share our new paper: Continuously-Augmented Discrete Diffusion (CADD) — a simple yet effective way to bridge discrete and continuous diffusion models on discrete data, such as language modeling. [1/n] 

Paper: arxiv.org/abs/2510.01329
Eran Malach (@eranmalach) 's Twitter Profile Photo

SSMs promised efficient language modeling for long context, but so far seem to underperform compared to Transformers in many settings. Our new work suggests that this is not a problem with SSMs, but with how we are currently using them. 
Arxiv: arxiv.org/pdf/2510.14826
🧵
Joshua Susskind (@jmsusskind) 's Twitter Profile Photo

Check out RepTok, which represents each image as a single continuous latent token, and leverages pre-trained SSL encoders for highly efficient generative model training. This work was led by our excellent LMU collaborators with a couple of us from Apple research!

Cihang Xie (@cihangxie) 's Twitter Profile Photo

🚀 Introducing OpenVision 3 — the 3rd generation of OpenVision and a step forward in unified visual modeling for both 🧠 understanding and 🎨 generation.

👇🧵 Thread
Cihang Xie (@cihangxie) 's Twitter Profile Photo

Introducing Skillbolt ⚡✨ — an open-source tool to make AI agents more powerful than ever 🤖💥 Check it out: github.com/TacoSkill/Skil…

Cihang Xie (@cihangxie) 's Twitter Profile Photo

If you’re still on a ViT-style visual backbone… consider switching to ViT-5 🚀 (Also, QK-Norm is my favorite piece 🫶 — fixes a lot of training instability headaches 🤯)
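For reference, QK-Norm normalizes queries and keys before their dot product, so attention logits are bounded cosine similarities and cannot blow up as weights grow during training. A minimal NumPy sketch, assuming L2 normalization and a fixed logit scale (the `qk_norm_attention` helper and the `scale` constant are illustrative assumptions, not any library's API):

```python
import numpy as np

def qk_norm_attention(q, k, v, scale=10.0):
    """Single-head attention with QK-Norm: L2-normalize queries and keys
    along the feature dimension before the dot product. Logits are then
    bounded in [-scale, scale] regardless of parameter magnitude."""
    q = q / np.linalg.norm(q, axis=-1, keepdims=True)
    k = k / np.linalg.norm(k, axis=-1, keepdims=True)
    logits = scale * (q @ k.T)
    logits -= logits.max(axis=-1, keepdims=True)   # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(3, 4)) for _ in range(3))
out = qk_norm_attention(q, k, v)
```

Because the queries are normalized, scaling them by any positive constant leaves the output unchanged — which is exactly the instability-taming property mentioned above.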

Cihang Xie (@cihangxie) 's Twitter Profile Photo

SkillRL is here! 🤖✨

This is a new learning paradigm for evolving LLM agents through recursive skill discovery: by organizing knowledge into a hierarchical SkillBank, it boosts reasoning utility while cutting token usage by ~20%.

Check it out 👉 github.com/aiming-lab/Ski…
Michael Kirchhof (@mkirchhof_) 's Twitter Profile Photo

New paper 🥳 RL relies a lot on an agent’s capability to explore. Our strategy-guided exploration makes the agent find new solutions more efficiently. It learns faster, and in some environments its Pass@1 surpasses the base model’s Pass@128. 🧵1/6

📄 arxiv.org/abs/2603.02045
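Pass@k is commonly computed with the unbiased estimator from the HumanEval line of work; assuming that convention (the paper may use a different one), drawing k of n samples where c are correct gives:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@k estimator: with n samples per task and c of them
    correct, the probability that at least one of k drawn samples is
    correct is 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: a correct one is guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For k = 1 this reduces to c / n, so "Pass@1 surpasses the base model's Pass@128" means a single sample from the tuned agent beats the base model's best-of-128.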
Cihang Xie (@cihangxie) 's Twitter Profile Photo

While Google's Veo has mastered visual realism, capturing the causal logic of the physical world—like the state transition from 'whole' to 'sliced'—remains a major challenge. 🍅🔪

Excited to share our latest work, CAST, which improves Veo to generate more coherent storylines!

Yanqing Liu (@yanqingliu83931) 's Twitter Profile Photo

Excited to share my internship project at Google (with Yan Jiao and Yingcheng Liu)! In CAST, we explore modeling visual state transitions in representation space. While the paper studies this through video retrieval, I’m especially excited about its broader potential for video…

Anshul Shah (@anshul__shah) 's Twitter Profile Photo

Excited to share our latest research on limitations of RL-finetuned VLMs! We investigate the robustness of model responses and consistency of CoT to textual perturbations. Work led by Rosie Zhao during her internship with the Multimodal Machine Intelligence team at Apple.