Han Cai (@hancai_hm)'s Twitter Profile
Han Cai

@hancai_hm

Research Scientist, NVIDIA

ID: 1208445012683280384

Website: http://hancai.ai/
Joined: 21-12-2019 17:52:10

11 Tweets

293 Followers

14 Following

Han Cai (@hancai_hm):

🥳 We are excited to introduce Deep Compression Autoencoder. It dramatically reduces the token number of the latent space, delivering significant training and inference speedup for latent diffusion models. Paper: arxiv.org/abs/2410.10733 Code: github.com/mit-han-lab/ef…
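
To get a feel for the claim, here is a minimal sketch (assumed numbers: 1024×1024 images; backbone patch size 1) of how the autoencoder's spatial compression ratio f sets the token count the diffusion backbone must process:

def latent_tokens(height: int, width: int, f: int) -> int:
    """Tokens the diffusion backbone sees for an f-times-downsampled latent."""
    return (height // f) * (width // f)

for f in (8, 32, 64):
    print(f"f{f}: {latent_tokens(1024, 1024, f)} tokens")  # f8: 16384, f32: 1024, f64: 256

Since attention cost grows quadratically with token count, a higher compression ratio translates directly into training and inference speedup.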

Enze Xie (@xieenze_jr):

Hi everyone, I'm thrilled to announce that you can now try #SANA models in #ComfyUI 🎉. We show video generation using SANA+CogVideoX. SANA now also supports Chinese and Emoji prompts. If you find SANA useful, we’d be grateful if you could give us a 🌟 at github.com/NVlabs/Sana/ 💗

Haocheng Xi (@haochengxiucb):

🚀 We're excited to open-source an FP8 training technique, COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training.
COAT is accepted by ICLR 2025!

FP8 training effectively improves training efficiency. DeepSeek-V3 is a successful example of…
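
As a rough illustration of the FP8 storage idea (generic per-group quantization, not COAT's exact algorithm):

import torch

def quantize_fp8(x: torch.Tensor, group: int = 128):
    """Per-group symmetric scaling so values fit FP8 E4M3's range (max ≈ 448)."""
    xg = x.reshape(-1, group)
    scale = xg.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / 448.0
    return (xg / scale).to(torch.float8_e4m3fn), scale

def dequantize_fp8(q: torch.Tensor, scale: torch.Tensor, shape) -> torch.Tensor:
    return (q.to(torch.float32) * scale).reshape(shape)

x = torch.randn(4, 256)
q, s = quantize_fp8(x)
print((x - dequantize_fp8(q, s, x.shape)).abs().max())  # small quantization error

Storing optimizer states and activations in 1-byte FP8 instead of 4-byte FP32 is where the memory savings come from; the challenge COAT addresses is doing so with near-lossless accuracy.
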
Baifeng (@baifeng_shi):

Next-gen vision pre-trained models shouldn’t be short-sighted.

Humans can easily perceive 10K x 10K resolution. But today’s top vision models—like SigLIP and DINOv2—are still pre-trained at merely hundreds by hundreds of pixels, bottlenecking their real-world usage.

Today, we…
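
The bottleneck is easy to quantify. For a standard ViT with patch size 16 (an assumed configuration), token count grows quadratically with resolution:

for side in (224, 384, 1024, 10_000):
    print(f"{side}x{side}: {(side // 16) ** 2:,} tokens")
# 224x224: 196 | 384x384: 576 | 1024x1024: 4,096 | 10000x10000: 390,625

Full attention over nearly 400K patches is why naive 10K×10K pretraining is out of reach.
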
Enze Xie (@xieenze_jr):

🚀 SANA 1.5 Update: Inference Scaling Now Open-Source! 🎉

📈 Breakthrough on GenEval benchmark:
• SANA 1.5 + Inference Scaling: 0.81 → 0.96 (!!) 🎯
• SD 1.5 + Inference Scaling: 0.42 → 0.87 ⬆️

💫 The secret sauce:
1. Generate n candidates 🎨
2. Pick top k with NVILA…
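
The recipe is simple enough to sketch; generate_fn and score_fn below are generic stand-ins for SANA and the NVILA judge:

def best_of_n(prompt, generate_fn, score_fn, n: int = 16, k: int = 4):
    """Sample n candidates, keep the k the judge model scores highest."""
    candidates = [generate_fn(prompt) for _ in range(n)]
    return sorted(candidates, key=score_fn, reverse=True)[:k]

All the extra compute is spent at inference time; no retraining is involved.
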
Haocheng Xi (@haochengxiucb):

🚀 COAT: Memory Efficient FP8 Training @ICLR 2025
📍 Hall 3 + Hall 2B Poster #566
🗓 Sat, Apr 26 | 3:00–5:30 PM Singapore Time

✅ 1.54× memory efficiency, 1.43× speedup, near-lossless performance!

✅ Check our poster about FP8 Training by Compressing Optimizer states and…

Hao Kang (@gt_haokang):

🚀📉 A new kind of efficiency challenge: "Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs" We explore a new frontier: what if the reward doesn’t come from being right—but from being fast and right? 🔗 arxiv.org/abs/2505.19481 🛜
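
One way to picture such a reward (illustrative only, not necessarily the paper's formulation) is correctness discounted by latency:

import math

def reward(correct: bool, latency_s: float, tau: float = 5.0) -> float:
    """Right answers earn exp(-latency/tau); wrong answers earn nothing."""
    return math.exp(-latency_s / tau) if correct else 0.0

print(reward(True, 1.0), reward(True, 10.0), reward(False, 1.0))
# Under this shaping, a slow correct answer can score below a fast one.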

Baifeng (@baifeng_shi):

We just dropped a few new PS3 models, with SOTA performance compared to existing vision encoders such as SigLIP2, C-RADIOv2, AIMv2, InternViT2.5, and Perception Encoder! Coming along with several new VILA-HD models. Check it out👇

Models: huggingface.co/collections/nv…
Code:

Han Cai (@hancai_hm):

Developing new LLM architectures is both costly and risky. Our latest project — hanlab.mit.edu/projects/jet-n… — offers an effective strategy to address this challenge.

Our first result is Jet-Nemotron, a new family of hybrid-architecture language models that outperform state-of-the-art…

Han Cai (@hancai_hm):

🚀 Excited to announce DC-AE 1.5!

With a spatial compression ratio boosted to f64, it accelerates high-res diffusion models while preserving text-to-image quality. Key innovation: channel-wise latent structure for faster convergence with many latent channels.

📍 Catch us at…

Haocheng Xi (@haochengxiucb):

🚀 Introducing Sparse VideoGen2 (SVG2) — Pareto-frontier video generation acceleration with semantic-aware sparse attention!
🏆 Spotlight paper accepted at #NeurIPS2025
✅ Training-free & plug-and-play
✅ Up to 2.5× faster on HunyuanVideo, 1.9× faster on Wan 2.1
✅ SOTA quality
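
For intuition, here is a toy top-k sparse attention in PyTorch. SVG2's actual method is semantic-aware and avoids computing the masked scores at all; this sketch only shows the core idea that each query attends to a small set of relevant keys:

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, keep: int):
    """q, k, v: (tokens, dim); keep only the `keep` strongest keys per query."""
    scores = (q @ k.T) / k.shape[-1] ** 0.5      # dense scores, for illustration only
    topv, topi = scores.topk(keep, dim=-1)
    masked = torch.full_like(scores, float("-inf"))
    masked.scatter_(-1, topi, topv)              # everything else is masked out
    return F.softmax(masked, dim=-1) @ v

q = k = v = torch.randn(64, 32)
print(topk_sparse_attention(q, k, v, keep=8).shape)  # torch.Size([64, 32])

Skipping the masked computation entirely, rather than masking after the fact, is where the reported speedups come from.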

Han Cai (@hancai_hm):

🚀 Jet-Nemotron – Code & pre-trained checkpoints now available!
⚡️ Achieve up to 53.6× higher generation throughput on H100 GPUs with cost-efficient finetuning.
🔗 GitHub: github.com/NVlabs/Jet-Nem…
🔗 Hugging Face: huggingface.co/collections/je…
🔗 Paper: arxiv.org/abs/2508.15884

Han Cai (@hancai_hm):

Decoding is often the speed bottleneck in few-step latent diffusion models.

🚀 Meet DC-AE-Lite:
⚡ 1.8× faster decoding than DC-AE
🎯 Similar reconstruction quality
👉 Code: github.com/dc-ai-projects…
👉 Pre-trained model: huggingface.co/collections/dc…

Contributors: Dongyun Zou,…
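
To see why the decoder dominates at few steps, note that total latency ≈ steps × denoise_step + decode (illustrative numbers below, not measurements):

def decode_share(steps: int, denoise_ms: float = 20.0, decode_ms: float = 60.0) -> float:
    """Fraction of end-to-end latency spent in the decoder."""
    return decode_ms / (steps * denoise_ms + decode_ms)

for steps in (50, 8, 4, 1):
    print(f"{steps} steps: decoder is {decode_share(steps):.0%} of latency")
# 50 steps: 6% | 8 steps: 27% | 4 steps: 43% | 1 step: 75%

With the decoder taking this large a share, a 1.8× faster decoder yields a direct end-to-end gain.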

Han Cai (@hancai_hm):

Changing the autoencoder in latent diffusion models is easier than you think.

🚀 Introducing DC-Gen – a post-training acceleration framework that works with any pre-trained diffusion model, boosting efficiency by transferring it into a deeply compressed latent space with…
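
In outline, the transfer looks like this (hypothetical pseudocode; the names are placeholders, not the DC-Gen API):

def transfer_to_compressed_latents(backbone, compressed_ae, dataloader, steps: int):
    """Post-training: re-fit a pre-trained diffusion backbone to a new latent space."""
    for _, images in zip(range(steps), dataloader):
        latents = compressed_ae.encode(images)   # deeply compressed latents (e.g., f64)
        loss = backbone.denoising_loss(latents)  # standard diffusion objective
        loss.backward()                          # optimizer step omitted for brevity

Because this happens post-training, it can be applied to any pre-trained diffusion model rather than requiring training from scratch.
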
Han Cai (@hancai_hm):

We release DC-VideoGen, a new post-training framework for accelerating video diffusion models. Key features:
🎬 Supports video generation up to 2160×3840 (4K) resolution on a single H100 GPU
⚡ Delivers 14.8× faster inference than the base model while achieving comparable or…