Hongxu (Danny) Yin (@yin_hongxu)'s Twitter Profile
Hongxu (Danny) Yin

@yin_hongxu

Staff Research Scientist, NVIDIA Research | Ph.D., Princeton University | Forbes Top 60 Elite Chinese in North America.

ID: 1207922192916402176

Link: https://hongxu-yin.github.io · Joined: 20-12-2019 07:14:36

96 Tweets

692 Followers

149 Following

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🚀 Exciting news! We’ve just released a new LLM:
Llama-3.1-Nemotron-51B = Llama-70B-Instruct + Block Distillation + NAS + Logit Distillation.
Powered by a single H100 GPU with nearly the same accuracy! ⚡ This gives a 2.2x inference speed-up, with MT Bench 8.99 ➡️ 8.94.
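Roughly, logit distillation trains the smaller student model to match the teacher's softened output distribution (block distillation applies the same matching at the level of individual transformer blocks). A minimal generic sketch, not the actual Nemotron training code; the temperature T and shapes here are illustrative:

```python
import torch
import torch.nn.functional as F

def logit_distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened distributions, the
    # standard logit-distillation objective; the actual Nemotron recipe
    # (temperature, weighting, data) is not given in the announcement.
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * (T * T)

# Usage with logits of shape (batch, vocab):
loss = logit_distillation_loss(torch.randn(4, 32000), torch.randn(4, 32000))
```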
AK (@_akhaliq)'s Twitter Profile Photo

MaskLLM

Learnable Semi-Structured Sparsity for Large Language Models

discuss: huggingface.co/papers/2409.17…

Large Language Models (LLMs) are distinguished by their massive parameter counts, which typically result in significant redundancy. This work introduces MaskLLM, a learnable…
Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🚀 NeurIPS Conference Spotlight! 🥳 Imagine fine-tuning an LLM with just a sparsity mask! In our latest work, we freeze the LLM and use 2:4 structured sparsity to learn binary masks for each linear layer. Thanks to NVIDIA Ampere’s 2:4 sparsity, we can achieve up to 2x compute…
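Concretely, 2:4 semi-structured sparsity means exactly two of every four consecutive weights survive. The sketch below picks survivors by magnitude purely for illustration; the point of MaskLLM is that the mask is learned end-to-end instead, but the resulting structure is the same. Function and variable names are illustrative, not from the released code:

```python
import torch

def apply_2to4_sparsity(weight: torch.Tensor) -> torch.Tensor:
    # Keep the 2 largest-magnitude weights in every group of 4 along the
    # input dimension; MaskLLM instead *learns* which 2 to keep, via a
    # differentiable (Gumbel-softmax) choice over candidate masks.
    out_f, in_f = weight.shape
    assert in_f % 4 == 0, "2:4 sparsity groups the input dim in fours"
    groups = weight.reshape(out_f, in_f // 4, 4)
    keep = groups.abs().topk(k=2, dim=-1).indices
    mask = torch.zeros_like(groups)
    mask.scatter_(-1, keep, 1.0)
    return (groups * mask).reshape(out_f, in_f)

w_sparse = apply_2to4_sparsity(torch.randn(8, 16))
assert (w_sparse.reshape(8, -1, 4) != 0).sum(-1).max() <= 2  # 2 of every 4 survive
```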

Hongxu (Danny) Yin (@yin_hongxu)'s Twitter Profile Photo

Tomorrow we will be hosting the first Efficient DL for Foundation Models workshop at #ECCV2024! It takes place in Brown 3, 2pm-6pm Milan time, with keynotes and posters. Co-organized by NVIDIA, Microsoft Research, MIT, and UCSD. Come and join us!

Song Han (@songhan_mit)'s Twitter Profile Photo

Explore VILA-U: multi-modal tokens in, multi-modal tokens out. A single autoregressive next-token prediction model for both image/video generation and understanding. VILA-U is open-sourced: github.com/mit-han-lab/vi…
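The "multi-modal tokens in, multi-modal tokens out" framing amounts to a single decoder trained with next-token prediction over a joint vocabulary of text tokens and discrete visual tokens. A toy sketch of that idea only; sizes, layer choices, and names are illustrative and not the VILA-U architecture:

```python
import torch
import torch.nn as nn

# One shared vocabulary: text tokens followed by discrete visual tokens
# (e.g. from a vector-quantized image tokenizer). Sizes are illustrative.
TEXT_VOCAB, VISUAL_VOCAB = 32000, 8192
VOCAB = TEXT_VOCAB + VISUAL_VOCAB  # visual ids live at offset TEXT_VOCAB

class UnifiedARModel(nn.Module):
    def __init__(self, d=512, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, d)
        layer = nn.TransformerEncoderLayer(d, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d, VOCAB)  # one head covers both modalities

    def forward(self, ids):  # ids: (batch, seq) mixing text and visual tokens
        causal = nn.Transformer.generate_square_subsequent_mask(ids.shape[1])
        h = self.blocks(self.embed(ids), mask=causal)
        return self.head(h)  # next-token logits over the joint vocabulary

logits = UnifiedARModel()(torch.randint(0, VOCAB, (2, 16)))  # (2, 16, VOCAB)
```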

Hongxu (Danny) Yin (@yin_hongxu)'s Twitter Profile Photo

Very proud to announce VILA-HD, an ultra-cheap method for VLMs to crack high-resolution tasks! 20x cheaper than current tiling-based methods. Surpasses GPT-4o, Gemini-1.5, and Qwen2 on high-resolution benchmarks. Scales to 8Kx8K resolution. #CVPR2025. Check below for the repository.

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

🔥 Vision encoder upgrade: RADIOv2.5 = DFN_CLIP + DINOv2 + SAM + SigLIP + ToMe + multi-res training + teacher loss balancing + smart augmentations, CVPR2025.

Current foundation models have too many limitations: i) tailored for a single task, ii) not flexible on resolution (like…
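The "=" above is agglomerative multi-teacher distillation: one student encoder matched against several frozen teachers, with the per-teacher losses balanced. A toy sketch under that reading; the stand-in encoder, teacher dimensions, and learned softmax balancing are illustrative, and the real recipe additionally involves ToMe, multi-resolution training, and augmentations:

```python
import torch
import torch.nn as nn

class MultiTeacherStudent(nn.Module):
    def __init__(self, d_student=768, teacher_dims=(1024, 1536, 256, 1152)):
        super().__init__()
        # Stand-in student encoder; a real one would be a ViT backbone.
        self.backbone = nn.Linear(3 * 224 * 224, d_student)
        # One lightweight adaptor head per teacher feature space.
        self.adaptors = nn.ModuleList(nn.Linear(d_student, d) for d in teacher_dims)
        self.log_w = nn.Parameter(torch.zeros(len(teacher_dims)))  # loss balancing

def distill_loss(model, images, teacher_feats):
    h = model.backbone(images.flatten(1))
    per_teacher = torch.stack([
        nn.functional.mse_loss(adapt(h), feats)
        for adapt, feats in zip(model.adaptors, teacher_feats)
    ])
    weights = torch.softmax(model.log_w, dim=0)  # balance the teacher losses
    return (weights * per_teacher).sum()
```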
Hongxu (Danny) Yin (@yin_hongxu)'s Twitter Profile Photo

#RSS2025 NaVILA marks a successful attempt at using VILA to drive real-world robotic dogs and humanoids! Fully deployable. Money-saving. Fast inference. Check out our project page: navila-bot.github.io Many more amazing things to come!

Pavlo Molchanov (@pavlomolchanov)'s Twitter Profile Photo

New efficient Hybrid LLMs from @NVIDIA: Nemotron-H! Introducing a family of models combining Mamba-2, Self-Attention & FFNs for 8B, 47B and 56B sizes.

• The 47B model is 3x faster and 1.5x smaller, yet on par with Qwen-72B and Llama-70B
• The hybrid 8B model is 1.8x faster than comparable transformers
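The hybrid layout can be pictured as a pattern string over block types: mostly linear-time Mamba-2 mixers, with a few self-attention blocks and FFNs interleaved, which is where the speed and KV-cache savings come from. A toy sketch of the layout idea only; the pattern, width, and the GRU used as a recurrent stand-in for Mamba-2 are all illustrative, not the published architecture:

```python
import torch.nn as nn

# Layout idea only: "M" = Mamba-2 state-space mixer (stubbed below),
# "A" = self-attention, "F" = FFN. The real Nemotron-H pattern, depth,
# and width differ; this just shows attention appearing sparsely.
PATTERN = "MFMFMFAFMFMFAFMF"

def make_block(kind: str, d: int = 1024) -> nn.Module:
    if kind == "A":
        return nn.MultiheadAttention(d, num_heads=8, batch_first=True)
    if kind == "F":
        return nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    # Recurrent stand-in for Mamba-2: like an SSM, it runs in linear time
    # over sequence length and needs no KV cache, unlike self-attention.
    return nn.GRU(d, d, batch_first=True)

layers = nn.ModuleList(make_block(k) for k in PATTERN)
```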
AK (@_akhaliq)'s Twitter Profile Photo

Nvidia just dropped CLIMB on Hugging Face

CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training
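Reading the title literally, the loop is: embed and cluster the corpus, then iteratively propose mixture weights over the clusters, score each mixture with a small proxy model, and recenter the search on the best one. A schematic sketch under that reading; train_proxy_and_score is a hypothetical stand-in for "train a small proxy on this mixture and evaluate it", and all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def climb_sketch(doc_embeddings, train_proxy_and_score, k=16, rounds=3, pool=32):
    # 1) Cluster the corpus in embedding space; each cluster becomes a
    #    mixture component for pre-training data.
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(doc_embeddings)
    rng = np.random.default_rng(0)
    best_w, best_score = None, -np.inf
    for _ in range(rounds):
        # 2) Sample candidate mixtures, concentrating around the current best.
        alpha = np.ones(k) if best_w is None else 1.0 + 50.0 * best_w
        for w in rng.dirichlet(alpha, size=pool):
            # 3) Hypothetical callback: train a small proxy model on this
            #    mixture and evaluate it (the expensive step in practice).
            score = train_proxy_and_score(w, clusters)
            if score > best_score:
                best_w, best_score = w, score
    return best_w  # mixture weights over clusters for the full training run
```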
Hongxu (Danny) Yin (@yin_hongxu)'s Twitter Profile Photo

Sadly I will not be attending ICLR in Singapore. We have been researching VILA across the entire system, model, and application stack. Cost-saving. Agile. Capable, yet deployable.

Talk to our colleagues this week at ICLR, and the upcoming CVPR, RSS, MLSys!