Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile
Mohammad Shoeybi

@mohammadshoeybi

Director of Applied Research @NVIDIA

ID: 1446997466

Joined: 21-05-2013 18:25:48

72 Tweets

307 Followers

58 Following

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2).
Remarkably, NVLM 1.0 shows improved text-only
Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile Photo

We are very excited to release our Common Crawl-based large-scale dataset. This 6.3T-token dataset will help the community develop stronger models. Check it out!
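For readers who want to inspect the data without downloading it all, here is a minimal sketch of streaming a large Common Crawl-derived dataset with the Hugging Face `datasets` library; the repository id used below is an assumption, not something stated in the tweet.

```python
# Minimal sketch of streaming a very large Common Crawl-derived dataset with the
# `datasets` library, so nothing is downloaded up front. The repo id below is an
# assumption (not given in the tweet); substitute the actual dataset name.
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-CC", split="train", streaming=True)  # hypothetical repo id

for example in ds:
    print(sorted(example.keys()))  # inspect the schema of a single record
    break
```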

Wei Ping (@_weiping) 's Twitter Profile Photo

We’re at #NeurIPS 2024 in Vancouver, presenting two papers from NVIDIA on advancing state-of-the-art LLM RAG models!
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Thu 12 Dec, 11 a.m.–2 p.m. PST, West Ballroom A-D #7201
Paper: arxiv.org/abs/2401.10225
RankRAG:

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing AceMath, a cutting-edge suite of math models designed to excel at solving complex math problems, complemented by highly effective reward models. Our flagship model, AceMath-72B-Instruct, significantly improves upon Qwen2.5-Math-72B and outperforms GPT-4o and

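Since the release pairs policy models with reward models, here is a minimal sketch of how a reward model is commonly used for best-of-n answer selection; the scoring function is a stand-in for a real reward-model forward pass, not the AceMath API.

```python
# Minimal sketch of best-of-n selection with a reward model (not the AceMath
# pipeline): sample several candidate solutions, score each with the reward
# model, and keep the highest-scoring one.

def best_of_n(candidates: list[str], score_fn) -> str:
    """Return the candidate the reward model rates highest."""
    return max(candidates, key=score_fn)

def toy_score(text: str) -> float:
    """Toy stand-in for a reward-model score (illustration only)."""
    return float("\\pm" in text)

candidates = [
    "The roots are x = 3 and x = -3, so the answer is \\boxed{\\pm 3}.",
    "x^2 = 9 gives x = 3, so the answer is \\boxed{3}.",
]
print(best_of_n(candidates, toy_score))
```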
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.

research.nvidia.com/labs/adlr/nemo…
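As a rough illustration of what a hybrid Mamba-Transformer stack looks like, here is a sketch of a layer schedule that keeps only a few attention layers in an otherwise Mamba-style stack; the layer count and ratio are made up for illustration, not the published Nemotron-H configuration.

```python
# Rough sketch of a hybrid layer schedule (illustrative only, not the published
# Nemotron-H configuration): most layers use a linear-time Mamba-style mixer,
# with full self-attention kept in only a handful of positions.

def hybrid_layer_pattern(num_layers: int, attention_every: int = 8) -> list[str]:
    """Return the mixer type per layer: mostly 'mamba', sparse 'attention'."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(num_layers)
    ]

pattern = hybrid_layer_pattern(num_layers=24, attention_every=8)
print(pattern.count("attention"), "attention layers out of", len(pattern))
```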
Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing UltraLong-8B: We extended Llama3.1-8B-Instruct to support 1M, 2M, and 4M context windows by continuing pretraining on just 1B tokens. Performance on short-context tasks (e.g., MMLU, MATH) can be efficiently recovered with SFT using only 100K curated samples. We're
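One standard ingredient in this kind of context extension is adjusting the RoPE base so positions remain distinguishable at the new maximum length; the sketch below shows only that frequency computation, with a made-up larger base, and is not the UltraLong recipe itself.

```python
# Sketch of RoPE base scaling for longer contexts (not the UltraLong-8B recipe):
# raising the rotary base slows the lowest frequencies so positions far apart
# still map to distinct phases.

def rope_frequencies(head_dim: int, base: float = 10_000.0) -> list[float]:
    """Per-pair rotary frequencies theta_i = base^(-2i / head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

standard = rope_frequencies(128, base=10_000.0)      # common short-context setting
extended = rope_frequencies(128, base=5_000_000.0)   # hypothetical base for ~1M context

# The slowest frequency shrinks, stretching the usable positional range.
print(standard[-1], extended[-1])
```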

Mostofa Patwary (@mapatwary) 's Twitter Profile Photo

Nemotron-H base models (8B/47B/56B), a family of Hybrid Mamba-Transformer LLMs, are now available on Hugging Face:
huggingface.co/nvidia/Nemotro…
huggingface.co/nvidia/Nemotro…
huggingface.co/nvidia/Nemotro…
Technical Report: arxiv.org/abs/2504.03624
Blog: research.nvidia.com/labs/adlr/nemo…
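For completeness, a minimal sketch of loading one of these checkpoints with Hugging Face transformers; the repository names in the links above are truncated, so the id below is a placeholder rather than the confirmed repo.

```python
# Minimal loading sketch with Hugging Face transformers. The repo id is a
# placeholder (the links in the tweet are truncated); hybrid architectures
# typically require trust_remote_code or a recent transformers release.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/Nemotron-H-8B-Base"  # placeholder id, check the actual model card
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```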

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: Deepseek-R1-Distilled-Qwen-7B.
It achieves:
- AIME24: 69.0 (+13.5 gain by RL)
- AIME25: 53.6 (+14.4)
- LiveCodeBench: 44.4 (surprisingly, +6.8 gain after
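Math RL of this kind typically relies on a verifiable, rule-based reward; below is a minimal sketch of such a reward (exact match on the boxed final answer), shown as an illustration rather than the actual AceMath-RL reward function.

```python
# Minimal sketch of a verifiable math reward (illustration, not the actual
# AceMath-RL-Nemotron-7B reward): extract the model's \boxed{...} answer and
# give reward 1.0 only if it matches the reference exactly.
import re

def extract_boxed(text: str) -> str | None:
    """Return the last \\boxed{...} answer in a response, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def math_reward(response: str, reference: str) -> float:
    predicted = extract_boxed(response)
    return 1.0 if predicted is not None and predicted == reference.strip() else 0.0

print(math_reward(r"... therefore the answer is \boxed{42}.", "42"))  # 1.0
print(math_reward(r"... therefore the answer is \boxed{41}.", "42"))  # 0.0
```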
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

It takes great data to make a great model. We're opening the data curation pipeline for Nemotron models, and we're also posting as much of the Nemotron training and post-training data as possible. These days, data is a fundamental part of accelerated computing.

Albert Gu (@_albertgu) 's Twitter Profile Photo

exciting to see that hybrid models maintain reasoning performance with few attention layers. benefits of linear architectures are prominent for long reasoning traces, when efficiency is bottlenecked by decoding - seems like a free win if reasoning ability is preserved as well!
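To make the decoding-efficiency point concrete, here is a back-of-the-envelope sketch of how the attention KV cache grows with the length of a reasoning trace, whereas a linear mixer carries a fixed-size state; the model dimensions are assumptions, not Nemotron-H's actual configuration.

```python
# Back-of-the-envelope sketch (assumed dimensions, not measured numbers): the
# attention KV cache grows linearly with decoded length, while a Mamba-style
# mixer keeps a fixed-size state regardless of how long the trace gets.

def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """Bytes for keys + values across all layers, in GiB."""
    return seq_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per_elem / 2**30

# Hypothetical 8B-class attention configuration
for tokens in (8_000, 64_000, 256_000):
    print(f"{tokens:>7} tokens -> {kv_cache_gib(tokens, 32, 8, 128):.2f} GiB KV cache")
```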

Yang Chen (@ychennlp) 's Twitter Profile Photo

📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models.

The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks.

✅AIME2025 (math): 53.6% -> 64.8%
✅LiveCodeBench
Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile Photo

We just released Nemotron Nano V2, with great accuracy and unprecedented inference speed. With the goal of truly open models, we also released most of the data used to train it. Check it out!

clem 🤗 (@clementdelangue) 's Twitter Profile Photo

💚💚💚
9B: huggingface.co/nvidia/NVIDIA-…
9B Base: huggingface.co/nvidia/NVIDIA-…
12B Base: huggingface.co/nvidia/NVIDIA-…
Pre-training datasets: huggingface.co/collections/nv…