Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile
Mohammad Shoeybi

@mohammadshoeybi

Director of Applied Research @NVIDIA

ID: 1446997466

Joined: 21-05-2013 18:25:48

72 Tweets

307 Followers

58 Following

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., InternVL 2).
Remarkably, NVLM 1.0 shows improved text-only
Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile Photo

We are very excited to release our Common Crawl-based large-scale dataset. This 6.3T-token dataset will help the community develop stronger models. Check it out!
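For readers who want to inspect the data without downloading it all, here is a minimal sketch of streaming a large Common Crawl-derived dataset with the Hugging Face `datasets` library; the repository id used below is an assumption, not something stated in the tweet.

```python
# Minimal sketch of streaming a very large Common Crawl-derived dataset with the
# `datasets` library, so nothing is downloaded up front. The repo id below is an
# assumption (not given in the tweet); substitute the actual dataset name.
from datasets import load_dataset

ds = load_dataset("nvidia/Nemotron-CC", split="train", streaming=True)  # hypothetical repo id

for example in ds:
    print(sorted(example.keys()))  # inspect the schema of a single record
    break
```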

Wei Ping (@_weiping) 's Twitter Profile Photo

We’re at #NeurIPS 2024 in Vancouver, presenting two papers from NVIDIA on advancing state-of-the-art LLM RAG models!
ChatQA: Surpassing GPT-4 on Conversational QA and RAG
Thu 12 Dec, 11 a.m.–2 p.m. PST, West Ballroom A-D #7201
Paper: arxiv.org/abs/2401.10225
RankRAG:

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing AceMath, a cutting-edge suite of math models designed to excel at solving complex math problems, complemented by highly effective reward models. Our flagship model, AceMath-72B-Instruct, significantly improves upon Qwen2.5-Math-72B and outperforms GPT-4o and

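Since the release pairs policy models with reward models, here is a minimal sketch of how a reward model is commonly used for best-of-n answer selection; the scoring function is a stand-in for a real reward-model forward pass, not the AceMath API.

```python
# Minimal sketch of best-of-n selection with a reward model (not the AceMath
# pipeline): sample several candidate solutions, score each with the reward
# model, and keep the highest-scoring one.

def best_of_n(candidates: list[str], score_fn) -> str:
    """Return the candidate the reward model rates highest."""
    return max(candidates, key=score_fn)

def toy_score(text: str) -> float:
    """Toy stand-in for a reward-model score (illustration only)."""
    return float("\\pm" in text)

candidates = [
    "The roots are x = 3 and x = -3, so the answer is \\boxed{\\pm 3}.",
    "x^2 = 9 gives x = 3, so the answer is \\boxed{3}.",
]
print(best_of_n(candidates, toy_score))
```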
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

Nemotron-H: A family of Hybrid Mamba-Transformer LLMs.
* Hybrid architecture means up to 3X faster at the same accuracy
* Trained in FP8
* Great for VLMs
* Weights and instruct versions to come soon.

research.nvidia.com/labs/adlr/nemo…
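As a rough illustration of what a hybrid Mamba-Transformer stack looks like, here is a sketch of a layer schedule that keeps only a few attention layers in an otherwise Mamba-style stack; the layer count and ratio are made up for illustration, not the published Nemotron-H configuration.

```python
# Rough sketch of a hybrid layer schedule (illustrative only, not the published
# Nemotron-H configuration): most layers use a linear-time Mamba-style mixer,
# with full self-attention kept in only a handful of positions.

def hybrid_layer_pattern(num_layers: int, attention_every: int = 8) -> list[str]:
    """Return the mixer type per layer: mostly 'mamba', sparse 'attention'."""
    return [
        "attention" if (i + 1) % attention_every == 0 else "mamba"
        for i in range(num_layers)
    ]

pattern = hybrid_layer_pattern(num_layers=24, attention_every=8)
print(pattern.count("attention"), "attention layers out of", len(pattern))
```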
Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing UltraLong-8B: We extended Llama3.1-8B-Instruct to support 1M, 2M, and 4M context windows by continuing pretraining on just 1B tokens. Performance on short-context tasks (e.g., MMLU, MATH) can be efficiently recovered with SFT using only 100K curated samples. We're
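One standard ingredient in this kind of context extension is adjusting the RoPE base so positions remain distinguishable at the new maximum length; the sketch below shows only that frequency computation, with a made-up larger base, and is not the UltraLong recipe itself.

```python
# Sketch of RoPE base scaling for longer contexts (not the UltraLong-8B recipe):
# raising the rotary base slows the lowest frequencies so positions far apart
# still map to distinct phases.

def rope_frequencies(head_dim: int, base: float = 10_000.0) -> list[float]:
    """Per-pair rotary frequencies theta_i = base^(-2i / head_dim)."""
    return [base ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

standard = rope_frequencies(128, base=10_000.0)      # common short-context setting
extended = rope_frequencies(128, base=5_000_000.0)   # hypothetical base for ~1M context

# The slowest frequency shrinks, stretching the usable positional range.
print(standard[-1], extended[-1])
```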

Mostofa Patwary (@mapatwary) 's Twitter Profile Photo

Nemotron-H base models (8B/47B/56B), a family of Hybrid Mamba-Transformer LLMs, are now available on Hugging Face:
huggingface.co/nvidia/Nemotro…
huggingface.co/nvidia/Nemotro…
huggingface.co/nvidia/Nemotro…
Technical Report: arxiv.org/abs/2504.03624
Blog: research.nvidia.com/labs/adlr/nemo…
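For completeness, a minimal sketch of loading one of these checkpoints with Hugging Face transformers; the repository names in the links above are truncated, so the id below is a placeholder rather than the confirmed repo.

```python
# Minimal loading sketch with Hugging Face transformers. The repo id is a
# placeholder (the links in the tweet are truncated); hybrid architectures
# typically require trust_remote_code or a recent transformers release.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "nvidia/Nemotron-H-8B-Base"  # placeholder id, check the actual model card
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("The capital of France is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```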

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: Deepseek-R1-Distilled-Qwen-7B.
It achieves:
- AIME24: 69.0 (+13.5 gain by RL)
- AIME25: 53.6 (+14.4)
- LiveCodeBench: 44.4 (surprisingly, +6.8 gain after
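Math RL of this kind typically relies on a verifiable, rule-based reward; below is a minimal sketch of such a reward (exact match on the boxed final answer), shown as an illustration rather than the actual AceMath-RL reward function.

```python
# Minimal sketch of a verifiable math reward (illustration, not the actual
# AceMath-RL-Nemotron-7B reward): extract the model's \boxed{...} answer and
# give reward 1.0 only if it matches the reference exactly.
import re

def extract_boxed(text: str) -> str | None:
    """Return the last \\boxed{...} answer in a response, if any."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", text)
    return matches[-1].strip() if matches else None

def math_reward(response: str, reference: str) -> float:
    predicted = extract_boxed(response)
    return 1.0 if predicted is not None and predicted == reference.strip() else 0.0

print(math_reward(r"... therefore the answer is \boxed{42}.", "42"))  # 1.0
print(math_reward(r"... therefore the answer is \boxed{41}.", "42"))  # 0.0
```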
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

It takes great data to make a great model. We're opening the data curation pipeline for Nemotron models, and we're also posting as much of the Nemotron training and post-training data as possible. These days, data is a fundamental part of accelerated computing.

Albert Gu (@_albertgu) 's Twitter Profile Photo

exciting to see that hybrid models maintain reasoning performance with few attention layers. benefits of linear architectures are prominent for long reasoning traces, when efficiency is bottlenecked by decoding - seems like a free win if reasoning ability is preserved as well!
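To make the decoding-efficiency point concrete, here is a back-of-the-envelope sketch of how the attention KV cache grows with the length of a reasoning trace, whereas a linear mixer carries a fixed-size state; the model dimensions are assumptions, not Nemotron-H's actual configuration.

```python
# Back-of-the-envelope sketch (assumed dimensions, not measured numbers): the
# attention KV cache grows linearly with decoded length, while a Mamba-style
# mixer keeps a fixed-size state regardless of how long the trace gets.

def kv_cache_gib(seq_len: int, n_layers: int, n_kv_heads: int, head_dim: int,
                 bytes_per_elem: int = 2) -> float:
    """Bytes for keys + values across all layers, in GiB."""
    return seq_len * n_layers * n_kv_heads * head_dim * 2 * bytes_per_elem / 2**30

# Hypothetical 8B-class attention configuration
for tokens in (8_000, 64_000, 256_000):
    print(f"{tokens:>7} tokens -> {kv_cache_gib(tokens, 32, 8, 128):.2f} GiB KV cache")
```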

Yang Chen (@ychennlp) 's Twitter Profile Photo

📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models.

The result? We trained a 7B model - AceReason-Nemotron-1.1, significantly improved from version 1.0 on math and coding benchmarks.

✅AIME2025 (math): 53.6% -> 64.8%
✅LiveCodeBench
Mohammad Shoeybi (@mohammadshoeybi) 's Twitter Profile Photo

We just released Nemotron Nano V2, with great accuracy and unprecedented inference speed. With the goal of truly open models, we also released most of the data used to train it. Check it out!

clem 🤗 (@clementdelangue) 's Twitter Profile Photo

💚💚💚
9B: huggingface.co/nvidia/NVIDIA-…
9B Base: huggingface.co/nvidia/NVIDIA-…
12B Base: huggingface.co/nvidia/NVIDIA-…
Pre-training datasets: huggingface.co/collections/nv…