Sanjeev Satheesh (@issanjeev) 's Twitter Profile
Sanjeev Satheesh

@issanjeev

ID: 31904918

Joined: 16-04-2009 15:41:02

1.1K Tweets

478 Followers

388 Following

clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Nvidia currently has the #1 trending open model & #1 trending open dataset and is close to 25,000 followers on Hugging Face. They've been really impactful for open-source AI recently!

Adi Renduchintala (@rendu_a) 's Twitter Profile Photo

Transformers are still dominating the LLM scene, but we show that higher-throughput alternatives exist which are just as strong! Grateful to have a part in the Nemotron-H Reasoning effort. 🙏 Technical report will be out soon, stay tuned!

Lucas Beyer (bl16) (@giffmana) 's Twitter Profile Photo

Oh wow, did you guys know that torch.compile can compile numpy code? And even run it on GPU?

This is pretty neat for all kinds of "surrounding" code besides the model (like evals and fancy metrics) that I used to do with numba/numexpr (cuz CPU-XLA was pretty meh).

Poll below
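As a hedged illustration of what that looks like (PyTorch 2.1 or later is assumed, and numpy_rmse is a made-up example function, not anything from the thread): a plain-NumPy function can be wrapped in torch.compile directly, and the documented pattern for running it on GPU is to call the compiled function under a CUDA device context.

```python
import numpy as np
import torch

# A plain-NumPy "surrounding" computation, like an eval metric.
# torch.compile (PyTorch >= 2.1) can trace NumPy code such as this.
@torch.compile
def numpy_rmse(pred, target):
    diff = pred - target
    return np.sqrt(np.mean(diff * diff))

pred = np.random.rand(1024).astype(np.float32)
target = np.random.rand(1024).astype(np.float32)

# Executes the traced graph with Torch ops on CPU; inputs and
# outputs stay as NumPy arrays.
print(numpy_rmse(pred, target))

# To run the same NumPy code on GPU, call the compiled function
# under a CUDA device context.
if torch.cuda.is_available():
    with torch.device("cuda"):
        print(numpy_rmse(pred, target))
```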
Somshubra Majumdar (@haseox94) 's Twitter Profile Photo

OpenReasoning-Nemotron models are super strong on math, science, and code, achieving scores that often surpass models way larger in size. We also show that, using the GenSelect algorithm, you can perform test-time compute scaling to significantly improve scores on all benchmarks
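For intuition, GenSelect-style test-time compute scaling amounts to sampling several candidate solutions and then having the model itself judge which one is best. The sketch below is an assumption about the general shape of such a loop, not NVIDIA's implementation; generate and judge are hypothetical stand-ins for real model calls.

```python
# Hedged sketch of generate-then-select test-time scaling in the
# spirit of GenSelect. `generate` and `judge` are hypothetical
# callables standing in for real model calls.
def genselect(problem, generate, judge, num_candidates=8):
    # Sample several independent candidate solutions.
    candidates = [generate(problem) for _ in range(num_candidates)]
    # Show all candidates to the model and ask it to pick the best.
    listing = "\n---\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        f"{problem}\n\nCandidate solutions:\n{listing}\n\n"
        "Which candidate index is most likely correct?"
    )
    best = judge(prompt)  # assumed to return an integer index
    return candidates[best]
```

Spending more test-time compute means raising num_candidates; the selection step is what lets the extra samples translate into accuracy.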

Midnight Maniac Sri (@sridatta) 's Twitter Profile Photo

'water is transparent only within a very narrow band of the electromagnetic spectrum, 

so living organisms evolved sensitivity to that band, and that's what we now call "visible light". '

(found via HN)
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate.

Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus.

Links to the
Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

We are excited to release the Nvidia-Nemotron-Nano-V2 model! This is a 9B hybrid SSM model with an open base model and training data. This model also supports runtime "thinking" budget control. HF collection with base and post-trained models: huggingface.co/collections/nv…

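"Thinking budget control" here means capping how many reasoning tokens the model may emit before it answers. The sketch below only illustrates the idea; the token-stream interface and the <think>/</think> convention are assumptions, not the model's documented API.

```python
# Hedged sketch of a runtime "thinking budget". `generate_tokens` is a
# hypothetical callable that streams tokens for a prompt; the
# <think>/</think> tags are an assumed convention for illustration.
def generate_with_thinking_budget(generate_tokens, prompt, budget):
    thinking = []
    for token in generate_tokens(prompt + "<think>"):
        # Stop extending the reasoning trace once it closes itself
        # or the budget is spent.
        if token == "</think>" or len(thinking) >= budget:
            break
        thinking.append(token)
    # Force-close the thinking span and decode the final answer.
    final_prompt = prompt + "<think>" + "".join(thinking) + "</think>"
    return "".join(generate_tokens(final_prompt))
```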
Eric W. Tramel (@fujikanaeda) 's Twitter Profile Photo

we got you a fast ssm 9b post
we got you a 9b base
we got you a 12b base teacher
we got you 5 datasets
we got you 💚
research.nvidia.com/labs/adlr/NVID…

Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

NVIDIA just released Nemotron Nano v2 - a 9B hybrid SSM (Mamba) that is 6X faster than similarly sized models, while also being more accurate.

Ready for commercial use, all available on Hugging Face

💾 Nemotron Nano 2 is a 9B hybrid Mamba Transformer for fast reasoning, up to
ADP Research (@adpresearch) 's Twitter Profile Photo

It’s true: The rise of #AI is impacting jobs, and for tenured workers it signals a benefit, at least for one very specific group of workers. Using ADP payroll data, Erik Brynjolfsson and his team at the Stanford Digital Economy Lab found that employment among young adults whose work is exposed
Bryan Catanzaro (@ctnzr) 's Twitter Profile Photo

As part of Nemotron, we're releasing a new Math dataset, made by rendering webpages using Lynx and then using an LLM to rewrite the result into LaTeX. Our models got much better at math when we started using this dataset. We hope it's helpful to the community. 💚
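A rough sketch of that kind of pipeline (lynx -dump is a real CLI flag for rendering a page to text; the prompt wording and the llm_rewrite callable are assumptions for illustration, not NVIDIA's actual code):

```python
import subprocess

def render_with_lynx(url):
    # Render the webpage to plain text. -dump writes the rendered page
    # to stdout; -nolist suppresses the trailing list of links.
    result = subprocess.run(
        ["lynx", "-dump", "-nolist", url],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

# Hypothetical rewrite step: ask an LLM to turn the rendered math back
# into LaTeX. The prompt and `llm_rewrite` are assumptions.
REWRITE_PROMPT = (
    "Rewrite the following rendered page text so that every mathematical "
    "expression is valid LaTeX, preserving the surrounding prose:\n\n{page}"
)

def to_latex(page_text, llm_rewrite):
    return llm_rewrite(REWRITE_PROMPT.format(page=page_text))
```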

Sanjeev Satheesh (@issanjeev) 's Twitter Profile Photo

Nemotron-CC-Math is a 133B-token *benchmark-agnostic pretraining dataset* built entirely from CommonCrawl. huggingface.co/datasets/nvidi…

Syeda Nahida Akter (@snat02792153) 's Twitter Profile Photo

✨ Core Idea:
 
Treat chain-of-thought (CoT) as an action before next-token prediction.

Reward = information gain on the next token.

✅ Dense, verifier-free signal
✅ Works and scales with ordinary pretraining text
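Made concrete, the reward is the log-likelihood improvement the sampled chain-of-thought gives the true next token over a no-CoT baseline. A minimal sketch, assuming a causal LM that maps token ids to logits of shape [batch, seq, vocab] (the model interface is an assumption, not the paper's code):

```python
import torch
import torch.nn.functional as F

def information_gain_reward(model, context_ids, cot_ids, next_token_id):
    # Hedged sketch of the information-gain reward described above.
    with torch.no_grad():
        # log p(next | context + cot): predictor conditioned on the CoT.
        with_cot = torch.cat([context_ids, cot_ids], dim=-1)
        logp_cot = F.log_softmax(model(with_cot)[0, -1], dim=-1)[next_token_id]
        # log p(next | context): no-CoT baseline on ordinary text, so no
        # external verifier is needed.
        logp_base = F.log_softmax(model(context_ids)[0, -1], dim=-1)[next_token_id]
    # Dense per-token reward: positive when the CoT made the true next
    # token more likely.
    return (logp_cot - logp_base).item()
```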
Shrimai (@shrimai_) 's Twitter Profile Photo

💫 Introducing RLP: Reinforcement Learning Pretraining, an information-driven, verifier-free objective that teaches models to think before they predict
🔥 +19% vs BASE on Qwen3-1.7B
🚀 +35% vs BASE on Nemotron-Nano-12B
📄 Paper: github.com/NVlabs/RLP/blo…
📝 Blog: research.nvidia.com/labs/adlr/RLP/
DailyPapers (@huggingpapers) 's Twitter Profile Photo

ServiceNow's Apriel-1.5-15B-Thinker: Frontier AI on a single GPU

This 15B-parameter open-weights multimodal model achieves state-of-the-art reasoning performance, matching models 8-10x its size—all without an RL phase!