Hemil Desai (@hemildesai10)'s Twitter Profile
Hemil Desai

@hemildesai10

Senior Software Engineer @NVIDIA, opinions are my own

ID: 601679997

https://hd10.dev · Joined 07-06-2012 05:54:07

376 Tweets

299 Followers

5.5K Following

Agentica Project (@agentica_)'s Twitter Profile Photo

🚀 Introducing DeepSWE 🤖: our fully open-sourced, SOTA software engineering agent trained purely with RL on top of Qwen3-32B. DeepSWE achieves 59% on SWEBench-Verified with test-time scaling (and 42.2% Pass@1), topping the SWEBench leaderboard for open-weight models.

💪DeepSWE
Hemil Desai (@hemildesai10)'s Twitter Profile Photo

NeMo-Skills is a powerhouse for just about any workflow related to LLMs. Glad to have indirectly contributed to it via github.com/NVIDIA-NeMo/Run

Ernest Ryu (@ernestryu)'s Twitter Profile Photo

New lecture recordings on RL+LLM! 📺 This spring, I gave a lecture series titled **Reinforcement Learning of Large Language Models**. I have decided to re-record these lectures and share them on YouTube. (1/7)

Jacob Austin (@jacobaustin132)'s Twitter Profile Photo

Today we're putting out an update to the JAX TPU book, this time on GPUs. How do GPUs work, especially compared to TPUs? How are they networked? And how does this affect LLM training? 1/n
Bryan Catanzaro (@ctnzr)'s Twitter Profile Photo

Today we're releasing NVIDIA Nemotron Nano v2 - a 9B hybrid SSM that is 6X faster than similarly sized models, while also being more accurate.

Along with this model, we are also releasing most of the data we used to create it, including the pretraining corpus.

Links to the
Alex Zhang (@a1zhang)'s Twitter Profile Photo

announcing the GPU MODE x Scale ML summer speaker series happening next week, a 5⃣-day series where top researchers will teach about the algorithmic and systems-level advances that underpin `gpt-oss`!

all content will be live-streamed & recorded for FREE on GPU MODE's YouTube!
Thinking Machines (@thinkymachines)'s Twitter Profile Photo

Today Thinking Machines Lab is launching our research blog, Connectionism. Our first blog post is “Defeating Nondeterminism in LLM Inference”

We believe that science is better when shared. Connectionism will cover topics as varied as our research is: from kernel numerics to
Horace He (@chhillee)'s Twitter Profile Photo

Apologies that I haven't written anything since joining Thinking Machines but I hope this blog post on a topic very near and dear to my heart (reproducible floating point numerics in LLM inference) will make up for it!
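
A minimal sketch of the root cause (plain PyTorch, not code from the blog post): floating-point addition is not associative, so any change in reduction order, e.g. from a different batch size or kernel split, can change the low-order bits of the result.

```python
import torch

# Three mathematically identical sums, three different summation orders.
torch.manual_seed(0)
x = torch.randn(10_000, dtype=torch.float32)

sequential = torch.zeros(())
for v in x:                                  # strict left-to-right order
    sequential = sequential + v

library = x.sum()                            # library-chosen reduction order
chunked = x.view(100, 100).sum(dim=1).sum()  # two-level (tree-like) order

# The three results typically differ in the last few bits, which is why
# reduction order must be pinned down for bitwise-reproducible inference.
print(sequential.item(), library.item(), chunked.item())
```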

Hemil Desai (@hemildesai10)'s Twitter Profile Photo

A while back, we set out to scale Automodel to the trillion-parameter range using native PyTorch parallelism and DTensor APIs. Enabling Pipeline Parallelism (PP) was the first major milestone toward that goal. This post outlines the key challenges we had to address to make PP
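
A heavily hedged sketch of what native PyTorch pipeline staging looks like (illustrative use of torch.distributed.pipelining, assuming torch>=2.5 under torchrun; not NeMo Automodel's actual code):

```python
import os
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.pipelining import PipelineStage, ScheduleGPipe

# One rank per pipeline stage; launch with torchrun --nproc-per-node=<stages>.
rank = int(os.environ["RANK"])
world = int(os.environ["WORLD_SIZE"])
dist.init_process_group("nccl")
device = torch.device(f"cuda:{rank % torch.cuda.device_count()}")

# Each rank materializes only its slice of the model (two layers here).
stage_mod = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024)).to(device)
stage = PipelineStage(stage_mod, stage_index=rank, num_stages=world, device=device)

# GPipe schedule: the batch is split into microbatches and pipelined.
schedule = ScheduleGPipe(stage, n_microbatches=4)

x = torch.randn(32, 1024, device=device)
if rank == 0:
    schedule.step(x)       # first stage feeds the input microbatches
else:
    out = schedule.step()  # later stages receive activations from upstream
```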

Hemil Desai (@hemildesai10)'s Twitter Profile Photo

We implemented optimized MoEs in NeMo Automodel, achieving >200 TFLOPS per GPU on H100s in BF16 for DeepSeek V3, both GPT-OSS variants, and Qwen3 MoE 30B. Perf achieved by combining PyTorch-native parallelisms (FSDP, EP, and PP) with NVIDIA's Transformer Engine and
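
For context, a back-of-the-envelope MFU calculation (assuming the commonly quoted ~989 TFLOPS dense BF16 peak for H100 SXM; the 200 TFLOPS figure is from the post above):

```python
# Rough model-FLOPs-utilization (MFU) arithmetic.
achieved_tflops = 200          # per-GPU throughput quoted above
h100_bf16_peak_tflops = 989    # H100 SXM dense BF16 peak (no sparsity)
print(f"MFU ≈ {achieved_tflops / h100_bf16_peak_tflops:.1%}")  # ≈ 20.2%
```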

Hemil Desai (@hemildesai10)'s Twitter Profile Photo

Happy to have contributed to this effort 🚀 We also published an in-depth technical deep dive on the implementation. Check it out here: github.com/NVIDIA-NeMo/Au…

NVIDIA AI Developer (@nvidiaaidev)'s Twitter Profile Photo

Nemotron 3 Nano is now leading its size class on the latest Artificial Analysis leaderboards, combining strong intelligence, high openness, and blazing output speed in a compact package. Part of the new NVIDIA Nemotron 3 family, Nemotron 3 Nano is powered by various advanced

Hemil Desai (@hemildesai10)'s Twitter Profile Photo

We just added MoE expert load-balance metric visualization to NVIDIA AI NeMo Automodel. Check it out here: github.com/NVIDIA-NeMo/Au… Monitor expert utilization and load imbalance in your PyTorch-based MoE training run with ease.
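
A sketch of one way such a metric can be computed from router logits (an illustrative, hypothetical helper; not NeMo Automodel's actual implementation):

```python
import torch

def expert_load_stats(router_logits: torch.Tensor, num_experts: int, top_k: int = 2):
    """Token share per expert and a max-over-mean imbalance ratio."""
    # router_logits: [num_tokens, num_experts]
    topk_experts = router_logits.topk(top_k, dim=-1).indices        # [tokens, k]
    counts = torch.bincount(topk_experts.flatten(), minlength=num_experts)
    frac = counts.float() / counts.sum()          # fraction of routed tokens
    imbalance = frac.max() / (1.0 / num_experts)  # 1.0 == perfectly balanced
    return frac, imbalance

frac, imbalance = expert_load_stats(torch.randn(4096, 8), num_experts=8)
print(frac, imbalance)  # log per training step to spot routing collapse
```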

Bryan Catanzaro (@ctnzr)'s Twitter Profile Photo

Announcing NVIDIA Nemotron 3 Super!

💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚36 on AAIndex v4
💚up to 2.2X faster than GPT-OSS-120B in FP4
💚Open data, open recipe, open weights

Models, Tech report, etc. here:
research.nvidia.com/labs/nemotron/…

And yes, Ultra is coming!
Jiantao Jiao (@jiantaoj)'s Twitter Profile Photo

Nemotron 3 Super has arrived! Designed with efficiency in mind (a Hybrid SSM Latent MoE built for Blackwell), it also delivers incredible accuracy. The most important aspect is scaling RL, utilizing the highly efficient and scalable Nemo Gym backend for RL environments and Nemo RL for model