Siddharth Singh (@siddharth_3773)'s Twitter Profile
Siddharth Singh

@siddharth_3773

CS Ph.D. Candidate, University of Maryland
I specialize in parallelizing LLM training on 1000s of GPUs
Graduating in Spring 2025

ID: 1803570289583828992

Website: http://siddharth9820.github.io · Joined: 19-06-2024 23:27:30

40 Tweets

860 Followers

208 Following

VantAI (@vant_ai):

Announcing Neo-1: the world’s most advanced atomistic foundation model, unifying structure prediction and all-atom de novo generation for the first time - to decode and design the structure of life 🧵(1/10)

Brian Bartoldson (@bartoldson):

🚀 We fixed a major LLM post-training bottleneck!

Our new method (TBA) combines trajectory balance with asynchronous training to speed up LLM RL 5-50x while improving results+scalability.

For example, using VinePPO's GSM8K setup, we obtain +1.2% accuracy and 50x faster RL.

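For reference, trajectory balance comes from the GFlowNet literature; here is a hedged sketch of one common form, where a full completion y for prompt x is treated as a single trajectory (the notation Z_φ, π_θ, R is mine, and the exact variant used in TBA may differ):

```latex
% Trajectory-balance loss for prompt x and sampled completion y (sketch):
% driving it to zero pushes \pi_\theta(y \mid x) \propto R(x, y) / Z_\phi(x).
\mathcal{L}_{\mathrm{TB}}(x, y)
  = \Big( \log Z_\phi(x) + \log \pi_\theta(y \mid x) - \log R(x, y) \Big)^{2}
```

Because this loss can be evaluated on whatever trajectories are available rather than only freshly sampled on-policy ones, it pairs naturally with asynchronous generation, which is presumably the "asynchronous training" half of TBA.
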
Parallel Software and Systems Group (@hpc_group):

We are on a roll, second successful dissertation defense in a week (March 28)! Congratulations to Siddharth Singh on becoming the second PhD graduate from PSSG!!

Dissertation title: "Optimizing Communication in Parallel Deep Learning on Exascale-class Machines"

#HPC #AI #HPC4AI

Gautam Kamath (@thegautamkamath):

Passing by luxury clothing retailers (Gucci, Prada, etc), I feel lucky that CS researchers don't waste money to play these types of vapid status games. Anyway, this is Dr. Prof. Kamath (rank-n university) thrilled to announce my group's 23 accepted NeurIPS papers!!!!

Siddharth Singh (@siddharth_3773):

There are more ways to improve model quality apart from chucking in more compute (although the latter is what keeps me employed). Great work!

François Fleuret (@francoisfleuret):

As expected, that was popular. Here is my attempt at consolidating all the answers into a list.

- Prenorm: normalization in the residual blocks before the attention operation and the FFN respectively
- GQA (Group Query Attention): more Q than (K, V)

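A minimal PyTorch sketch of those two items, with illustrative names and sizes that are mine rather than the thread's: a pre-norm residual block whose attention uses more query heads than key/value heads, each K/V head being shared by a group of query heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PreNormGQABlock(nn.Module):
    def __init__(self, d_model=512, n_q_heads=8, n_kv_heads=2, d_ff=2048):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.d_head = d_model // n_q_heads
        self.norm1 = nn.LayerNorm(d_model)  # pre-norm: applied before attention
        self.norm2 = nn.LayerNorm(d_model)  # pre-norm: applied before the FFN
        self.wq = nn.Linear(d_model, n_q_heads * self.d_head, bias=False)
        self.wk = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)  # fewer K heads
        self.wv = nn.Linear(d_model, n_kv_heads * self.d_head, bias=False)  # fewer V heads
        self.wo = nn.Linear(n_q_heads * self.d_head, d_model, bias=False)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))

    def forward(self, x):  # x: (batch, seq, d_model)
        b, s, _ = x.shape
        h = self.norm1(x)  # normalize inside the residual branch (pre-norm)
        q = self.wq(h).view(b, s, self.n_q, self.d_head).transpose(1, 2)
        k = self.wk(h).view(b, s, self.n_kv, self.d_head).transpose(1, 2)
        v = self.wv(h).view(b, s, self.n_kv, self.d_head).transpose(1, 2)
        # GQA: each K/V head serves a group of query heads, so repeat K/V per group.
        rep = self.n_q // self.n_kv
        k, v = k.repeat_interleave(rep, dim=1), v.repeat_interleave(rep, dim=1)
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        x = x + self.wo(attn.transpose(1, 2).reshape(b, s, -1))
        x = x + self.ffn(self.norm2(x))  # pre-norm FFN branch
        return x
```

The practical payoff of GQA is that the K/V projections, and especially the KV cache at inference time, shrink by the n_q_heads / n_kv_heads factor.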

Stella Li (@stellalisy):

We empirically prove this with surgical experiments:
🐍 Directly rewarding string “python” → +11.8% performance
🚫 Random rewards BUT blocking code → gains disappear
The "magic" is just surfacing useful patterns already learned in pre-training.

Ahmad Beirami @ ICLR 2025 (@abeirami):

As we go through a lot of excitement about RL recently with lots of cool work/results, here is a reminder that RL with a reverse KL-regularizer to the base model cannot learn new skills that were not already present in the base model. It can only amplify the existing weak skills.

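A hedged sketch of the setup being referenced, in my own notation rather than the author's: RL fine-tuning with reward r and a reverse-KL penalty of strength β toward the base model has a well-known closed-form optimum that is just a reweighting of the base policy.

```latex
% KL-regularized RL objective and its closed-form optimum (sketch).
\max_{\pi}\;
  \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[\, r(x, y) \,\big]
  - \beta\, D_{\mathrm{KL}}\!\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{base}}(\cdot \mid x) \big)
\quad\Longrightarrow\quad
\pi^{*}(y \mid x) \;\propto\; \pi_{\mathrm{base}}(y \mid x)\, e^{\, r(x, y) / \beta}
```

Any completion with zero probability under π_base therefore stays at zero probability under π*: the reward can re-weight and amplify behaviors the base model can already produce, but cannot introduce genuinely new ones, which is the point of the reminder.
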
Mihir Prabhudesai (@mihirp98):

1/ Maximizing confidence indeed improves reasoning. We worked with Shashwat Goel, Nikhil Chandak, Ameya P. for the past 3 weeks (over a zoom call and many emails!) and revised our evaluations to align with their suggested prompts/parsers/sampling params. This includes changing

Shashwat Goel (@shashwatgoel7):

Glad we could together improve the scientific discourse around reasoning. Was great to see the authors reach out and incorporate all our feedback!

Aditya Tomar (@adityastomar_):

Can we break the memory wall for LLM inference via KV cache rematerialization?

🚨 Introducing XQuant, which leverages underutilized compute units to eliminate the memory bottleneck for LLM inference!

• 10–12.5x memory savings vs. FP16
• Near-zero accuracy loss
• Beats

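A hedged sketch of what "KV cache rematerialization" suggests, independent of XQuant's actual design (every name below is illustrative): cache a quantized copy of each layer's attention input X instead of K and V, and recompute K = XW_K and V = XW_V when attention needs them, trading extra matmuls for less cached data on inference workloads that are typically memory-bound.

```python
import torch

def quantize_int8(x):
    # Per-row symmetric int8 quantization (illustrative, not XQuant's actual scheme).
    scale = (x.abs().amax(dim=-1, keepdim=True) / 127.0).clamp_min(1e-8)
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.to(scale.dtype) * scale

class RematKVCache:
    """Caches quantized layer inputs X; rematerializes K, V on demand."""
    def __init__(self, w_k, w_v):          # per-layer K/V projection weights
        self.w_k, self.w_v = w_k, w_v
        self.q_chunks, self.scales = [], []

    def append(self, x_new):               # x_new: (batch, new_tokens, d_model)
        q, s = quantize_int8(x_new)
        self.q_chunks.append(q)            # one int8 tensor cached instead of two K/V tensors
        self.scales.append(s)

    def materialize(self):                 # recompute K, V for all cached tokens
        x = torch.cat([dequantize(q, s) for q, s in zip(self.q_chunks, self.scales)], dim=1)
        return x @ self.w_k, x @ self.w_v  # extra matmuls in exchange for less memory traffic
```

Relative to storing K and V in FP16, this keeps one lower-precision tensor per layer instead of two; the specific savings and accuracy numbers in the tweet would depend on the quantization scheme actually used.
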
Jonas Geiping (@jonasgeiping):

There's been a lot of discussion recently about parallel vs sequential reasoning.

The recurrent models we trained this year are sequential, which makes them good at math, but slow (see pic)

However, if you squint, models with recurrent-depth/loops are like diffusion models ...

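For readers who have not seen the recurrent-depth idea, a minimal sketch of the loop structure as I read it from the thread (illustrative layer choices and names, not the actual model): a weight-tied core block is applied a variable number of times at inference, so extra sequential compute buys extra effective depth.

```python
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Prelude -> (weight-tied core block, looped) -> coda; depth chosen at run time."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.prelude = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.core = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)  # reused every loop
        self.coda = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, x, loops=8):          # x: (batch, seq, d_model) embeddings
        h = self.prelude(x)
        for _ in range(loops):              # more loops = more sequential "thinking", but slower
            h = self.core(h)
        return self.coda(h)
```

Every extra loop iteration is strictly sequential, which matches the "good at math, but slow" trade-off described above.
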
Abhinav Bhatele (@bhatele):

A large number of PhD students in my group have graduated or will be graduating by Spring, so I am recruiting several PhD students for the next admission cycle (Fall 2026). If you want to work with us, apply by Dec 5 and drop me a short email. Please repost/share widely. #HPC #AI

Ben Pouladian (@benitoz):

Nemotron-Nano-V3 is NVIDIA’s next move:

hybrid Mamba-Transformer-MoE, 30B params, beats China’s Qwen3 in quality and runs 6x faster on H200.

This is the blueprint for physical AI

Efficient long-context, sparse compute and models that scale across everything

TPUs can’t touch!

Bryan Catanzaro (@ctnzr):

Today, @NVIDIA is launching the open Nemotron 3 model family, starting with Nano (30B-3A), which pushes the frontier of accuracy and inference efficiency with a novel hybrid SSM Mixture of Experts architecture. Super and Ultra are coming in the next few months.

Zhaocheng Zhu (@zhu_zhaocheng):

📢 Hey open-source folks — you might not want to miss this. NVIDIA dropped Nemotron v3 Nano this morning. Is it just another checkpoint claiming SOTA? Not really. What makes this release incredible is that we're shipping the entire training stack behind it: the RL infra, the

Jared Roesch (@roeschinc):

Thrilled to announce we're open-sourcing the CUDA Tile dialect and bytecode! github.com/NVIDIA/cuda-ti…

What's included:
• CUDA Tile MLIR dialect
• Bytecode serialization/deserialization support
• MLIR Python bindings for programmatic IR construction
•