Shubham Toshniwal (@shubhamtoshniw6) 's Twitter Profile
Shubham Toshniwal

@shubhamtoshniw6

Research Scientist @ NVIDIA.
ex-Meta, TTIC, IIT Kanpur

ID: 1366526653892079621

Link: http://shtoshni.github.io · Joined: 01-03-2021 23:12:02

69 Tweets

301 Followers

299 Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr) 's Twitter Profile Photo

Normalized Transformer - tricks to keep the activations constrained, improves training convergence; from NVIDIA

Was pointed to this paper by lucidrains

arxiv.org/abs/2410.01131
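The "tricks to keep the activations constrained" center on normalizing representations onto the unit hypersphere. A minimal sketch of that core idea (illustrative names and toy values, not the paper's actual implementation):

```python
import math

def l2_normalize(vec, eps=1e-6):
    """Project a hidden-state vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]

# Toy hidden state with norm 5; after normalization its norm is ~1,
# so activations cannot grow unboundedly across layers.
h = [3.0, 4.0, 0.0, 0.0]
h_norm = l2_normalize(h)
print(math.sqrt(sum(x * x for x in h_norm)))  # ≈ 1.0
```

Keeping every vector at unit norm bounds activation magnitudes, which is the constraint the tweet credits for the improved training convergence.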
Najoung Kim 🫠 (@najoungkim) 's Twitter Profile Photo

🧙 Come be my colleague! We have TWO Assistant Professor positions that might be of particular interest to folks in my reach, in Linguistics and CS. These hires are part of an AI cluster hiring initiative led by BU Computing & Data Sciences (CDS). More below 👇

Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

Llama-3.1-Nemotron-70B-Instruct model aligned by our team is now live on lmarena.ai leaderboard with overall rank 9.

Everything used to create this model is public: code, data and reward model. HF checkpoint: huggingface.co/nvidia/Llama-3…
Freda Shi (@fredahshi) 's Twitter Profile Photo

I’d always be proud of receiving my PhD from TTIC, a magic place which gives you the most unique (in a positive sense, of course!) experience among all PhD programs. Do apply to TTIC !

Shubham Toshniwal (@shubhamtoshniw6) 's Twitter Profile Photo

"A research team led by neurobiologist Margaret Livingstone trained three rhesus macaques to identify symbols representing the numbers zero to 25. They then taught the test subjects how to perform addition.... According to the study, all three monkeys were on average capable of

Sean Welleck (@wellecks) 's Twitter Profile Photo

Check out our new benchmark for an increasingly important capability: generating synthetic data. Among other insights, it turned out that the best problem solver was indeed not always the best teacher!

(((ل()(ل() 'yoav))))👾 (@yoavgo) 's Twitter Profile Photo

Sasha Rush Or maybe, in other words: I feel that with DL, our previous NLP training was helpful and allowed us to identify opportunities. With LLMs, it was the exact opposite: it blocked/hid opportunities from us.

NVIDIA AI Developer (@nvidiaaidev) 's Twitter Profile Photo

🎉 Huge congrats to our NVIDIA team “NemoSkills” for winning the AIMO-2 Competition 🏆 on @Kaggle.  

Their system solved 34 out of 50 problems in just 5 hours using 4 L4 GPUs. 🔢✨⏱️

kaggle.com/competitions/a…

How? A powerhouse squad—Christof Henkel, Darragh Hanley, Ivan Sorokin,
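The stated numbers imply a tight inference budget per problem; a back-of-the-envelope check of the arithmetic (illustrative only):

```python
# Figures quoted in the announcement above.
problems = 50
solved = 34
hours = 5
gpus = 4  # NVIDIA L4

# Average wall-clock budget per problem and overall accuracy.
minutes_per_problem = hours * 60 / problems
accuracy = solved / problems
print(minutes_per_problem, accuracy)  # 6.0 0.68
```

Six minutes of wall-clock time per problem on four L4 GPUs is a very constrained setting for competition-level math reasoning.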
Darragh (@gonedarragh) 's Twitter Profile Photo

Our team, NemoSkills, is presumptive winner of AIMO2. Outstanding organization from AIMO, Kaggle, XTX markets, Simon Frieder. Stay tuned for more performance evaluations currently underway. #NVIDIA - Ivan Moshkov Shubham Toshniwal Igor Gitman Dieter, IvanSorokin, BenediktSchifferer

Darragh (@gonedarragh) 's Twitter Profile Photo

AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
abs: arxiv.org/abs/2504.16891

‼️💹New 5.5M solution math reasoning dataset   
‼️📈New models 1.5B/7B/14B/32B+ AIMO2-14b  

So much learning from this team & #aimoprize!
Dieter (@kagglingdieter) 's Twitter Profile Photo

Happy to announce that we published our 🥇 1st place winning model for the AI Math Olympiad (and smaller/bigger variants) on Hugging Face. Even our tiny 1.5B version beats the mighty DeepSeek-R1 on the AIME math benchmark 🦾 huggingface.co/collections/nv…

Wei Ping (@_weiping) 's Twitter Profile Photo

Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: Deepseek-R1-Distilled-Qwen-7B.
It achieves:
- AIME24: 69.0 (+13.5 gain by RL)
- AIME25: 53.6 (+14.4)
- LiveCodeBench: 44.4 (surprisingly, +6.8 gain after
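The "+X gain by RL" figures imply the SFT-only starting scores of the Deepseek-R1-Distilled-Qwen-7B checkpoint; a quick sanity check of that arithmetic:

```python
# Reported post-RL score and RL gain per benchmark (from the tweet above).
scores = {
    "AIME24": (69.0, 13.5),
    "AIME25": (53.6, 14.4),
    "LiveCodeBench": (44.4, 6.8),
}

# Subtracting the gain recovers the implied SFT-only baseline.
baselines = {name: round(final - gain, 1) for name, (final, gain) in scores.items()}
print(baselines)  # {'AIME24': 55.5, 'AIME25': 39.2, 'LiveCodeBench': 37.6}
```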
Vaibhav (VB) Srivastav (@reach_vb) 's Twitter Profile Photo

NVIDIA just open sourced Open Code Reasoning models - 32B, 14B AND 7B - APACHE 2.0 licensed 🔥

> Beats O3 mini & O1 (low) on LiveCodeBench 😍

Backed by the OCR dataset, the models are 30% more token-efficient than other equivalent reasoning models

Works with llama.cpp, vLLM,
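The tweet does not specify how the 30% figure was computed; one plausible reading is tokens generated per problem relative to a comparison model. A hypothetical sketch of that metric, with made-up numbers:

```python
def relative_token_savings(model_tokens, baseline_tokens):
    """Fraction of output tokens saved relative to a baseline model.
    Hypothetical metric; the announcement does not state its exact method."""
    return 1.0 - model_tokens / baseline_tokens

# Illustrative numbers only: a model averaging 7,000 output tokens per
# problem vs a baseline averaging 10,000 is 30% more token-efficient.
print(round(relative_token_savings(7_000, 10_000), 2))  # 0.3
```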
Somshubra Majumdar (@haseox94) 's Twitter Profile Photo

We finally (!) released all our SOTA Code Reasoning models! Play around with them and get better scores than QwQ* with 20-30% fewer tokens! Maybe even useful for code-reasoning synthetic data generation? *With caveats (only code tasks, on average of 64 runs :D)

Oleksii Kuchaiev (@kuchaev) 's Twitter Profile Photo

NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post train next generations of our models. Give it a try github.com/NVIDIA/NeMo-RL

Jason Weston (@jaseweston) 's Twitter Profile Photo

🚨Announcing RAM 2 workshop @ COLM25 - call for papers🚨 
- 10 years on, we present the sequel to the classic RAM🐏 (Reasoning, Attention, Memory) workshop that took place in 2015 at the cusp of major change in the area. Now in 2025 we reflect on what's happened and discuss the