Siddarth Venkatraman (@siddarthv66)'s Twitter Profile
Siddarth Venkatraman

@siddarthv66

PhD at Mila | RL and other stuff I find interesting

ID: 1706322190721765376

Website: https://hyperpotatoneo.github.io/ · Joined: 25-09-2023 14:58:27

89 Tweets

189 Followers

248 Following

Shivam Agarwal (@shivamag12)'s Twitter Profile Photo

Can entropy minimization alone improve LLM performance? And how far can LLMs go without any labeled data? This work answers both: yes, and surprisingly far 🐮

At inference, EM can beat GPT-4o, Claude 3 Opus & Gemini 1.5 Pro on challenging scientific coding w/o any data or model update
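
For context, one common form of inference-time entropy minimization is a short test-time adaptation loop: take unlabeled gradient steps on the model's own predictive entropy. A minimal sketch, assuming a Hugging-Face-style causal LM interface; this is illustrative, not necessarily the paper's exact method.

    import torch.nn.functional as F

    def em_adapt_step(model, input_ids, optimizer):
        """One unlabeled adaptation step: reduce mean next-token entropy.

        Illustrative sketch; `model(input_ids).logits` assumes a
        Hugging-Face-style causal LM. In practice the optimizer often
        covers only a subset of parameters (e.g., norm layers).
        """
        logits = model(input_ids).logits                 # (batch, seq, vocab)
        log_p = F.log_softmax(logits, dim=-1)
        entropy = -(log_p.exp() * log_p).sum(-1).mean()  # H = -sum_v p log p
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
        return entropy.item()
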
Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵 arxiv.org/abs/2505.23725 1/N

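For readers outside the distributed-training niche, the DiLoCo pattern the thread refers to looks roughly like this: each worker takes H inner-optimizer steps locally, and only the resulting parameter delta (the "pseudo-gradient") is communicated and applied by an outer optimizer. A single-process sketch under assumed names (`diloco_round` and `make_inner_opt` are mine, not the paper's code):

    import copy
    import torch

    def diloco_round(global_model, worker_shards, make_inner_opt, outer_opt, H, loss_fn):
        """One communication round of the DiLoCo pattern (single-process sketch).

        `make_inner_opt` is the swap point MuLoCo studies: e.g. AdamW vs. Muon.
        """
        start = [p.detach().clone() for p in global_model.parameters()]
        deltas = [torch.zeros_like(p) for p in start]
        for shard in worker_shards:                    # one local replica per worker
            local = copy.deepcopy(global_model)
            inner_opt = make_inner_opt(local.parameters())
            for _, (x, y) in zip(range(H), shard):     # H cheap local steps
                inner_opt.zero_grad()
                loss_fn(local(x), y).backward()
                inner_opt.step()
            for d, p0, p in zip(deltas, start, local.parameters()):
                d += (p0 - p.detach()) / len(worker_shards)  # averaged pseudo-gradient
        outer_opt.zero_grad()
        for p, d in zip(global_model.parameters(), deltas):
            p.grad = d                                 # outer step follows the delta
        outer_opt.step()
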
LawZero - LoiZéro (@lawzero_)'s Twitter Profile Photo

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org

Raj Ghugare (@ghugareraj)'s Twitter Profile Photo

Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models.

Are NFs…
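
The "exact likelihoods" point is the key contrast with diffusion models, and it is worth seeing concretely: because the flow is invertible, the change-of-variables formula gives log p(x) in closed form. A minimal affine-coupling sketch (illustrative, not the paper's models):

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """One RealNVP-style coupling layer; `dim` must be even."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.Tanh(),
                                     nn.Linear(64, dim))   # predicts scale and shift

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=-1)
            s, t = self.net(x1).chunk(2, dim=-1)
            z2 = x2 * torch.exp(s) + t                     # invertible given x1
            return torch.cat([x1, z2], dim=-1), s.sum(-1)  # (z, log|det J|)

    def log_prob(layer, x):
        # Change of variables: log p(x) = log p_z(f(x)) + log|det df/dx|
        z, logdet = layer(x)
        base = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
        return base + logdet
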
Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

I love LLM papers which repackage stuff known by basically every RL researcher and claim novelty. My favorite genre. Bonus points when this paper gets massively boosted by a twitter "ML influencer" and overshadows the original work.

Nanda H Krishna (@nandahkrishna)'s Twitter Profile Photo

New preprint! 🧠🤖
How do we build neural decoders that are:
⚡️ fast enough for real-time use
🎯 accurate across diverse tasks
🌍 generalizable to new sessions, subjects, and species?
We present POSSM, a hybrid SSM architecture that optimizes for all three of these axes!
🧵1/7
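
POSSM's actual architecture is in the preprint; as a generic illustration of why state-space models fit the real-time axis, a diagonal SSM processes each new sample with an O(1) state update instead of attending over the whole history. A sketch under assumed names, not POSSM's design:

    import torch
    import torch.nn as nn

    class DiagonalSSM(nn.Module):
        """Single diagonal SSM layer run as a recurrence (illustrative only)."""
        def __init__(self, d_in, d_state, d_out):
            super().__init__()
            self.log_a = nn.Parameter(torch.full((d_state,), -0.5))  # per-channel decay
            self.B = nn.Linear(d_in, d_state, bias=False)
            self.C = nn.Linear(d_state, d_out, bias=False)

        def step(self, x_t, h):
            # h_t = a * h_{t-1} + B x_t ;  y_t = C h_t  (O(1) per new sample)
            a = torch.exp(self.log_a)    # a in (0, 1): stable exponential decay
            h = a * h + self.B(x_t)
            return self.C(h), h
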
Geoffrey Hinton (@geoffreyhinton)'s Twitter Profile Photo

Congratulations to Yoshua Bengio on launching LawZero - LoiZéro — a research effort to advance safe-by-design AI, especially as frontier systems begin to exhibit signs of self-preservation and deceptive behaviour.

Sergey Levine (@svlevine)'s Twitter Profile Photo

I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/language-mod…

Majdi Hassan (@majdi_has)'s Twitter Profile Photo

(1/n) 🚨 You can train a model to solve DFT for any geometry almost without training data! 🚨 Introducing Self-Refining Training for Amortized Density Functional Theory, a variational framework for learning a DFT solver that predicts the ground-state solutions for different…

Emiliano Penaloza (@emilianopp_)'s Twitter Profile Photo

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9
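
For readers unfamiliar with CBMs, the bottleneck structure below is why noisy concept labels are so damaging: the label head only ever sees the predicted concepts. A minimal sketch for orientation; the paper's preference-optimization objective is not reproduced here.

    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, d_in, n_concepts, n_classes):
            super().__init__()
            self.concept_head = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                              nn.Linear(128, n_concepts))
            self.label_head = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            c_logits = self.concept_head(x)                 # trained on concept labels
            y_logits = self.label_head(c_logits.sigmoid())  # sees only concepts
            return c_logits, y_logits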

Jiaxin Wen @ICLR2025 (@jiaxinwen22)'s Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive with or better than using human supervision.

Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
Luke Rowe (@luke22r)'s Twitter Profile Photo

🚀 Our method, Poutine, was the best-performing entry in the 2025 Waymo Vision-based End-to-End Driving Challenge at #CVPR2025!

Our 3B-parameter VLM Poutine scored 7.99 RFS on the official test set, comfortably ahead of every other entry (see figure).
Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

Tired of tuning hyperparameters? Introducing PyLO! We're bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
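
The claim is ergonomic: a learned optimizer that slots into an existing torch.optim training loop unchanged. The import path and class name below are guesses for illustration only, not PyLO's documented API; check the linked repo for the real interface.

    import torch
    import torch.nn.functional as F
    from pylo import LearnedOptimizer   # hypothetical import; see repo for real API

    model = torch.nn.Linear(10, 2)
    # Before: opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    opt = LearnedOptimizer(model.parameters())   # drop-in: no lr or schedule to tune

    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()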

Martin Klissarov (@martinklissarov)'s Twitter Profile Photo

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field 🧵
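
For background, the "temporal structure" in question is usually formalized via options: a high-level policy selects a subtask policy that runs until its termination condition fires. A toy sketch assuming a gym-like environment interface (all names are illustrative):

    import random

    class Option:
        """A subtask: an intra-option policy plus a termination condition."""
        def __init__(self, policy, termination):
            self.policy = policy              # state -> primitive action
            self.termination = termination    # state -> probability of stopping

    def run_episode(env, high_level_policy, options, max_options=100):
        # Assumes a gym-like env: reset() -> state, step(a) -> (state, reward, done)
        state, done = env.reset(), False
        for _ in range(max_options):
            opt = options[high_level_policy(state)]   # pick a subtask, not an action
            while not done:
                state, _, done = env.step(opt.policy(state))
                if random.random() < opt.termination(state):
                    break                             # option ends; control returns
            if done:
                break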