Siddarth Venkatraman (@siddarthv66)'s Twitter Profile
Siddarth Venkatraman

@siddarthv66

PhD at Mila | RL and other stuff I find interesting

ID: 1706322190721765376

Website: https://hyperpotatoneo.github.io/ · Joined: 25-09-2023 14:58:27

89 Tweets

189 Followers

248 Following

Shivam Agarwal (@shivamag12)'s Twitter Profile Photo

Can entropy minimization alone improve LLM performance? And how far can LLMs go without any labeled data? This work answers both: yes, and surprisingly far 🐮

At inference, EM can beat GPT-4o, Claude 3 Opus & Gemini 1.5 Pro on challenging scientific coding w/o any data or model update
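
For context, one common form of inference-time entropy minimization is a short test-time adaptation loop: take unlabeled gradient steps on the model's own predictive entropy. A minimal sketch, assuming a Hugging-Face-style causal LM interface; this is illustrative, not necessarily the paper's exact method.

    import torch.nn.functional as F

    def em_adapt_step(model, input_ids, optimizer):
        """One unlabeled adaptation step: reduce mean next-token entropy.

        Illustrative sketch; `model(input_ids).logits` assumes a
        Hugging-Face-style causal LM. In practice the optimizer often
        covers only a subset of parameters (e.g., norm layers).
        """
        logits = model(input_ids).logits                 # (batch, seq, vocab)
        log_p = F.log_softmax(logits, dim=-1)
        entropy = -(log_p.exp() * log_p).sum(-1).mean()  # H = -sum_v p log p
        optimizer.zero_grad()
        entropy.backward()
        optimizer.step()
        return entropy.item()
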
Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵 arxiv.org/abs/2505.23725 1/N

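For readers outside the distributed-training niche, the DiLoCo pattern the thread refers to looks roughly like this: each worker takes H inner-optimizer steps locally, and only the resulting parameter delta (the "pseudo-gradient") is communicated and applied by an outer optimizer. A single-process sketch under assumed names (`diloco_round` and `make_inner_opt` are mine, not the paper's code):

    import copy
    import torch

    def diloco_round(global_model, worker_shards, make_inner_opt, outer_opt, H, loss_fn):
        """One communication round of the DiLoCo pattern (single-process sketch).

        `make_inner_opt` is the swap point MuLoCo studies: e.g. AdamW vs. Muon.
        """
        start = [p.detach().clone() for p in global_model.parameters()]
        deltas = [torch.zeros_like(p) for p in start]
        for shard in worker_shards:                    # one local replica per worker
            local = copy.deepcopy(global_model)
            inner_opt = make_inner_opt(local.parameters())
            for _, (x, y) in zip(range(H), shard):     # H cheap local steps
                inner_opt.zero_grad()
                loss_fn(local(x), y).backward()
                inner_opt.step()
            for d, p0, p in zip(deltas, start, local.parameters()):
                d += (p0 - p.detach()) / len(worker_shards)  # averaged pseudo-gradient
        outer_opt.zero_grad()
        for p, d in zip(global_model.parameters(), deltas):
            p.grad = d                                 # outer step follows the delta
        outer_opt.step()
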
LawZero - LoiZéro (@lawzero_)'s Twitter Profile Photo

Every frontier AI system should be grounded in a core commitment: to protect human joy and endeavour. Today, we launch LawZero - LoiZéro, a nonprofit dedicated to advancing safe-by-design AI. lawzero.org

Raj Ghugare (@ghugareraj)'s Twitter Profile Photo

Normalizing Flows (NFs) check all the boxes for RL: exact likelihoods (imitation learning), efficient sampling (real-time control), and variational inference (Q-learning)! Yet they are overlooked in favor of more expensive and less flexible contemporaries like diffusion models.

Are NFs…
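
The "exact likelihoods" point is the key contrast with diffusion models, and it is worth seeing concretely: because the flow is invertible, the change-of-variables formula gives log p(x) in closed form. A minimal affine-coupling sketch (illustrative, not the paper's models):

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        """One RealNVP-style coupling layer; `dim` must be even."""
        def __init__(self, dim):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(dim // 2, 64), nn.Tanh(),
                                     nn.Linear(64, dim))   # predicts scale and shift

        def forward(self, x):
            x1, x2 = x.chunk(2, dim=-1)
            s, t = self.net(x1).chunk(2, dim=-1)
            z2 = x2 * torch.exp(s) + t                     # invertible given x1
            return torch.cat([x1, z2], dim=-1), s.sum(-1)  # (z, log|det J|)

    def log_prob(layer, x):
        # Change of variables: log p(x) = log p_z(f(x)) + log|det df/dx|
        z, logdet = layer(x)
        base = torch.distributions.Normal(0.0, 1.0).log_prob(z).sum(-1)
        return base + logdet
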
Siddarth Venkatraman (@siddarthv66)'s Twitter Profile Photo

I love LLM papers which repackage stuff known by basically every RL researcher and claim novelty. My favorite genre. Bonus points when this paper gets massively boosted by a twitter "ML influencer" and overshadows the original work.

Nanda H Krishna (@nandahkrishna)'s Twitter Profile Photo

New preprint! 🧠🤖
How do we build neural decoders that are:
⚡️ fast enough for real-time use
🎯 accurate across diverse tasks
🌍 generalizable to new sessions, subjects, and species?
We present POSSM, a hybrid SSM architecture that optimizes for all three of these axes!
🧵1/7
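
POSSM's actual architecture is in the preprint; as a generic illustration of why state-space models fit the real-time axis, a diagonal SSM processes each new sample with an O(1) state update instead of attending over the whole history. A sketch under assumed names, not POSSM's design:

    import torch
    import torch.nn as nn

    class DiagonalSSM(nn.Module):
        """Single diagonal SSM layer run as a recurrence (illustrative only)."""
        def __init__(self, d_in, d_state, d_out):
            super().__init__()
            self.log_a = nn.Parameter(torch.full((d_state,), -0.5))  # per-channel decay
            self.B = nn.Linear(d_in, d_state, bias=False)
            self.C = nn.Linear(d_state, d_out, bias=False)

        def step(self, x_t, h):
            # h_t = a * h_{t-1} + B x_t ;  y_t = C h_t  (O(1) per new sample)
            a = torch.exp(self.log_a)    # a in (0, 1): stable exponential decay
            h = a * h + self.B(x_t)
            return self.C(h), h
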
Geoffrey Hinton (@geoffreyhinton)'s Twitter Profile Photo

Congratulations to Yoshua Bengio on launching LawZero - LoiZéro — a research effort to advance safe-by-design AI, especially as frontier systems begin to exhibit signs of self-preservation and deceptive behaviour.

Sergey Levine (@svlevine)'s Twitter Profile Photo

I always found it puzzling how language models learn so much from next-token prediction, while video models learn so little from next frame prediction. Maybe it's because LLMs are actually brain scanners in disguise. Idle musings in my new blog post: sergeylevine.substack.com/p/language-mod…

Majdi Hassan (@majdi_has)'s Twitter Profile Photo

(1/n) 🚨 You can train a model to solve DFT for any geometry almost without training data! 🚨 Introducing Self-Refining Training for Amortized Density Functional Theory, a variational framework for learning a DFT solver that predicts the ground-state solutions for different…

Emiliano Penaloza (@emilianopp_)'s Twitter Profile Photo

Excited that our paper "Addressing Concept Mislabeling in Concept Bottleneck Models Through Preference Optimization" was accepted to ICML 2025! We show how Preference Optimization can reduce the impact of noisy concept labels in CBMs. 🧵/9
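
For readers unfamiliar with CBMs, the bottleneck structure below is why noisy concept labels are so damaging: the label head only ever sees the predicted concepts. A minimal sketch for orientation; the paper's preference-optimization objective is not reproduced here.

    import torch.nn as nn

    class ConceptBottleneck(nn.Module):
        def __init__(self, d_in, n_concepts, n_classes):
            super().__init__()
            self.concept_head = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(),
                                              nn.Linear(128, n_concepts))
            self.label_head = nn.Linear(n_concepts, n_classes)

        def forward(self, x):
            c_logits = self.concept_head(x)                 # trained on concept labels
            y_logits = self.label_head(c_logits.sigmoid())  # sees only concepts
            return c_logits, y_logits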

Jiaxin Wen @ICLR2025 (@jiaxinwen22)'s Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive with or better than using human supervision.

Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.
Luke Rowe (@luke22r)'s Twitter Profile Photo

🚀 Our method, Poutine, was the best-performing entry in the 2025 Waymo Vision-based End-to-End Driving Challenge at #CVPR2025!

Our 3B-parameter VLM Poutine scored 7.99 RFS on the official test set, comfortably ahead of every other entry (see figure).
Benjamin Thérien (@benjamintherien)'s Twitter Profile Photo

Tired of tuning hyperparameters? Introducing PyLO! We're bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
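
The claim is ergonomic: a learned optimizer that slots into an existing torch.optim training loop unchanged. The import path and class name below are guesses for illustration only, not PyLO's documented API; check the linked repo for the real interface.

    import torch
    import torch.nn.functional as F
    from pylo import LearnedOptimizer   # hypothetical import; see repo for real API

    model = torch.nn.Linear(10, 2)
    # Before: opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
    opt = LearnedOptimizer(model.parameters())   # drop-in: no lr or schedule to tune

    x, y = torch.randn(8, 10), torch.randint(0, 2, (8,))
    opt.zero_grad()
    F.cross_entropy(model(x), y).backward()
    opt.step()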

Martin Klissarov (@martinklissarov)'s Twitter Profile Photo

As AI agents face increasingly long and complex tasks, decomposing them into subtasks becomes increasingly appealing. But how do we discover such temporal structure? Hierarchical RL provides a natural formalism, yet many questions remain open. Here's our overview of the field 🧵
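
For background, the "temporal structure" in question is usually formalized via options: a high-level policy selects a subtask policy that runs until its termination condition fires. A toy sketch assuming a gym-like environment interface (all names are illustrative):

    import random

    class Option:
        """A subtask: an intra-option policy plus a termination condition."""
        def __init__(self, policy, termination):
            self.policy = policy              # state -> primitive action
            self.termination = termination    # state -> probability of stopping

    def run_episode(env, high_level_policy, options, max_options=100):
        # Assumes a gym-like env: reset() -> state, step(a) -> (state, reward, done)
        state, done = env.reset(), False
        for _ in range(max_options):
            opt = options[high_level_policy(state)]   # pick a subtask, not an action
            while not done:
                state, _, done = env.step(opt.policy(state))
                if random.random() < opt.termination(state):
                    break                             # option ends; control returns
            if done:
                break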