Nikhil Anand (@nikhil_anand91)'s Twitter Profile
Nikhil Anand

@nikhil_anand91

Physicist-turned-machine-learner, currently research scientist @kempnerinst

ID: 1208096007466164229

Link: https://nikhilanand91.github.io/ | Joined: 20-12-2019 18:45:17

14 Tweets

57 Followers

264 Following

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Happy to share our EMNLP paper w/ Josh Tan where we apply Variance of Gradients (VoG) – originally developed by Chirag Agarwal, Daniel D'souza, and Sara Hooker – to select important data in language-based tasks. At EMNLP? Let's connect to discuss data quality and/or LLMs! #EMNLP
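
For context, a minimal sketch of the VoG idea as described by Agarwal et al.: score each example by the variance, across training checkpoints, of the gradient of the loss with respect to the input representation (here, input embeddings). This is an illustrative sketch, not code from the paper; `checkpoints`, `inputs`, `labels`, and `loss_fn` are hypothetical placeholders.

```python
# Sketch of Variance-of-Gradients (VoG) scoring for continuous inputs
# (e.g. token embeddings). Placeholders, not the paper's code.
import torch

def vog_scores(checkpoints, inputs, labels, loss_fn):
    """One VoG score per example: variance across checkpoints of the
    per-example input gradient, averaged over input dimensions."""
    grads = []
    for model in checkpoints:                     # snapshots from different training steps
        x = inputs.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x), labels)          # scalar loss (e.g. mean cross-entropy)
        (g,) = torch.autograd.grad(loss, x)       # per-example input gradients
        grads.append(g.detach())
    g = torch.stack(grads)                        # [num_checkpoints, batch, ...]
    return g.var(dim=0).flatten(1).mean(dim=1)    # higher score = less stable gradients
```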

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Really cool work led by Devin Kwok (McGill/Mila) on making sense of example difficulty. Addresses some key questions, e.g.: How consistent is measured difficulty across inits and for different architectures? Can we fingerprint models using a few key sensitive/hard examples?

Eran Malach (@eranmalach)'s Twitter Profile Photo

MoEs increase parameter count but not FLOPs. Do they offer a "free lunch", improving performance without paying in compute? Our answer: for memorization, MoEs give performance gains "for free", but have limited benefit for reasoning! arXiv: arxiv.org/pdf/2410.19034 🦜🦜🦜
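
A rough illustration of the "more parameters, same per-token FLOPs" point (a toy sketch, not the paper's architecture): a top-1 routed MoE layer stores several experts' worth of weights but sends each token through exactly one expert.

```python
# Toy top-1 routed MoE layer: holds num_experts copies of an FFN's
# parameters, yet each token is processed by exactly one expert, so
# per-token FLOPs stay roughly those of a single dense FFN plus the router.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                           # x: [num_tokens, d_model]
        expert_idx = self.router(x).argmax(dim=-1)  # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])         # each token visits one expert only
        return out
```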

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

How do different data distributions interact with scaling laws? And how does training data affect test loss? We find simple shifted power law fits can relate performance across (sometimes very disparate) datasets and losses. See David's thread for more details!
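
A rough sketch of what fitting such a shifted power law could look like; the exact parameterization and the numbers below are illustrative assumptions, not taken from the paper.

```python
# Sketch: fit a shifted power law mapping loss on one dataset to loss on
# another, e.g. L_B ≈ k * (L_A - c_a)**alpha + c_b. Functional form and
# data are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def shifted_power_law(l_a, k, alpha, c_a, c_b):
    return k * np.clip(l_a - c_a, 1e-8, None) ** alpha + c_b

# Hypothetical test losses of the same model family on two datasets.
losses_a = np.array([3.20, 2.80, 2.50, 2.30, 2.15])
losses_b = np.array([4.10, 3.50, 3.10, 2.85, 2.70])

params, _ = curve_fit(shifted_power_law, losses_a, losses_b,
                      p0=[1.0, 1.0, 2.0, 2.5], maxfev=20_000)
print(dict(zip(["k", "alpha", "c_a", "c_b"], np.round(params, 3))))
```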

David Alvarez Melis (@elmelis)'s Twitter Profile Photo

🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔

Eran Malach (@eranmalach)'s Twitter Profile Photo

How does RL improve performance on math reasoning? Studying RL from pretrained models is hard, as behavior depends on choice of base model. 🚨 In our new work, we train models *from scratch* to study the effect of the data mix on the behavior of RL. arxiv.org/abs/2504.07912

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

New in the #DeeperLearningBlog: Kempner researchers Nikhil Anand and Chloe H. Su discuss new work on how numerical precision can impact the accuracy and stability of #LLMs. kempnerinstitute.harvard.edu/research/deepe… #AI (1/2)

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Excited to share this work on understanding low-precision instabilities in model training! See our thread below for more details. Paper: arxiv.org/abs/2506.20752 Blogpost: tinyurl.com/lowprecinstabi…
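
The paper and thread have the real analysis; as a standalone toy illustration (my own, with made-up numbers) of one way reduced precision can bite: bfloat16 keeps only about 8 bits of mantissa, so small updates stop registering once an accumulator grows.

```python
# Toy illustration (not from the paper) of one low-precision failure mode:
# in bfloat16, repeatedly adding a small update to a growing accumulator
# eventually changes nothing, while float32 keeps accumulating.
import torch

update = torch.tensor(1e-4)

acc_fp32 = torch.zeros((), dtype=torch.float32)
acc_bf16 = torch.zeros((), dtype=torch.bfloat16)
for _ in range(10_000):
    acc_fp32 += update
    acc_bf16 += update.to(torch.bfloat16)

print(f"float32 accumulator:  {acc_fp32.item():.4f}")  # ~1.0, as expected
print(f"bfloat16 accumulator: {acc_bf16.item():.4f}")  # stalls far below 1.0 once increments round away
```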

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

Interested in the latest work from the #KempnerInstitute? Check out papers and preprints from June's Research Roundup. kempnerinstitute.harvard.edu/kempner-commun… Abstracts and links below. 🧵 (1/21) #AI #neuroscience #NeuroAI

Timothy Nguyen (@iamtimnguyen)'s Twitter Profile Photo

I respectfully disagree with Ed. Was Kepler's planetary analysis "real" mathematics or just astronomy? Are IMO problems "real" mathematics or just puzzles for high school students? Is photography "real" art or just tool use? The label "real" is a personal, aesthetic judgment,