Nikhil Anand (@nikhil_anand91)'s Twitter Profile
Nikhil Anand

@nikhil_anand91

Physicist-turned-machine-learner, currently research scientist @kempnerinst

ID: 1208096007466164229

Link: https://nikhilanand91.github.io/ | Joined: 20-12-2019 18:45:17

14 Tweets

57 Followers

264 Following

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Happy to share our EMNLP paper w/ Josh Tan where we apply Variance of Gradients (VoG) – originally developed by Chirag Agarwal, Daniel D'souza, and Sara Hooker – to select important data in language-based tasks. At EMNLP? Let's connect to discuss data quality and/or LLMs! #EMNLP
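
For context, a minimal sketch of the VoG idea as described by Agarwal et al.: score each example by the variance, across training checkpoints, of the gradient of the loss with respect to the input representation (here, input embeddings). This is an illustrative sketch, not code from the paper; `checkpoints`, `inputs`, `labels`, and `loss_fn` are hypothetical placeholders.

```python
# Sketch of Variance-of-Gradients (VoG) scoring for continuous inputs
# (e.g. token embeddings). Placeholders, not the paper's code.
import torch

def vog_scores(checkpoints, inputs, labels, loss_fn):
    """One VoG score per example: variance across checkpoints of the
    per-example input gradient, averaged over input dimensions."""
    grads = []
    for model in checkpoints:                     # snapshots from different training steps
        x = inputs.clone().detach().requires_grad_(True)
        loss = loss_fn(model(x), labels)          # scalar loss (e.g. mean cross-entropy)
        (g,) = torch.autograd.grad(loss, x)       # per-example input gradients
        grads.append(g.detach())
    g = torch.stack(grads)                        # [num_checkpoints, batch, ...]
    return g.var(dim=0).flatten(1).mean(dim=1)    # higher score = less stable gradients
```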

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Really cool work led by Devin Kwok (McGill/Mila) on making sense of example difficulty. Addresses some key questions, e.g.: How consistent is measured difficulty across inits and for different architectures? Can we fingerprint models using a few key sensitive/hard examples?

Eran Malach (@eranmalach)'s Twitter Profile Photo

MoEs increase parameter count but not FLOPs. Do they offer a "free lunch", improving performance without paying in compute? Our answer: for memorization, MoEs give performance gains "for free", but have limited benefit for reasoning! arXiv: arxiv.org/pdf/2410.19034 🦜🦜🦜
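
A rough illustration of the "more parameters, same per-token FLOPs" point (a toy sketch, not the paper's architecture): a top-1 routed MoE layer stores several experts' worth of weights but sends each token through exactly one expert.

```python
# Toy top-1 routed MoE layer: holds num_experts copies of an FFN's
# parameters, yet each token is processed by exactly one expert, so
# per-token FLOPs stay roughly those of a single dense FFN plus the router.
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, d_model=256, d_ff=1024, num_experts=8):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                           # x: [num_tokens, d_model]
        expert_idx = self.router(x).argmax(dim=-1)  # hard top-1 routing
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])         # each token visits one expert only
        return out
```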

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

How do different data distributions interact with scaling laws? And how does training data affect test loss? We find simple shifted power law fits can relate performance across (sometimes very disparate) datasets and losses. See David's thread for more details!
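
A rough sketch of what fitting such a shifted power law could look like; the exact parameterization and the numbers below are illustrative assumptions, not taken from the paper.

```python
# Sketch: fit a shifted power law mapping loss on one dataset to loss on
# another, e.g. L_B ≈ k * (L_A - c_a)**alpha + c_b. Functional form and
# data are illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def shifted_power_law(l_a, k, alpha, c_a, c_b):
    return k * np.clip(l_a - c_a, 1e-8, None) ** alpha + c_b

# Hypothetical test losses of the same model family on two datasets.
losses_a = np.array([3.20, 2.80, 2.50, 2.30, 2.15])
losses_b = np.array([4.10, 3.50, 3.10, 2.85, 2.70])

params, _ = curve_fit(shifted_power_law, losses_a, losses_b,
                      p0=[1.0, 1.0, 2.0, 2.5], maxfev=20_000)
print(dict(zip(["k", "alpha", "c_a", "c_b"], np.round(params, 3))))
```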

David Alvarez Melis (@elmelis)'s Twitter Profile Photo

🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs. It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔

Eran Malach (@eranmalach)'s Twitter Profile Photo

How does RL improve performance on math reasoning? Studying RL from pretrained models is hard, as behavior depends on choice of base model. 🚨 In our new work, we train models *from scratch* to study the effect of the data mix on the behavior of RL. arxiv.org/abs/2504.07912

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

New in the #DeeperLearningBlog: Kempner researchers Nikhil Anand and Chloe H. Su discuss new work on how numerical precision can impact the accuracy and stability of #LLMs. kempnerinstitute.harvard.edu/research/deepe… #AI (1/2)

Nikhil Anand (@nikhil_anand91)'s Twitter Profile Photo

Excited to share this work on understanding low-precision instabilities in model training! See our thread below for more details. Paper: arxiv.org/abs/2506.20752 Blogpost: tinyurl.com/lowprecinstabi…
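
The paper and thread have the real analysis; as a standalone toy illustration (my own, with made-up numbers) of one way reduced precision can bite: bfloat16 keeps only about 8 bits of mantissa, so small updates stop registering once an accumulator grows.

```python
# Toy illustration (not from the paper) of one low-precision failure mode:
# in bfloat16, repeatedly adding a small update to a growing accumulator
# eventually changes nothing, while float32 keeps accumulating.
import torch

update = torch.tensor(1e-4)

acc_fp32 = torch.zeros((), dtype=torch.float32)
acc_bf16 = torch.zeros((), dtype=torch.bfloat16)
for _ in range(10_000):
    acc_fp32 += update
    acc_bf16 += update.to(torch.bfloat16)

print(f"float32 accumulator:  {acc_fp32.item():.4f}")  # ~1.0, as expected
print(f"bfloat16 accumulator: {acc_bf16.item():.4f}")  # stalls far below 1.0 once increments round away
```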

Kempner Institute at Harvard University (@kempnerinst)'s Twitter Profile Photo

Interested in the latest work from the #KempnerInstitute? Check out papers and preprints from June's Research Roundup. kempnerinstitute.harvard.edu/kempner-commun… Abstracts and links below. 🧵 (1/21) #AI #neuroscience #NeuroAI

Timothy Nguyen (@iamtimnguyen)'s Twitter Profile Photo

I respectfully disagree with Ed. Was Kepler's planetary analysis "real" mathematics or just astronomy? Are IMO problems "real" mathematics or just puzzles for high school students? Is photography "real" art or just tool use? The label "real" is a personal, aesthetic judgment,