Sadhika Malladi (@sadhikamalladi) 's Twitter Profile
Sadhika Malladi

@sadhikamalladi

CS PhD student at Princeton

ID: 1532167008074227713

Website: https://www.cs.princeton.edu/~smalladi/index.html | Joined: 02-06-2022 01:07:52

211 Tweets

1.1K Followers

199 Following

Christina Baek (@_christinabaek) 's Twitter Profile Photo

Are current reasoning models optimal for test-time scaling? 🌠
No! Models make the same incorrect guess over and over again.

We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math!

1/N
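WiSE-FT here refers to weight-space ensembling (Wortsman et al.): the parameters of two checkpoints with the same architecture are linearly interpolated, rather than ensembling their outputs. A minimal sketch follows; which two checkpoints are merged (e.g., the base model and its reasoning fine-tune) and the mixing coefficient are assumptions for illustration, not details from the thread.

```python
# Minimal sketch of WiSE-FT-style weight ensembling: linearly interpolate the
# parameters of two checkpoints that share an architecture.
# theta_alpha = (1 - alpha) * theta_base + alpha * theta_finetuned
import torch

def wise_ft_interpolate(base_state_dict, finetuned_state_dict, alpha=0.5):
    """Return a state dict with (1 - alpha) * base + alpha * finetuned for each parameter."""
    return {
        name: (1.0 - alpha) * base_param + alpha * finetuned_state_dict[name]
        for name, base_param in base_state_dict.items()
    }

# Usage sketch (file names are placeholders):
# base = torch.load("base_model.pt")           # state dict of the base checkpoint
# tuned = torch.load("reasoning_finetune.pt")  # state dict of the fine-tuned checkpoint
# model.load_state_dict(wise_ft_interpolate(base, tuned, alpha=0.7))
```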
Stanley Wei @ ICLR 2025 (@stanleyrwei) 's Twitter Profile Photo

New unlearning work at #ICLR2025! We give guarantees for unlearning a simple class of language models (topic models), and we further show it's easier to unlearn pretraining data during fine-tuning, without even modifying the base model. Paper: arxiv.org/abs/2411.12600 🧵:

Tanya Marwah (@__tm__157) 's Twitter Profile Photo

What is the role of memory in modeling time-dependent PDEs? I will be at ICLR presenting our paper (Oral) on when memory is beneficial for modeling time-dependent PDEs! 🔗 openreview.net/forum?id=o9kqa…
[Oral]: Thu 24 Apr 10:30 am @ Session 1E
[Poster]: Thu 24 Apr 3 pm, #617

Surbhi Goel (@surbhigoel_) 's Twitter Profile Photo

Thrilled that Abhishek Panigrahi is presenting our paper (joint with Bingbin Liu, Sadhika Malladi, Andrej Risteski) on the benefits of progressive distillation as an oral at #ICLR2025. Talk details below ⬇️ Also check out our blog post: unprovenalgos.github.io/progressive-di…

Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

Excited to present our recent findings on "catastrophic overtraining" where more pre-training data (shockingly) can lead to worse downstream models.

Jacob Springer (@jacspringer) 's Twitter Profile Photo

Our paper on how overtraining LLMs can make fine-tuning harder won awards at two different #ICLR2025 workshops! I'm honored and thrilled!
Outstanding paper @ SCOPE
Entropic Paper Award @ ICBINB
Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Our work on the surprising negative effects of pre-training LMs for longer has received awards from both of these workshops and has been accepted to ICML 25! One of my first papers as a senior author :) arxiv.org/abs/2503.19206

Jeremy Bernstein (@jxbz) 's Twitter Profile Photo

I was really grateful to have the chance to speak at Cohere Labs and ML Collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here... (1/6)

Nived Rajaraman (@nived_rajaraman) 's Twitter Profile Photo

The abstract submission deadline for FoPt has been extended to the 21st of May (11:59pm UTC). Submission website: openreview.net/group?id=learn…

Yiding Jiang (@yidingjiang) 's Twitter Profile Photo

Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog (with Allan Zhou) about this neat idea that motivated ADO but didn’t make it into the paper. yidingjiang.github.io/blog/post/curr…
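For context, prequential (online) coding measures the description length of an ordered data stream as the cumulative negative log-likelihood each example receives from a model trained only on the earlier examples, so the ordering of the data (the curriculum) changes the code length. A rough sketch under that framing is below; the model interface is a hypothetical placeholder, not the ADO implementation.

```python
# Rough sketch of prequential (online) coding: the code length of an ordered stream
# is sum_t -log p(x_t | x_1..x_{t-1}), where the model is updated on each example
# only after predicting it. Different data orderings give different code lengths.
# The `model` interface (predict_proba / update) is a hypothetical placeholder.
import math

def prequential_code_length(data_stream, model):
    """Total code length (in nats) of data_stream under train-as-you-go coding."""
    total_nats = 0.0
    for example in data_stream:
        prob = model.predict_proba(example)   # p(x_t | x_1..x_{t-1})
        total_nats += -math.log(prob)
        model.update(example)                 # fit on x_t before seeing x_{t+1}
    return total_nats
```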

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Adam is similar to many algorithms, but cannot be effectively replaced by any simpler variant in LMs. The community is starting to get the recipe right, but what is the secret sauce? Robert M. Gower 🇺🇦 and I found that it has to do with the beta parameters and variational inference.
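For reference, a minimal sketch of the standard Adam update is below, just to show where the two beta parameters enter (beta1 sets the decay of the gradient EMA, beta2 the decay of the squared-gradient EMA). This is textbook Adam (Kingma & Ba), not the variational-inference analysis from the tweet.

```python
# Textbook Adam update, shown only to locate the beta parameters:
# beta1 controls the first-moment (gradient) EMA, beta2 the second-moment EMA.
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t >= 1; returns (new_param, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad       # EMA of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # EMA of squared gradients (per-coordinate scale)
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```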
