Sadhika Malladi (@sadhikamalladi) 's Twitter Profile
Sadhika Malladi

@sadhikamalladi

CS PhD student at Princeton

ID: 1532167008074227713

Website: https://www.cs.princeton.edu/~smalladi/index.html | Joined: 02-06-2022 01:07:52

211 Tweets

1.1K Followers

199 Following

Christina Baek (@_christinabaek) 's Twitter Profile Photo

Are current reasoning models optimal for test-time scaling? 🌠
No! Models make the same incorrect guess over and over again.

We show that you can fix this problem w/o any crazy tricks 💫 – just do weight ensembling (WiSE-FT) for big gains on math!

1/N
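WiSE-FT here refers to weight-space ensembling (Wortsman et al.): the parameters of two checkpoints with the same architecture are linearly interpolated, rather than ensembling their outputs. A minimal sketch follows; which two checkpoints are merged (e.g., the base model and its reasoning fine-tune) and the mixing coefficient are assumptions for illustration, not details from the thread.

```python
# Minimal sketch of WiSE-FT-style weight ensembling: linearly interpolate the
# parameters of two checkpoints that share an architecture.
# theta_alpha = (1 - alpha) * theta_base + alpha * theta_finetuned
import torch

def wise_ft_interpolate(base_state_dict, finetuned_state_dict, alpha=0.5):
    """Return a state dict with (1 - alpha) * base + alpha * finetuned for each parameter."""
    return {
        name: (1.0 - alpha) * base_param + alpha * finetuned_state_dict[name]
        for name, base_param in base_state_dict.items()
    }

# Usage sketch (file names are placeholders):
# base = torch.load("base_model.pt")           # state dict of the base checkpoint
# tuned = torch.load("reasoning_finetune.pt")  # state dict of the fine-tuned checkpoint
# model.load_state_dict(wise_ft_interpolate(base, tuned, alpha=0.7))
```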
Stanley Wei @ ICLR 2025 (@stanleyrwei) 's Twitter Profile Photo

New unlearning work at #ICLR2025! We give guarantees for unlearning a simple class of language models (topic models), and we further show it's easier to unlearn pretraining data during fine-tuning, without even modifying the base model. Paper: arxiv.org/abs/2411.12600 🧵:

Tanya Marwah (@__tm__157) 's Twitter Profile Photo

What is the role of memory in modeling time-dependent PDEs? I will be at ICLR presenting our paper (Oral) on when memory is beneficial for modeling time-dependent PDEs! 🔗 openreview.net/forum?id=o9kqa…
[Oral]: Thu 24 Apr 10:30 am @ Session 1E
[Poster]: Thu 24 Apr 3 pm, #617

Surbhi Goel (@surbhigoel_) 's Twitter Profile Photo

Thrilled that Abhishek Panigrahi is presenting our paper (joint with Bingbin Liu, Sadhika Malladi, Andrej Risteski) on the benefits of progressive distillation as an oral at #ICLR2025. Talk details below ⬇️ Also check out our blog post: unprovenalgos.github.io/progressive-di…

Aditi Raghunathan (@adtraghunathan) 's Twitter Profile Photo

Excited to present our recent findings on "catastrophic overtraining" where more pre-training data (shockingly) can lead to worse downstream models.

Jacob Springer (@jacspringer) 's Twitter Profile Photo

Our paper on how overtraining LLMs can make fine-tuning harder won awards at two different #ICLR2025 workshops! I'm honored and thrilled!
Outstanding paper @ SCOPE
Entropic Paper Award @ ICBINB
Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Our work on the surprising negative effects of pre-training LMs for longer has received awards from both of these workshops and has been accepted to ICML 25! One of my first papers as a senior author :) arxiv.org/abs/2503.19206

Jeremy Bernstein (@jxbz) 's Twitter Profile Photo

I was really grateful to have the chance to speak at Cohere Labs and ML Collective last week. My goal was to make the most helpful talk that I could have seen as a first-year grad student interested in neural network optimization. Sharing some info about the talk here... (1/6)

Nived Rajaraman (@nived_rajaraman) 's Twitter Profile Photo

The abstract submission deadline for FoPt has been extended to the 21st of May (11:59pm UTC). Submission website: openreview.net/group?id=learn…

Yiding Jiang (@yidingjiang) 's Twitter Profile Photo

Data selection and curriculum learning can be formally viewed as a compression protocol via prequential coding. New blog (with Allan Zhou) about this neat idea that motivated ADO but didn’t make it into the paper. yidingjiang.github.io/blog/post/curr…
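For context, prequential (online) coding measures the description length of an ordered data stream as the cumulative negative log-likelihood each example receives from a model trained only on the earlier examples, so the ordering of the data (the curriculum) changes the code length. A rough sketch under that framing is below; the model interface is a hypothetical placeholder, not the ADO implementation.

```python
# Rough sketch of prequential (online) coding: the code length of an ordered stream
# is sum_t -log p(x_t | x_1..x_{t-1}), where the model is updated on each example
# only after predicting it. Different data orderings give different code lengths.
# The `model` interface (predict_proba / update) is a hypothetical placeholder.
import math

def prequential_code_length(data_stream, model):
    """Total code length (in nats) of data_stream under train-as-you-go coding."""
    total_nats = 0.0
    for example in data_stream:
        prob = model.predict_proba(example)   # p(x_t | x_1..x_{t-1})
        total_nats += -math.log(prob)
        model.update(example)                 # fit on x_t before seeing x_{t+1}
    return total_nats
```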

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Adam is similar to many algorithms, but cannot be effectively replaced by any simpler variant in LMs. The community is starting to get the recipe right, but what is the secret sauce? Robert M. Gower 🇺🇦 and I found that it has to do with the beta parameters and variational inference.
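For reference, a minimal sketch of the standard Adam update is below, just to show where the two beta parameters enter (beta1 sets the decay of the gradient EMA, beta2 the decay of the squared-gradient EMA). This is textbook Adam (Kingma & Ba), not the variational-inference analysis from the tweet.

```python
# Textbook Adam update, shown only to locate the beta parameters:
# beta1 controls the first-moment (gradient) EMA, beta2 the second-moment EMA.
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step at iteration t >= 1; returns (new_param, new_m, new_v)."""
    m = beta1 * m + (1 - beta1) * grad       # EMA of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad**2    # EMA of squared gradients (per-coordinate scale)
    m_hat = m / (1 - beta1**t)               # bias correction
    v_hat = v / (1 - beta2**t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```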
