Alex Meterez (@alexmeterez)'s Twitter Profile
Alex Meterez

@alexmeterez

cs phd student @harvard
deep learning theory, optimization, scale pilled

ID: 1243303287752925190

Website: https://alexandrumeterez.github.io/ · Joined: 26-03-2020 22:26:27

197 Tweets

173 Followers

788 Following

Chloe H. Su (@huangyu58589918) 's Twitter Profile Photo

What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of training instabilities under low precision formats like MXFP8 and ways to mitigate them. Thread 🧵👇

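For readers new to microscaling formats: the key idea in MX-style formats like MXFP8 is that a small block of elements shares a single scale. A toy numpy sketch of block-wise quantization (my own illustration, not the paper's method; real MXFP8 stores FP8 elements with a power-of-two scale per 32-element block, per the OCP MX spec):

```python
import numpy as np

def quantize_blockwise(x, block=32, levels=256):
    """Toy microscaling-style quantizer: one shared scale per block.

    Illustrative only -- here we just snap each block onto a uniform
    grid sized by that block's max magnitude.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                  # avoid dividing by zero
    step = 2 * scales / (levels - 1)           # grid spacing per block
    q = np.round(blocks / step) * step
    return q.reshape(-1)[:x.size], scales.ravel()

# Small values sharing a block with a large one lose most of their
# precision -- one way error can creep in at low precision.
x = np.array([0.001, -0.002, 3.0, -2.5] * 8)
xq, scales = quantize_blockwise(x, block=32)
max_err = np.abs(x - xq).max()
```

The tiny entries (0.001, -0.002) round to zero because the block's grid is sized by its largest magnitude, 3.0.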
Nikhil Anand (@nikhil_anand91) 's Twitter Profile Photo

Excited to share this work on understanding low-precision instabilities in model training! See our thread below for more details. Paper: arxiv.org/abs/2506.20752 Blogpost: tinyurl.com/lowprecinstabi…

Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Excited to be giving this talk at COLT tomorrow :) reach out if you want to chat about deriving useful theoretical insights into modern-day language models!

Soufiane Hayou (@hayou_soufiane) 's Twitter Profile Photo

LoRA is amazing for finetuning large models cheaply, but WHERE you place the adapters makes a huge difference. Most people are just guessing where to put them (Attention, MLP, etc). Meet "PLoP" (Precise LoRA Placement) 🎯, our new method for automatic LoRA placement 🧵

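For context on what is being placed: a LoRA adapter adds a trainable low-rank update B·A next to a frozen weight W. A minimal numpy sketch of a single adapted layer (my own illustration; PLoP's placement criterion is in the paper and not shown here):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = W x + (alpha/r) * B (A x): frozen weight plus low-rank update.

    W: (d_out, d_in) frozen; A: (r, d_in) and B: (d_out, r) trainable.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.01  # small init, as in the LoRA paper
B = np.zeros((d_out, r))               # zero init: adapter starts as a no-op
x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B)           # equals W @ x until B is trained
```

Because B starts at zero, inserting an adapter anywhere (attention or MLP) leaves the model unchanged at initialization; placement only matters once training moves B.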
Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces:
✏️ Post-training based on off-the-shelf base models without transparent pre-training data components and scale.
✏️ Intermediate checkpoints with incomplete learning

Jaeyeon Kim (@jaeyeon_kim_0) 's Twitter Profile Photo

Excited to share that I’ll be presenting two oral papers at this ICML. See u guys in Vancouver!!🇨🇦
1️⃣ arxiv.org/abs/2502.06768 Understanding Masked Diffusion Models theoretically/scientifically
2️⃣ arxiv.org/abs/2502.09376 Theoretical analysis of LoRA training
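For readers unfamiliar with the model class in paper 1️⃣: masked (absorbing-state) diffusion models generate a sequence by starting fully masked and unmasking a few positions per step. A toy sampler sketch (the `predict` callback is a hypothetical stand-in for a trained denoiser; this is not the paper's algorithm):

```python
import random

MASK = "<mask>"

def sample_masked_diffusion(predict, length=8, steps=4, seed=0):
    """Toy absorbing-state sampler: reveal ~1/steps of the masks per step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        k = max(1, len(masked) // (steps - step))  # how many to reveal now
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = predict(seq, i)               # denoiser fills position i
    # reveal any stragglers
    return [predict(seq, i) if t == MASK else t for i, t in enumerate(seq)]

# Dummy "denoiser" that just names the position it fills.
out = sample_masked_diffusion(lambda seq, i: f"tok{i}", length=8, steps=4)
```

Unlike left-to-right decoding, the reveal order is random, and each `predict` call sees the partially unmasked sequence as context.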

Yixiao Huang (@yixiaoh22) 's Twitter Profile Photo

🧵1/7 Why do LLMs generalize so well, yet also hallucinate? 🤔 Our new paper argues they are two sides of the same coin, driven by a single powerful mechanism: Out-of-Context Reasoning (OCR). 🔗 arXiv: arxiv.org/abs/2506.10887

Cengiz Pehlevan (@cpehlevan) 's Twitter Profile Photo

Great to see this one finally out in PNAS! Asymptotic theory of in-context learning by linear attention pnas.org/doi/10.1073/pn… Many thanks to my amazing co-authors Yue Lu, Mary Letey, Jacob Zavatone-Veth and Anindita Maiti
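A flavor of the setup (my own toy demo, not the paper's derivation): a single linear-attention head, with no softmax, can implement one gradient-descent step for in-context linear regression, predicting the query label from the context pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Context: n pairs (x_i, y_i) with y_i = w . x_i, plus a query x_q.
d, n = 4, 20000
w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w
x_q = rng.normal(size=d)

# A linear-attention head can compute y_hat = (1/n) sum_i y_i (x_i . x_q),
# i.e. one GD step from zero on least squares. With identity input
# covariance, (1/n) X^T y -> w as n grows, so y_hat -> w . x_q.
y_hat = (y @ (X @ x_q)) / n
error = abs(y_hat - w @ x_q)
```

The estimate is learned purely from the context in the "prompt": no weights are updated, which is the in-context learning phenomenon the asymptotic theory characterizes.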

Lorenzo Noci (@lorenzo_noci) 's Twitter Profile Photo

Pass by if you want to know about scaling up your model under distribution shifts of the training data. Takeaway: muP needs to be tuned to the amount of feature learning that optimizes the forgetting/plasticity trade-off.

M Ganesh Kumar (@mgkumar138) 's Twitter Profile Photo

First #ICML2025 conference proceeding (icml.cc/virtual/2025/p…)! We (Blake Bordelon, Jacob Zavatone-Veth, Cengiz Pehlevan) developed a simple model to better understand state representation learning dynamics in both artificial and biological intelligent systems! Comments appreciated!

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

A team from #KempnerInstitute, Harvard SEAS & Computer Science at UT Austin has won a best paper award at #ICML2025 for work unlocking the potential of masked diffusion models. Congrats to Jaeyeon (Jay) Kim @ICML, Kulin Shah, Vasilis Kontonis, Sham Kakade and Sitan Chen. kempnerinstitute.harvard.edu/news/kempner-i… #AI

Blake Bordelon ☕️🧪👨‍💻 (@blake__bordelon) 's Twitter Profile Photo

ICML this week! Come by:
Tue PM: Clarissa Lauditi's work on muP BNNs arxiv.org/abs/2502.07998
Wed AM: a model of place field adaptation (@mgkumar138, Jacob ZV) biorxiv.org/content/10.110…
Wed PM: a model of LR transfer in linear NNs arxiv.org/abs/2502.02531
All from senior author Cengiz Pehlevan!

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Like Micah Goldblum and coauthors, we also found that small batches make SGD effective in LM training. It's cool that our papers came out around the same time, and each has a different perspective!

Below, our take on why this happens.

Our awesome team: Teodora, Jonas Geiping
Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Come to the HiLD workshop at ICML tomorrow! We have 4 posters on optimization:
- In Search of Adam’s Secret Sauce
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- On the Interaction of Noise, Compression Role, and Adaptivity under (L0,L1)-Smoothness

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

New in the #DeeperLearningBlog: the #KempnerInstitute's Mary Letey presents work recently published in PNAS that offers generalizable insights into in-context learning (ICL) in an analytically solvable model architecture. bit.ly/4lPK15p #AI @PNASNews (1/2)

Jaeyeon Kim (@jaeyeon_kim_0) 's Twitter Profile Photo

🚨New video! I talked about my research — 1️⃣ My work at Seoul National University 2️⃣ Our Outstanding Paper Award at ICML 2025 Also check out another podcast with Yilun Du — an amazing researcher & AP at Harvard CS! Harvard CS is growing fast 🫡

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

This winter I'm speaking at the LLM & Control workshop of CDC in Rio! If you are interested in this awesome combo, pls register!! sites.google.com/view/cdc2025-l…
