Alex Meterez (@alexmeterez)'s Twitter Profile
Alex Meterez

@alexmeterez

cs phd student @harvard
deep learning theory, optimization, scale pilled

ID: 1243303287752925190

Website: https://alexandrumeterez.github.io/ · Joined: 26-03-2020 22:26:27

197 Tweets

173 Followers

788 Following

Chloe H. Su (@huangyu58589918) 's Twitter Profile Photo

What precision should we use to train large AI models effectively? Our latest research probes the subtle nature of training instabilities under low precision formats like MXFP8 and ways to mitigate them. Thread 🧵👇

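For readers new to microscaling formats: the key idea in MX-style formats like MXFP8 is that a small block of elements shares a single scale. A toy numpy sketch of block-wise quantization (my own illustration, not the paper's method; real MXFP8 stores FP8 elements with a power-of-two scale per 32-element block, per the OCP MX spec):

```python
import numpy as np

def quantize_blockwise(x, block=32, levels=256):
    """Toy microscaling-style quantizer: one shared scale per block.

    Illustrative only -- here we just snap each block onto a uniform
    grid sized by that block's max magnitude.
    """
    x = np.asarray(x, dtype=np.float64)
    pad = (-x.size) % block
    blocks = np.pad(x, (0, pad)).reshape(-1, block)
    scales = np.abs(blocks).max(axis=1, keepdims=True)
    scales[scales == 0] = 1.0                  # avoid dividing by zero
    step = 2 * scales / (levels - 1)           # grid spacing per block
    q = np.round(blocks / step) * step
    return q.reshape(-1)[:x.size], scales.ravel()

# Small values sharing a block with a large one lose most of their
# precision -- one way error can creep in at low precision.
x = np.array([0.001, -0.002, 3.0, -2.5] * 8)
xq, scales = quantize_blockwise(x, block=32)
max_err = np.abs(x - xq).max()
```

The tiny entries (0.001, -0.002) round to zero because the block's grid is sized by its largest magnitude, 3.0.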
Nikhil Anand (@nikhil_anand91) 's Twitter Profile Photo

Excited to share this work on understanding low-precision instabilities in model training! See our thread below for more details. Paper: arxiv.org/abs/2506.20752 Blogpost: tinyurl.com/lowprecinstabi…

Sadhika Malladi (@sadhikamalladi) 's Twitter Profile Photo

Excited to be giving this talk at COLT tomorrow :) reach out if you want to chat about deriving useful theoretical insights into modern-day language models!

Soufiane Hayou (@hayou_soufiane) 's Twitter Profile Photo

LoRA is amazing for finetuning large models cheaply, but WHERE you place the adapters makes a huge difference. Most people are just guessing where to put them (Attention, MLP, etc). Meet "PLoP" (Precise LoRA Placement) 🎯, our new method for automatic LoRA placement 🧵

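For context on what is being placed: a LoRA adapter adds a trainable low-rank update B·A next to a frozen weight W. A minimal numpy sketch of a single adapted layer (my own illustration; PLoP's placement criterion is in the paper and not shown here):

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16):
    """y = W x + (alpha/r) * B (A x): frozen weight plus low-rank update.

    W: (d_out, d_in) frozen; A: (r, d_in) and B: (d_out, r) trainable.
    """
    r = A.shape[0]
    return W @ x + (alpha / r) * (B @ (A @ x))

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 4, 2
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in)) * 0.01  # small init, as in the LoRA paper
B = np.zeros((d_out, r))               # zero init: adapter starts as a no-op
x = rng.normal(size=d_in)
y = lora_forward(x, W, A, B)           # equals W @ x until B is trained
```

Because B starts at zero, inserting an adapter anywhere (attention or MLP) leaves the model unchanged at initialization; placement only matters once training moves B.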
Hanlin Zhang (@_hanlin_zhang_) 's Twitter Profile Photo

[1/n] Discussions about LM reasoning and post-training have gained momentum. We identify several missing pieces:
✏️ Post-training based on off-the-shelf base models without transparent pre-training data components and scale.
✏️ Intermediate checkpoints with incomplete learning

Jaeyeon Kim (@jaeyeon_kim_0) 's Twitter Profile Photo

Excited to share that I’ll be presenting two oral papers at this ICML. See u guys in Vancouver!!🇨🇦
1️⃣ arxiv.org/abs/2502.06768 Understanding Masked Diffusion Models theoretically/scientifically
2️⃣ arxiv.org/abs/2502.09376 Theoretical analysis of LoRA training
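For readers unfamiliar with the model class in paper 1️⃣: masked (absorbing-state) diffusion models generate a sequence by starting fully masked and unmasking a few positions per step. A toy sampler sketch (the `predict` callback is a hypothetical stand-in for a trained denoiser; this is not the paper's algorithm):

```python
import random

MASK = "<mask>"

def sample_masked_diffusion(predict, length=8, steps=4, seed=0):
    """Toy absorbing-state sampler: reveal ~1/steps of the masks per step."""
    rng = random.Random(seed)
    seq = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(seq) if t == MASK]
        if not masked:
            break
        k = max(1, len(masked) // (steps - step))  # how many to reveal now
        for i in rng.sample(masked, min(k, len(masked))):
            seq[i] = predict(seq, i)               # denoiser fills position i
    # reveal any stragglers
    return [predict(seq, i) if t == MASK else t for i, t in enumerate(seq)]

# Dummy "denoiser" that just names the position it fills.
out = sample_masked_diffusion(lambda seq, i: f"tok{i}", length=8, steps=4)
```

Unlike left-to-right decoding, the reveal order is random, and each `predict` call sees the partially unmasked sequence as context.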

Yixiao Huang (@yixiaoh22) 's Twitter Profile Photo

🧵1/7 Why do LLMs generalize so well, yet also hallucinate? 🤔 Our new paper argues they are two sides of the same coin, driven by a single powerful mechanism: Out-of-Context Reasoning (OCR). 🔗 arXiv: arxiv.org/abs/2506.10887

Cengiz Pehlevan (@cpehlevan) 's Twitter Profile Photo

Great to see this one finally out in PNAS! Asymptotic theory of in-context learning by linear attention pnas.org/doi/10.1073/pn… Many thanks to my amazing co-authors Yue Lu, Mary Letey, Jacob Zavatone-Veth and Anindita Maiti
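A flavor of the setup (my own toy demo, not the paper's derivation): a single linear-attention head, with no softmax, can implement one gradient-descent step for in-context linear regression, predicting the query label from the context pairs:

```python
import numpy as np

rng = np.random.default_rng(1)

# Context: n pairs (x_i, y_i) with y_i = w . x_i, plus a query x_q.
d, n = 4, 20000
w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w
x_q = rng.normal(size=d)

# A linear-attention head can compute y_hat = (1/n) sum_i y_i (x_i . x_q),
# i.e. one GD step from zero on least squares. With identity input
# covariance, (1/n) X^T y -> w as n grows, so y_hat -> w . x_q.
y_hat = (y @ (X @ x_q)) / n
error = abs(y_hat - w @ x_q)
```

The estimate is learned purely from the context in the "prompt": no weights are updated, which is the in-context learning phenomenon the asymptotic theory characterizes.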

Lorenzo Noci (@lorenzo_noci) 's Twitter Profile Photo

Pass by if you want to know about scaling up your model under distribution shifts of the training data. Takeaway: muP needs to be tuned to the amount of feature learning that optimizes the forgetting/plasticity trade-off.

M Ganesh Kumar (@mgkumar138) 's Twitter Profile Photo

First #ICML2025 conference proceeding (icml.cc/virtual/2025/p…)! We (Blake Bordelon, Jacob Zavatone-Veth, Cengiz Pehlevan) developed a simple model to better understand state representation learning dynamics in both artificial and biological intelligent systems! Comments appreciated!

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

A team from #KempnerInstitute, Harvard SEAS & Computer Science at UT Austin has won a best paper award at #ICML2025 for work unlocking the potential of masked diffusion models. Congrats to Jaeyeon (Jay) Kim @ICML, Kulin Shah, Vasilis Kontonis, Sham Kakade and Sitan Chen. kempnerinstitute.harvard.edu/news/kempner-i… #AI

Blake Bordelon ☕️🧪👨‍💻 (@blake__bordelon) 's Twitter Profile Photo

ICML this week! Come by:
Tue PM: Clarissa Lauditi's work on muP BNNs arxiv.org/abs/2502.07998
Wed AM: a model of place field adaptation (@mgkumar138, Jacob ZV) biorxiv.org/content/10.110…
Wed PM: a model of LR transfer in linear NNs arxiv.org/abs/2502.02531
All from senior author Cengiz Pehlevan!

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Like Micah Goldblum and coauthors, we also found that small batches make SGD effective in LM training. It's cool that our papers came out around the same time, and each has a different perspective!

Below, our take on why this happens.

Our awesome team: Teodora, Jonas Geiping
Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

Come to the HiLD workshop at ICML tomorrow! We have 4 posters on optimization:
- In Search of Adam’s Secret Sauce
- Is your batch size the problem? Revisiting the Adam-SGD gap in language modeling
- On the Interaction of Noise, Compression Role, and Adaptivity under (L0,L1)-Smoothness

Kempner Institute at Harvard University (@kempnerinst) 's Twitter Profile Photo

New in the #DeeperLearningBlog: the #KempnerInstitute's Mary Letey presents work recently published in PNAS that offers generalizable insights into in-context learning (ICL) in an analytically solvable model architecture. bit.ly/4lPK15p #AI @PNASNews (1/2)

Jaeyeon Kim (@jaeyeon_kim_0) 's Twitter Profile Photo

🚨New video! I talked about my research — 1️⃣ My work at Seoul National University 2️⃣ Our Outstanding Paper Award at ICML 2025 Also check out another podcast with Yilun Du — an amazing researcher & AP at Harvard CS! Harvard CS is growing fast 🫡

Antonio Orvieto (@orvieto_antonio) 's Twitter Profile Photo

This winter I'm speaking at the LLM & Control workshop of CDC in Rio! If you are interested in this awesome combo, pls register!! sites.google.com/view/cdc2025-l…
