Ekaterina Lobacheva (@katelobacheva) Twitter Tweets • TwiCopy

Sara Hooker

6 months ago

I'm starting a new project. Working on what I consider to be the most important problem: building thinking machines that adapt and continuously learn. We have an incredibly talent-dense founding team + are hiring for engineering, ops, and design. Join us: adaptionlabs.ai

Likes: 2,2K · Replies: 183 · Retweets: 184

Ekaterina Lobacheva

@katelobacheva

6 months ago

Mathieu is great!

Likes: 2 · Replies: 0 · Retweets: 0

Mila - Institut québécois d'IA

@mila_quebec

5 months ago

Mila's annual supervision request process is now open to receive MSc and PhD applications for Fall 2026 admission! For more information, visit mila.quebec/en/prospective…

Likes: 106 · Replies: 2 · Retweets: 62

Ekaterina Lobacheva

@katelobacheva

5 months ago

Nadia is a pleasure to work with - really recommend! 🚀

Likes: 1 · Replies: 0 · Retweets: 0

Atli Kosson

@atlikosson

5 months ago

The Maximal Update Parameterization (µP) allows LR transfer from small to large models, saving costly tuning. But why is independent weight decay (IWD) essential for it to work? We find µP stabilizes early training (like an LR warmup), but IWD takes over in the long term! 🧵

Likes: 289 · Replies: 11 · Retweets: 41

Ekaterina Lobacheva

@katelobacheva

5 months ago

🚨 We’re recruiting new students for Fall 2026 — come join Chandar Lab! 🚨

Likes: 3 · Replies: 0 · Retweets: 0

Mathieu

@miniapeur

5 months ago

Likes: 4,4K · Replies: 17 · Retweets: 449

Ekaterina Lobacheva

@katelobacheva

5 months ago

A talk on our recent paper on zero-sum learning, now with new links to generalization, circuit emergence, and optimizer dynamics! 🚀

Likes: 4 · Replies: 0 · Retweets: 0

Goodfire

@goodfireai

5 months ago

LLMs memorize a lot of training data, but memorization is poorly understood. Where does it live inside models? How is it stored? How much is it involved in different tasks? Jack Merullo & Srihita Vatsavaya's new paper examines all of these questions using loss curvature! (1/7)

Likes: 796 · Replies: 10 · Retweets: 130

Andrew Lampinen

@andrewlampinen

5 months ago

Really cool finding (and paper), and makes a ton of sense! Way back in the day when we/others were trying to make sense of memorization vs. generalization in simpler models, we made an argument that generalizing signals will be top eigenvalues due to shared structure, while 1/2

Likes: 242 · Replies: 11 · Retweets: 17

Verna Dankers

@vernadankers

5 months ago

Ready for day 3 of #EMNLP2025 🎉🎉 I've been on the lookout for memorization, unlearning, interp, memory module papers & more, chat w me if these topics fascinate you too😻 Looking forward to more of Suzhou, the conf & my BlackboxNLP keynote Sunday 1.45PM! blackboxnlp.github.io/2025/

Likes: 56 · Replies: 0 · Retweets: 12

Goodfire

@goodfireai

4 months ago

New research: are prompting and activation steering just two sides of the same coin? Eric Bigelow, Daniel Wurgaft, Ekdeep Singh, and coauthors argue they are: ICL and steering have formally equivalent effects. (1/4)

Likes: 342 · Replies: 9 · Retweets: 47

Andrei Mircea

@mirandrom

4 months ago

Likes: 7,7K · Replies: 22 · Retweets: 643

Ekaterina Lobacheva

@katelobacheva

4 months ago

Very cool analysis of pruning vs merging experts in SMoE models! 🔥

Likes: 3 · Replies: 0 · Retweets: 0

Irina Saparina

@irisaparina

3 months ago

Reasoning models are powerful, but they burn thousands of tokens on potentially wrong interpretations for ambiguous requests! 👉 We teach models to think about intent first and provide all interpretations and answers in a single response via RL with dual reward. 🧵1/6

Likes: 34 · Replies: 1 · Retweets: 12

Vaishnavh Nagarajan

@_vaishnavh

2 months ago

1/ We found that deep sequence models memorize atomic facts "geometrically" -- not as an associative lookup table as often imagined. This opens up practical questions on reasoning/memory/discovery, and also poses a theoretical "memorization puzzle."

Likes: 1,1K · Replies: 58 · Retweets: 244

Ekaterina Lobacheva

@katelobacheva

2 months ago

Recent paper from our lab on whether LLMs can become CAD designers 🛠️

Likes: 4 · Replies: 0 · Retweets: 0

Ekaterina Lobacheva

@katelobacheva

2 months ago

Recent paper from our lab on LLMs playing Hangman 🎣

Likes: 4 · Replies: 0 · Retweets: 0

Chandar Lab

@chandarlab

2 months ago

Excited to share that we have 3 papers accepted at #ICLR2026! 🇧🇷 Our work this year focuses on efficiency and expressivity: deriving theoretical limits for SSMs, achieving linear scaling for reasoning, and modernizing encoder architectures. A summary of our work 👇 🧵

Likes: 14 · Replies: 1 · Retweets: 11

Yizhou Liu

@yizhouliu0

2 months ago

🚨 New Paper Alert: Why does LLM training follow a slow power law? ⁉️ We find that the neural scaling law with time arises intrinsically from softmax and cross-entropy! (1/6) When learning peaked (or low-temperature or low-entropy) distributions like next-token distributions, these

Likes: 153 · Replies: 10 · Retweets: 27