Nicolas Zucchet (@nicolaszucchet)'s Twitter Profile
Nicolas Zucchet

@nicolaszucchet

PhD student @CSatETH
prev. student researcher @GoogleDeepMind | @Polytechnique

ID: 936865625233817600

Website: https://nicolaszucchet.github.io | Joined: 02-12-2017 07:52:25

140 Tweets

415 Followers

325 Following

K.Ishi@生成AIの産業応用 (@k_ishi_ai)'s Twitter Profile Photo

A new paper from Google DeepMind sheds light on how LLMs acquire knowledge.

Early in training, LLMs go through a stagnation period (plateau) in knowledge acquisition.

During this plateau, however, the model is actually focusing on specific elements and establishing efficient attention patterns for acquiring knowledge; rapid knowledge acquisition then follows.

This resembles the knowledge-acquisition process of young children.
Soham De (@sohamde_)'s Twitter Profile Photo

Our new paper sheds light on the process of knowledge acquisition in language models, with implications for:
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations

Nicolas did a great job on the project! See his thread👇

Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

This is just a reminder for your NeurIPS experiments: if you are comparing architectures, optimizers, or whatever at a single hyperparameter setting (e.g., LR), you are automatically not a scientist. You can be better than this. Produce science, not hype.
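A minimal sketch of the practice this tweet argues for, on a toy quadratic objective: sweep the learning rate for each method and compare them at their respective best settings, rather than at one arbitrary value. The objective, grid, and momentum coefficient below are illustrative assumptions, not anything from the tweet.

```python
import numpy as np

def train(optimizer, lr, steps=200):
    """Minimize the toy objective f(w) = ||w||^2 / 2; return final loss."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    m = np.zeros_like(w)  # momentum buffer
    for _ in range(steps):
        g = w  # gradient of f at w
        if optimizer == "sgd":
            w = w - lr * g
        else:  # heavy-ball momentum
            m = 0.9 * m + g
            w = w - lr * m
    return 0.5 * float(w @ w)

# Compare methods across an LR sweep, not at a single arbitrary LR.
lrs = [10.0**e for e in range(-4, 1)]  # 1e-4 ... 1e0
for opt in ["sgd", "momentum"]:
    losses = {lr: train(opt, lr) for lr in lrs}
    best = min(losses, key=losses.get)
    print(f"{opt}: best lr = {best:g}, final loss = {losses[best]:.3e}")
```

The same logic applies to architecture comparisons: report each method at its own tuned hyperparameters, or show the whole sweep.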

Andrew Lampinen (@andrewlampinen)'s Twitter Profile Photo

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/

Tyler John (@tyler_m_john)'s Twitter Profile Photo

I really like this new op-ed from David Duvenaud on how so many different kinds of pressures could drive towards loss of human control over AI. It's rare to read anything well written on this topic, but this piece was elegant and smart enough that I wanted to keep on reading.

Stephanie Chan (@scychan_brains)'s Twitter Profile Photo

Smooth, predictable scaling laws are central to our conceptions and forecasts about AI -- but lots of capabilities actually *emerge* in sudden ways. Awesome work by Nicolas Zucchet and Francesco D'Angelo, bringing more predictability to emergent phenomena by studying one type: sparse attention.

Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
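As context, a minimal sketch of the kind of object the paper studies: a diagonal linear RNN, whose state update has no nonlinearity, so each output is a fixed linear filter of past inputs. Dimensions and parameter values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Diagonal linear RNN: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>.
rng = np.random.default_rng(0)
d, T = 8, 16
a = rng.uniform(0.5, 0.99, size=d)  # per-channel decay, |a| < 1 for stability
b = rng.normal(size=d)              # input weights
c = rng.normal(size=d)              # readout weights

x = rng.normal(size=T)              # scalar input sequence
h = np.zeros(d)
ys = []
for t in range(T):
    h = a * h + b * x[t]            # linear state update, no nonlinearity
    ys.append(float(c @ h))         # linear readout

# Unrolling gives y_t = sum_k <c, a^k * b> x_{t-k}: a learned linear filter,
# which is what makes questions about recall analytically tractable.
print(np.round(ys[:4], 3))
```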
Stephanie Chan (@scychan_brains)'s Twitter Profile Photo

Emergence in transformers is a real phenomenon! Behaviors and capabilities can appear in models in sudden ways. Emergence is not always just a "mirage". Compiling some examples here (please share any I missed): 🧵

Johannes Oswald (@oswaldjoh)'s Twitter Profile Photo

Super happy and proud to share our novel scalable RNN model - the MesaNet! 

This work builds upon beautiful ideas of 𝗹𝗼𝗰𝗮𝗹𝗹𝘆 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 (TTT), and combines ideas of in-context learning, test-time training and mesa-optimization.
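A hedged sketch of one way to read "locally optimal test-time training": at every step, fit a linear map from past keys to past values by solving a ridge regression exactly (rather than by gradient steps), then apply it to the current query. The dimensions, names, and ridge strength below are illustrative assumptions, not the MesaNet implementation.

```python
import numpy as np

# Exact (ridge-regularized) least-squares readout, re-solved per step.
rng = np.random.default_rng(0)
d, T, lam = 4, 10, 1e-2
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))
queries = rng.normal(size=(T, d))

S = lam * np.eye(d)   # running sum of k k^T, plus ridge term
Z = np.zeros((d, d))  # running sum of k v^T
outputs = []
for t in range(T):
    S += np.outer(keys[t], keys[t])
    Z += np.outer(keys[t], values[t])
    W = np.linalg.solve(S, Z)       # exact ridge solution at step t
    outputs.append(queries[t] @ W)  # read out with the current query

print(np.round(outputs[0], 3))
```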
Jeff Dean (@jeffdean)'s Twitter Profile Photo

AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an…

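For scale, a quick back-of-the-envelope conversion of that 0.24 Wh figure into everyday running times. The appliance wattages below are illustrative assumptions, not numbers from Google's paper.

```python
# Convert the reported median energy per Gemini Apps text prompt into
# rough appliance-runtime equivalents. Wattages are assumed, not sourced.
PROMPT_WH = 0.24  # watt-hours per median text prompt (from the tweet)

appliances_w = {"LED TV": 100, "laptop": 50, "LED bulb": 10}
for name, watts in appliances_w.items():
    seconds = PROMPT_WH / watts * 3600  # Wh / W = hours; x3600 -> seconds
    print(f"{PROMPT_WH} Wh ~= running a {watts} W {name} for {seconds:.0f} s")
```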