Nicolas Zucchet (@nicolaszucchet)'s Twitter Profile
Nicolas Zucchet

@nicolaszucchet

PhD student @CSatETH
prev. student researcher @GoogleDeepMind | @Polytechnique

ID: 936865625233817600

Website: https://nicolaszucchet.github.io | Joined: 02-12-2017 07:52:25

140 Tweets

415 Followers

325 Following

K.Ishi@生成AIの産業応用 (@k_ishi_ai)'s Twitter Profile Photo

A new paper from Google DeepMind sheds light on how LLMs acquire knowledge.

Early in training, LLMs go through a stagnation period (plateau) in knowledge acquisition.

During this plateau, however, the model is actually focusing on specific elements and establishing efficient attention patterns for acquiring knowledge; rapid knowledge acquisition then follows.

This resembles the knowledge-acquisition process of young children.
Soham De (@sohamde_)'s Twitter Profile Photo

Our new paper sheds light on the process of knowledge acquisition in language models, with implications for:
- data curricula
- the challenges of learning new knowledge when fine-tuning
- the emergence of hallucinations

Nicolas did a great job on the project! See his thread👇

Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

This is just a reminder for your NeurIPS experiments: if you are comparing architectures, optimizers, or whatever at a single hyperparameter setting (e.g., LR), you are automatically not a scientist. You can be better than this. Produce science, not hype.
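A minimal sketch of the practice this tweet argues for, on a toy quadratic objective: sweep the learning rate for each method and compare them at their respective best settings, rather than at one arbitrary value. The objective, grid, and momentum coefficient below are illustrative assumptions, not anything from the tweet.

```python
import numpy as np

def train(optimizer, lr, steps=200):
    """Minimize the toy objective f(w) = ||w||^2 / 2; return final loss."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=10)
    m = np.zeros_like(w)  # momentum buffer
    for _ in range(steps):
        g = w  # gradient of f at w
        if optimizer == "sgd":
            w = w - lr * g
        else:  # heavy-ball momentum
            m = 0.9 * m + g
            w = w - lr * m
    return 0.5 * float(w @ w)

# Compare methods across an LR sweep, not at a single arbitrary LR.
lrs = [10.0**e for e in range(-4, 1)]  # 1e-4 ... 1e0
for opt in ["sgd", "momentum"]:
    losses = {lr: train(opt, lr) for lr in lrs}
    best = min(losses, key=losses.get)
    print(f"{opt}: best lr = {best:g}, final loss = {losses[best]:.3e}")
```

The same logic applies to architecture comparisons: report each method at its own tuned hyperparameters, or show the whole sweep.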

Andrew Lampinen (@andrewlampinen)'s Twitter Profile Photo

How do language models generalize from information they learn in-context vs. via finetuning? We show that in-context learning can generalize more flexibly, illustrating key differences in the inductive biases of these modes of learning — and ways to improve finetuning. Thread: 1/

Tyler John (@tyler_m_john)'s Twitter Profile Photo

I really like this new op-ed from David Duvenaud on how so many different kinds of pressures could drive towards loss of human control over AI. It's rare to read anything well written on this topic, but this piece was elegant and smart enough that I wanted to keep on reading.

Stephanie Chan (@scychan_brains)'s Twitter Profile Photo

Smooth, predictable scaling laws are central to our conceptions and forecasts about AI -- but lots of capabilities actually *emerge* in sudden ways. Awesome work by Nicolas Zucchet and Francesco D'Angelo, bringing more predictability to emergent phenomena by studying one type: sparse attention.

Antonio Orvieto (@orvieto_antonio)'s Twitter Profile Photo

We have a new SSM theory paper, just accepted to COLT, revisiting recall properties of linear RNNs. 

It's surprising how much one can delve into, and how beautiful it can become.

With (and only thanks to) the amazing Alexandre and Francis Bach

arxiv.org/pdf/2502.09287
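As context, a minimal sketch of the kind of object the paper studies: a diagonal linear RNN, whose state update has no nonlinearity, so each output is a fixed linear filter of past inputs. Dimensions and parameter values here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Diagonal linear RNN: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>.
rng = np.random.default_rng(0)
d, T = 8, 16
a = rng.uniform(0.5, 0.99, size=d)  # per-channel decay, |a| < 1 for stability
b = rng.normal(size=d)              # input weights
c = rng.normal(size=d)              # readout weights

x = rng.normal(size=T)              # scalar input sequence
h = np.zeros(d)
ys = []
for t in range(T):
    h = a * h + b * x[t]            # linear state update, no nonlinearity
    ys.append(float(c @ h))         # linear readout

# Unrolling gives y_t = sum_k <c, a^k * b> x_{t-k}: a learned linear filter,
# which is what makes questions about recall analytically tractable.
print(np.round(ys[:4], 3))
```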
Stephanie Chan (@scychan_brains)'s Twitter Profile Photo

Emergence in transformers is a real phenomenon! Behaviors and capabilities can appear in models in sudden ways. Emergence is not always just a "mirage". Compiling some examples here (please share any I missed): 🧵

Johannes Oswald (@oswaldjoh)'s Twitter Profile Photo

Super happy and proud to share our novel scalable RNN model - the MesaNet! 

This work builds upon beautiful ideas of 𝗹𝗼𝗰𝗮𝗹𝗹𝘆 𝗼𝗽𝘁𝗶𝗺𝗮𝗹 𝘁𝗲𝘀𝘁-𝘁𝗶𝗺𝗲 𝘁𝗿𝗮𝗶𝗻𝗶𝗻𝗴 (TTT), and combines ideas of in-context learning, test-time training and mesa-optimization.
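A hedged sketch of one way to read "locally optimal test-time training": at every step, fit a linear map from past keys to past values by solving a ridge regression exactly (rather than by gradient steps), then apply it to the current query. The dimensions, names, and ridge strength below are illustrative assumptions, not the MesaNet implementation.

```python
import numpy as np

# Exact (ridge-regularized) least-squares readout, re-solved per step.
rng = np.random.default_rng(0)
d, T, lam = 4, 10, 1e-2
keys = rng.normal(size=(T, d))
values = rng.normal(size=(T, d))
queries = rng.normal(size=(T, d))

S = lam * np.eye(d)   # running sum of k k^T, plus ridge term
Z = np.zeros((d, d))  # running sum of k v^T
outputs = []
for t in range(T):
    S += np.outer(keys[t], keys[t])
    Z += np.outer(keys[t], values[t])
    W = np.linalg.solve(S, Z)       # exact ridge solution at step t
    outputs.append(queries[t] @ W)  # read out with the current query

print(np.round(outputs[0], 3))
```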
Jeff Dean (@jeffdean)'s Twitter Profile Photo

AI efficiency is important. Today, Google is sharing a technical paper detailing our comprehensive methodology for measuring the environmental impact of Gemini inference. We estimate that the median Gemini Apps text prompt uses 0.24 watt-hours of energy (equivalent to watching an…

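For scale, a quick back-of-the-envelope conversion of that 0.24 Wh figure into everyday running times. The appliance wattages below are illustrative assumptions, not numbers from Google's paper.

```python
# Convert the reported median energy per Gemini Apps text prompt into
# rough appliance-runtime equivalents. Wattages are assumed, not sourced.
PROMPT_WH = 0.24  # watt-hours per median text prompt (from the tweet)

appliances_w = {"LED TV": 100, "laptop": 50, "LED bulb": 10}
for name, watts in appliances_w.items():
    seconds = PROMPT_WH / watts * 3600  # Wh / W = hours; x3600 -> seconds
    print(f"{PROMPT_WH} Wh ~= running a {watts} W {name} for {seconds:.0f} s")
```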