Evan Miller (@evmill) 's Twitter Profile
Evan Miller

@evmill

Statistically inclined software developer, occasional blogger about math + stats stuff. Working on evals @AnthropicAI

ID: 42923034

Website: https://www.evanmiller.org/ · Joined: 27-05-2009 16:55:16

1.1K Tweets

5.5K Followers

197 Following

Jonathan Whitaker (@johnowhitaker) 's Twitter Profile Photo

New blog post: datasciencecastnet.home.blog/2023/08/04/exp… I've had fun joining in the community effort to investigate Evan Miller's claims about softmax1 as a quantization-friendly modification to attention. Seems promising! But to me, the most exciting thing is watching open science in action :)
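
For readers landing here without context, a minimal NumPy sketch of what "softmax1" refers to, based on Evan Miller's "Attention Is Off By One" post: ordinary softmax with a +1 added to the denominator, so an attention head can put near-zero weight everywhere instead of being forced to sum to 1. Illustrative only; this is not the code used in the linked experiments.

```python
import numpy as np

def softmax(x):
    # Standard softmax, with the usual max-shift for numerical stability.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax1(x):
    # softmax1: identical except for a "+1" in the denominator,
    # softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)), shifted the same way.
    m = np.max(x)
    z = np.exp(x - m)
    return z / (np.exp(-m) + z.sum())

logits = np.array([-8.0, -9.0, -7.5])   # a head with nothing it wants to attend to
print(softmax(logits).sum())            # 1.0: forced to attend somewhere
print(softmax1(logits).sum())           # ~0.001: the head can effectively abstain
```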

Faris Sbahi 🏴‍☠️ (@farissbahi) 's Twitter Profile Photo

Controlling language models has a long way to go, and clever techniques involving Finite State Machines offer a way to eliminate hallucinations at record-setting speeds. New work by Rémi 📎, Phoebe Klett, and Dan // Normal Computing 🧠🌡️ blog.normalcomputing.ai/posts/2023-07-…
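
Very loosely, the trick behind this family of techniques: compile the allowed output format into a finite state machine and, at each decoding step, mask out any token that would leave the machine in a dead state. The toy vocabulary and one-state "FSM" below are made up purely to illustrate the mechanism, not taken from the linked work.

```python
import numpy as np

# Toy vocabulary and a toy acceptor that only allows digit strings.
vocab = ["7", "42", "cat", "3", "dog9"]

def fsm_allows(text):
    # One-state machine: accept iff every character so far is a digit.
    return text.isdigit()

def constrained_step(logits, prefix):
    # Mask any token that would push the partial output out of the language,
    # then renormalize, so the model can only emit valid continuations.
    masked = np.array([
        logit if fsm_allows(prefix + tok) else -np.inf
        for tok, logit in zip(vocab, logits)
    ])
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

logits = np.array([0.1, 1.2, 3.0, 0.5, 2.0])   # model "prefers" cat, but it gets masked
print(dict(zip(vocab, constrained_step(logits, prefix="1"))))
```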

Evan Miller (@evmill) 's Twitter Profile Photo

Softmax1, Week 2. Second set of empirical results are in, and they are… 🌸 promising 🌸 Weight kurtosis is roughly the same – but activation kurtosis improved 30X (!!) and maximum activation magnitude reduced 15X (!). Read more from Jonathan Whitaker: datasciencecastnet.home.blog/2023/08/04/exp…
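
For readers who want a feel for the metrics being quoted, here is a rough sketch of how one might measure excess kurtosis and maximum magnitude of a captured activation tensor; the hooks, layers, and exact definitions used in the linked write-up may differ.

```python
import numpy as np

def excess_kurtosis(a):
    # Fisher's excess kurtosis: 0 for a Gaussian, large and positive for
    # heavy-tailed, outlier-dominated values that are hard to quantize.
    a = np.asarray(a, dtype=np.float64).ravel()
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 4) - 3.0)

# Stand-in for activations captured with a forward hook on an attention block.
acts = np.random.standard_t(df=3, size=100_000)
print("activation kurtosis:", excess_kurtosis(acts))
print("max |activation|:   ", float(np.abs(acts).max()))
```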

Evan Miller (@evmill) 's Twitter Profile Photo

Softmax1 update… We now have support for ⚡️Flash Attention⚡️. This lets us test much larger models than before! To get the code, just pip install flash-attention-softmax-n. Or clone / star the GitHub repo here: github.com/softmax1/Flash… All credit / kudos to Chris Murphy.
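
As a point of reference for what such a kernel computes (this is not the package's API; the function below is a naive, illustrative PyTorch version and its name is made up), scaled-dot-product attention with softmax_n simply adds "+n" to the softmax denominator:

```python
import torch

def attention_softmax_n(q, k, v, n=1.0):
    # Naive scaled-dot-product attention whose softmax denominator carries an
    # extra "+n" term, so attention rows may sum to less than 1. Illustrative
    # only; the real package fuses this into FlashAttention kernels.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., L_q, L_k)
    m = scores.amax(dim=-1, keepdim=True)              # shift for stability
    e = torch.exp(scores - m)
    denom = n * torch.exp(-m) + e.sum(dim=-1, keepdim=True)
    return (e / denom) @ v

q = k = v = torch.randn(2, 4, 16, 64)                  # (batch, heads, seq, head_dim)
print(attention_softmax_n(q, k, v).shape)              # torch.Size([2, 4, 16, 64])
```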

Panagiota Papakonstantinou (@ppapakonnucl) 's Twitter Profile Photo

Kurt Vonnegut's 1969 address to the American Physical Society, on the innocence of the "old-fashioned scientist" and its loss after World War II. For physicists, artists, and other humans. I have transcribed it in its entirety as a Google Doc: docs.google.com/document/d/1Mn…

Thomas Capelle (@capetorch) 's Twitter Profile Photo

Following Evan Miller's great blog post on issues encountered when training GPT-like models that appear to be related to the softmax function, I wrote this small piece, mostly to understand what was going on. wandb.me/tinyllama

Astatide (@astatide42) 's Twitter Profile Photo

Results of my latest nerdsnipe from Tetraspace 💎! The plot below shows the predicted shape of the water flow, with a model taking into account gravity and surface tension. It looks just like the real thing! Conclusion: yep, it's surface tension details below 😁

Georgi Gerganov (@ggerganov) 's Twitter Profile Photo

Have a few thoughts about this approach. But most importantly, I'm happy to see Evan Miller's idea on softmax1 recognized: to my very basic and intuitive understanding of LLMs, it made enough sense to warrant further analysis. arxiv.org/abs/2309.17453

Nat Friedman (@natfriedman) 's Twitter Profile Photo

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000

Evan Miller (@evmill) 's Twitter Profile Photo

I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on! arxiv.org/abs/2401.10233
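
A rough illustration of the idea (not necessarily the paper's exact construction): under the null "median(x) = m", the number of observations above m is Binomial(n, 1/2), and twice the log-likelihood ratio against the unrestricted binomial fit is approximately chi-squared with 1 degree of freedom by Wilks' theorem.

```python
import numpy as np
from scipy.stats import chi2

def median_lrt_pvalue(x, m):
    # Sketch of a likelihood-ratio test for H0: median(x) == m, via Wilks' theorem.
    # The paper handles general quantiles and confidence intervals more carefully.
    x = np.asarray(x)
    n = len(x)
    k = int(np.sum(x > m))                            # observations above the hypothesized median
    p_hat = np.clip(k / n, 1e-12, 1 - 1e-12)          # unrestricted MLE of P(X > m)
    loglik = lambda p: k * np.log(p) + (n - k) * np.log(1 - p)
    lr = 2.0 * (loglik(p_hat) - loglik(0.5))
    return chi2.sf(lr, df=1)                          # asymptotic p-value

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=500)         # true median = ln 2 ~= 0.693
print(median_lrt_pvalue(sample, m=np.log(2)))         # large p-value: consistent with H0
print(median_lrt_pvalue(sample, m=1.0))               # small p-value: median is not 1.0
```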

Marius Hobbhahn (@mariushobbhahn) 's Twitter Profile Photo

This paper on the statistics of evals is great (and seems to be flying under the radar): arxiv.org/abs/2411.00640… The author basically shows all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how
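
In that spirit, a minimal sketch of the most basic calculation involved: treat per-question scores as draws from a distribution and report a CLT-based standard error and 95% interval rather than a bare accuracy number (the paper goes further, e.g. clustered standard errors and variance reduction, which are not shown here).

```python
import numpy as np

def eval_mean_with_ci(scores, z=1.96):
    # Per-question scores (0/1 for accuracy, or any bounded metric).
    # Standard error of the mean via the CLT; 95% interval with z = 1.96.
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - z * se, mean + z * se)

rng = np.random.default_rng(1)
scores = rng.binomial(1, 0.72, size=1_000)            # hypothetical 1,000-question eval
mean, (lo, hi) = eval_mean_with_ci(scores)
print(f"accuracy = {mean:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```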

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
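
One recommendation from the post that fits in a few lines: when two models are scored on the same questions, compare them with the standard error of the per-question score differences (a paired analysis), which is typically much tighter than comparing two independent intervals. All numbers below are synthetic.

```python
import numpy as np

def paired_model_comparison(scores_a, scores_b, z=1.96):
    # Paired difference on a shared question set: the SE of the mean difference
    # is usually much smaller than combining two independent standard errors.
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    mean_diff = d.mean()
    se = d.std(ddof=1) / np.sqrt(len(d))
    return mean_diff, (mean_diff - z * se, mean_diff + z * se)

rng = np.random.default_rng(2)
difficulty = rng.uniform(0.2, 0.95, size=1_000)       # synthetic per-question difficulty
model_a = rng.binomial(1, np.clip(difficulty + 0.03, 0, 1))
model_b = rng.binomial(1, difficulty)
diff, (lo, hi) = paired_model_comparison(model_a, model_b)
print(f"A - B = {diff:+.3f}, 95% CI = ({lo:+.3f}, {hi:+.3f})")
```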

Anthropic (@anthropicai) 's Twitter Profile Photo

We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.

Klaviyo (@klaviyo) 's Twitter Profile Photo

🚀 New on the Klaviyo Data Science Podcast: Evan Miller joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations. AI metrics are everywhere—but how much uncertainty is behind them? Understanding variability matters. Listen now:
