Evan Miller (@evmill) 's Twitter Profile
Evan Miller

@evmill

Statistically inclined software developer, occasional blogger about math + stats stuff. Working on evals @AnthropicAI

ID: 42923034

Website: https://www.evanmiller.org/ · Joined: 27-05-2009 16:55:16

1.1K Tweets

5.5K Followers

197 Following

Jonathan Whitaker (@johnowhitaker) 's Twitter Profile Photo

New blog post: datasciencecastnet.home.blog/2023/08/04/exp… I've had fun joining in the community effort to investigate Evan Miller's claims about softmax1 as a quantization-friendly modification to attention. Seems promising! But to me, the most exciting thing is watching open science in action :)
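
For readers landing here without context, a minimal NumPy sketch of what "softmax1" refers to, based on Evan Miller's "Attention Is Off By One" post: ordinary softmax with a +1 added to the denominator, so an attention head can put near-zero weight everywhere instead of being forced to sum to 1. Illustrative only; this is not the code used in the linked experiments.

```python
import numpy as np

def softmax(x):
    # Standard softmax, with the usual max-shift for numerical stability.
    z = np.exp(x - np.max(x))
    return z / z.sum()

def softmax1(x):
    # softmax1: identical except for a "+1" in the denominator,
    # softmax1(x)_i = exp(x_i) / (1 + sum_j exp(x_j)), shifted the same way.
    m = np.max(x)
    z = np.exp(x - m)
    return z / (np.exp(-m) + z.sum())

logits = np.array([-8.0, -9.0, -7.5])   # a head with nothing it wants to attend to
print(softmax(logits).sum())            # 1.0: forced to attend somewhere
print(softmax1(logits).sum())           # ~0.001: the head can effectively abstain
```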

Faris Sbahi 🏴‍☠️ (@farissbahi) 's Twitter Profile Photo

Controlling language models has a long way to go, and clever techniques involving Finite State Machines offer a way to eliminate hallucinations at record-setting speeds. New work by Rémi 📎, Phoebe Klett, and Dan // Normal Computing 🧠🌡️ blog.normalcomputing.ai/posts/2023-07-…
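
Very loosely, the trick behind this family of techniques: compile the allowed output format into a finite state machine and, at each decoding step, mask out any token that would leave the machine in a dead state. The toy vocabulary and one-state "FSM" below are made up purely to illustrate the mechanism, not taken from the linked work.

```python
import numpy as np

# Toy vocabulary and a toy acceptor that only allows digit strings.
vocab = ["7", "42", "cat", "3", "dog9"]

def fsm_allows(text):
    # One-state machine: accept iff every character so far is a digit.
    return text.isdigit()

def constrained_step(logits, prefix):
    # Mask any token that would push the partial output out of the language,
    # then renormalize, so the model can only emit valid continuations.
    masked = np.array([
        logit if fsm_allows(prefix + tok) else -np.inf
        for tok, logit in zip(vocab, logits)
    ])
    probs = np.exp(masked - masked.max())
    return probs / probs.sum()

logits = np.array([0.1, 1.2, 3.0, 0.5, 2.0])   # model "prefers" cat, but it gets masked
print(dict(zip(vocab, constrained_step(logits, prefix="1"))))
```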

Evan Miller (@evmill) 's Twitter Profile Photo

Softmax1, Week 2. Second set of empirical results are in, and they are… 🌸 promising 🌸 Weight kurtosis is roughly the same – but activation kurtosis improved 30X (!!) and maximum activation magnitude reduced 15X (!). Read more from Jonathan Whitaker: datasciencecastnet.home.blog/2023/08/04/exp…
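
For readers who want a feel for the metrics being quoted, here is a rough sketch of how one might measure excess kurtosis and maximum magnitude of a captured activation tensor; the hooks, layers, and exact definitions used in the linked write-up may differ.

```python
import numpy as np

def excess_kurtosis(a):
    # Fisher's excess kurtosis: 0 for a Gaussian, large and positive for
    # heavy-tailed, outlier-dominated values that are hard to quantize.
    a = np.asarray(a, dtype=np.float64).ravel()
    z = (a - a.mean()) / a.std()
    return float(np.mean(z ** 4) - 3.0)

# Stand-in for activations captured with a forward hook on an attention block.
acts = np.random.standard_t(df=3, size=100_000)
print("activation kurtosis:", excess_kurtosis(acts))
print("max |activation|:   ", float(np.abs(acts).max()))
```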

Evan Miller (@evmill) 's Twitter Profile Photo

Softmax1 update… We now have support for ⚡️Flash Attention⚡️. This lets us test much larger models than before! To get the code, just pip install flash-attention-softmax-n. Or clone / star the GitHub repo here: github.com/softmax1/Flash… All credit / kudos to Chris Murphy.
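
As a point of reference for what such a kernel computes (this is not the package's API; the function below is a naive, illustrative PyTorch version and its name is made up), scaled-dot-product attention with softmax_n simply adds "+n" to the softmax denominator:

```python
import torch

def attention_softmax_n(q, k, v, n=1.0):
    # Naive scaled-dot-product attention whose softmax denominator carries an
    # extra "+n" term, so attention rows may sum to less than 1. Illustrative
    # only; the real package fuses this into FlashAttention kernels.
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (..., L_q, L_k)
    m = scores.amax(dim=-1, keepdim=True)              # shift for stability
    e = torch.exp(scores - m)
    denom = n * torch.exp(-m) + e.sum(dim=-1, keepdim=True)
    return (e / denom) @ v

q = k = v = torch.randn(2, 4, 16, 64)                  # (batch, heads, seq, head_dim)
print(attention_softmax_n(q, k, v).shape)              # torch.Size([2, 4, 16, 64])
```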

Panagiota Papakonstantinou (@ppapakonnucl) 's Twitter Profile Photo

Kurt Vonnegut's 1969 address to the American Physical Society, on the innocence of the "old-fashioned scientist" and its loss after World War II. For physicists, artists, and other humans. I have transcribed it in its entirety as a Google Doc: docs.google.com/document/d/1Mn…

Thomas Capelle (@capetorch) 's Twitter Profile Photo

Following Evan Miller's great blog post on issues encountered when training GPT-like models that appear to be related to the softmax function, I wrote this small piece, mostly to understand what was going on. wandb.me/tinyllama

Astatide (@astatide42) 's Twitter Profile Photo

Results of my latest nerdsnipe from Tetraspace 💎! The plot below shows the predicted shape of the water flow, with a model taking into account gravity and surface tension. It looks just like the real thing! Conclusion: yep, it's surface tension details below 😁

Georgi Gerganov (@ggerganov) 's Twitter Profile Photo

Have a few thoughts about this approach. But most importantly, I'm happy to see Evan Miller's idea on softmax1 recognized: to my very basic and intuitive understanding of LLMs, it made enough sense to warrant further analysis. arxiv.org/abs/2309.17453

Nat Friedman (@natfriedman) 's Twitter Profile Photo

Ten months ago, we launched the Vesuvius Challenge to solve the ancient problem of the Herculaneum Papyri, a library of scrolls that were flash-fried by the eruption of Mount Vesuvius in 79 AD. Today we are overjoyed to announce that our crazy project has succeeded. After 2000

Evan Miller (@evmill) 's Twitter Profile Photo

I think I've finally cracked quantiles… A/B testing medians, instead of means, usually requires an expensive bootstrap. But we can use a likelihood-ratio test (Wilks' theorem) instead. This reduces the quantile problem to a few simple formulas. Read on! arxiv.org/abs/2401.10233
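
A rough illustration of the idea (not necessarily the paper's exact construction): under the null "median(x) = m", the number of observations above m is Binomial(n, 1/2), and twice the log-likelihood ratio against the unrestricted binomial fit is approximately chi-squared with 1 degree of freedom by Wilks' theorem.

```python
import numpy as np
from scipy.stats import chi2

def median_lrt_pvalue(x, m):
    # Sketch of a likelihood-ratio test for H0: median(x) == m, via Wilks' theorem.
    # The paper handles general quantiles and confidence intervals more carefully.
    x = np.asarray(x)
    n = len(x)
    k = int(np.sum(x > m))                            # observations above the hypothesized median
    p_hat = np.clip(k / n, 1e-12, 1 - 1e-12)          # unrestricted MLE of P(X > m)
    loglik = lambda p: k * np.log(p) + (n - k) * np.log(1 - p)
    lr = 2.0 * (loglik(p_hat) - loglik(0.5))
    return chi2.sf(lr, df=1)                          # asymptotic p-value

rng = np.random.default_rng(0)
sample = rng.exponential(scale=1.0, size=500)         # true median = ln 2 ~= 0.693
print(median_lrt_pvalue(sample, m=np.log(2)))         # large p-value: consistent with H0
print(median_lrt_pvalue(sample, m=1.0))               # small p-value: median is not 1.0
```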

Marius Hobbhahn (@mariushobbhahn) 's Twitter Profile Photo

This paper on the statistics of evals is great (and seems to be flying under the radar): arxiv.org/abs/2411.00640… The author basically shows all the relevant statistical tools needed for evals, e.g. how to compute the right error bars, how to compare model performance, and how
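
In that spirit, a minimal sketch of the most basic calculation involved: treat per-question scores as draws from a distribution and report a CLT-based standard error and 95% interval rather than a bare accuracy number (the paper goes further, e.g. clustered standard errors and variance reduction, which are not shown here).

```python
import numpy as np

def eval_mean_with_ci(scores, z=1.96):
    # Per-question scores (0/1 for accuracy, or any bounded metric).
    # Standard error of the mean via the CLT; 95% interval with z = 1.96.
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    se = scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, (mean - z * se, mean + z * se)

rng = np.random.default_rng(1)
scores = rng.binomial(1, 0.72, size=1_000)            # hypothetical 1,000-question eval
mean, (lo, hi) = eval_mean_with_ci(scores)
print(f"accuracy = {mean:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```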

Anthropic (@anthropicai) 's Twitter Profile Photo

New Anthropic research: Adding Error Bars to Evals. AI model evaluations don’t usually include statistics or uncertainty. We think they should. Read the blog post here: anthropic.com/research/stati…
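
One recommendation from the post that fits in a few lines: when two models are scored on the same questions, compare them with the standard error of the per-question score differences (a paired analysis), which is typically much tighter than comparing two independent intervals. All numbers below are synthetic.

```python
import numpy as np

def paired_model_comparison(scores_a, scores_b, z=1.96):
    # Paired difference on a shared question set: the SE of the mean difference
    # is usually much smaller than combining two independent standard errors.
    d = np.asarray(scores_a, dtype=float) - np.asarray(scores_b, dtype=float)
    mean_diff = d.mean()
    se = d.std(ddof=1) / np.sqrt(len(d))
    return mean_diff, (mean_diff - z * se, mean_diff + z * se)

rng = np.random.default_rng(2)
difficulty = rng.uniform(0.2, 0.95, size=1_000)       # synthetic per-question difficulty
model_a = rng.binomial(1, np.clip(difficulty + 0.03, 0, 1))
model_b = rng.binomial(1, difficulty)
diff, (lo, hi) = paired_model_comparison(model_a, model_b)
print(f"A - B = {diff:+.3f}, 95% CI = ({lo:+.3f}, {hi:+.3f})")
```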

Anthropic (@anthropicai) 's Twitter Profile Photo

We’re starting a Fellows program to help engineers and researchers transition into doing frontier AI safety research full-time. Beginning in March 2025, we'll provide funding, compute, and research mentorship to 10–15 Fellows with strong coding and technical backgrounds.

Klaviyo (@klaviyo) 's Twitter Profile Photo

🚀 New on the Klaviyo Data Science Podcast: Evan Miller joins us to discuss his paper, Adding Error Bars to Evals: A Statistical Approach to Language Model Evaluations. AI metrics are everywhere—but how much uncertainty is behind them? Understanding variability matters. Listen now:
