Sam Bowyer (@sambowyer__)'s Twitter Profile
Sam Bowyer

@sambowyer__

Bristol ML PhD Student, Compass CDT

ID: 1635998612604821504

Website: https://sambowyer.com/
Joined: 15-03-2023 13:37:27

24 Tweets

54 Followers

83 Following

Edward Milsom (@edward_milsom):

Our paper "Function-Space Learning Rates" is on arXiv! We give an efficient way to estimate the magnitude of changes to NN outputs caused by a particular weight update. We analyse optimiser dynamics in function space, and enable hyperparameter transfer with our scheme FLeRM! 🧵👇

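The quantity at the heart of the paper is easy to state: how much do the network's outputs move when a given weight update is applied? As a rough illustration only (a brute-force finite-difference probe on a held-out batch, not the paper's efficient estimator, and with all names hypothetical), in PyTorch:

```python
import torch
import torch.nn as nn

def output_change_rms(model, update, x):
    """RMS change in model outputs caused by applying `update` (a dict
    mapping parameter names to deltas). A brute-force finite-difference
    probe of the function-space step size, not FLeRM's estimator."""
    with torch.no_grad():
        before = model(x)
        for name, p in model.named_parameters():
            p.add_(update[name])            # apply the candidate update
        after = model(x)
        for name, p in model.named_parameters():
            p.sub_(update[name])            # restore the original weights
    return (after - before).pow(2).mean().sqrt()

model = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))
x = torch.randn(64, 8)                      # held-out probe batch
update = {n: 1e-2 * torch.randn_like(p) for n, p in model.named_parameters()}
print(output_change_rms(model, update, x))  # the "function-space" step size
```

As I read the abstract, hyperparameter transfer then amounts to choosing weight-space learning rates so that this function-space magnitude stays matched as the model is scaled up.
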
Desi R. Ivanova (@desirivanova):

I’ve been complaining about the lack of error bars in LLM papers for some time. Rather than just complaining, here’s a guide on how to do it! ⬇️ We’ve built a small Python lib that you can install… or copy-paste one file into your projects (dependencies are annoying, we get it 🙃)

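For a sense of what such an interval looks like, here is a generic Bayesian Beta-posterior credible interval for eval accuracy; this is a minimal sketch of the kind of interval the guide advocates, not the library's actual API:

```python
from scipy.stats import beta

def credible_interval(correct, total, level=0.95, prior=(1.0, 1.0)):
    """Bayesian credible interval for an eval's accuracy: a Beta(1, 1)
    prior plus a binomial likelihood gives a Beta posterior over accuracy."""
    a = prior[0] + correct
    b = prior[1] + (total - correct)
    tail = (1.0 - level) / 2.0
    return beta.ppf(tail, a, b), beta.ppf(1.0 - tail, a, b)

# e.g. 14 of 15 questions correct: the interval is wide, because
# 15 questions cannot pin the true accuracy down precisely
print(credible_interval(14, 15))
```
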
Thomas Heap (@thomaseheap):

Link: arxiv.org/abs/2503.08264 Code: github.com/alan-ppl/alan This work was a team effort; I'm very grateful to my collaborators Sam Bowyer and Laurence Aitchison. Thanks also to Gavin Leech, who was involved in the MP-RWS paper.

Sam Bowyer (@sambowyer__):

Really happy to have this paper out on arXiv! Scalable GPU-based Bayesian inference for hierarchical models without requiring gradients wrt model parameters (unlike e.g. VI). arxiv.org/abs/2503.08264

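A toy illustration of the gradient-free flavour of inference involved: self-normalised importance sampling, where many parameter draws are weighted in parallel and only likelihood evaluations (no gradients w.r.t. model parameters) are needed. This is a sketch of the general idea, not the paper's massively parallel algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hierarchical model: per-group mean mu_g ~ N(0, 1); y_gi ~ N(mu_g, 0.5)
G, N, K = 4, 10, 100_000                    # groups, obs per group, draws
y = rng.normal(0.3, 0.5, size=(G, N))

mu = rng.normal(0.0, 1.0, size=(K, G, 1))   # K prior draws per group
log_lik = -0.5 * ((y - mu) / 0.5) ** 2      # Gaussian log-lik, constants dropped
log_w = log_lik.sum(axis=-1)                # (K, G) per-group log-weights
log_w -= log_w.max(axis=0)                  # stabilise before exponentiating
w = np.exp(log_w)
w /= w.sum(axis=0)                          # self-normalised weights

post_mean = (w * mu[..., 0]).sum(axis=0)    # posterior mean of each mu_g
print(post_mean)                            # near each group mean, shrunk to 0
```
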
Laurence Aitchison (@laurence_ai):

(Spotlight) LLM evals are increasingly based on tiny datasets (e.g. AIME), so accounting for uncertainty is becoming critical. We show that approaches based on the CLT don't work, and give Bayesian and frequentist alternatives. (Sam Bowyer, Desi R. Ivanova) arxiv.org/abs/2503.01747

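To see concretely why the CLT misbehaves at AIME-sized n, compare the standard Wald (CLT) interval with an exact Clopper-Pearson interval. This is textbook statistics rather than the paper's exact recipe:

```python
import numpy as np
from scipy.stats import beta, norm

n, k = 30, 29                     # tiny eval: 29/30 correct
p = k / n
z = norm.ppf(0.975)

# CLT / Wald interval: p +/- z * sqrt(p(1-p)/n). With p near 1 and small n,
# the upper end exceeds 1, which is nonsense for an accuracy.
half = z * np.sqrt(p * (1 - p) / n)
print("Wald: ", (p - half, p + half))       # upper bound > 1 here

# Exact (Clopper-Pearson) interval via Beta quantiles: always inside [0, 1].
print("Exact:", (beta.ppf(0.025, k, n - k + 1),
                 beta.ppf(0.975, k + 1, n - k)))
```
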
Ben Anson (@benaibean):

Is it possible to _derive_ an attention scheme with effective zero-shot generalisation? The answer turns out to be yes! To achieve this, we began by thinking about desirable properties for attention over long contexts, and we distilled 2 key conditions:

Xidulu (@xidulu):

Thoughts after reading Sam Bowyer's amazing position paper: are there more sensible ways to draw error bars when reporting pass@k than just computing the standard deviation? arxiv.org/abs/2503.01747
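
One standard recipe that goes beyond a bare standard deviation: compute the unbiased pass@k estimator per problem (Chen et al., 2021) and bootstrap over problems for the interval. A sketch with made-up numbers, not a recommendation taken from the position paper itself:

```python
import numpy as np
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k (Chen et al., 2021): 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

rng = np.random.default_rng(0)
n, k, P = 20, 5, 100                          # samples/problem, k, problems
c = rng.binomial(n, rng.beta(2, 5, size=P))   # fake per-problem success counts
per_problem = np.array([pass_at_k(n, ci, k) for ci in c])

# Percentile bootstrap over problems (problems, not samples, are the unit)
boots = [per_problem[rng.integers(0, P, P)].mean() for _ in range(10_000)]
print(per_problem.mean(), np.percentile(boots, [2.5, 97.5]))
```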