 
                                Sam Bowyer
@sambowyer__
Bristol ML PhD Student, Compass CDT
ID: 1635998612604821504
https://sambowyer.com/ 15-03-2023 13:37:27
24 Tweets
54 Followers
83 Following
 
        Link: arxiv.org/abs/2503.08264 Code: github.com/alan-ppl/alan This work was a team effort, and I'm very grateful to my collaborators Sam Bowyer and Laurence Aitchison. Thanks also to Gavin Leech, who was involved in the MP-RWS paper.
 
        (Spotlight) LLM evals are increasingly based on tiny datasets (e.g. AIME), so considering uncertainty is becoming critical. We show that approaches based on the CLT don't work, and give Bayesian and frequentist alternatives. (Sam Bowyer, Desi R. Ivanova) arxiv.org/abs/2503.01747
 
         
        Thoughts after reading Sam Bowyer's amazing position paper: are there more sensible approaches to drawing error bars when reporting pass@k than just computing the standard deviation? arxiv.org/abs/2503.01747
 