Tamay Besiroglu (@tamaybes)'s Twitter Profile
Tamay Besiroglu

@tamaybes

Thinking about economics, computing and machine learning @EpochAIResearch @MIT_CSAIL

ID: 995052639602839552

https://besiroglu.github.io/webpage/ · Joined 11-05-2018 21:26:51

1.0K Tweets

3.0K Followers

720 Following

Tom Adamczewski (@tmkadamcz)

I've been working on a new product: MakeDistribution.com!

It's the best solution to creating a subjective probability distribution, i.e. one that reflects human judgement rather than being fit to data.

Sounds simple, but most existing solutions have important flaws (1/15)

Maria de la Lama (@merilalama)

Come work with me! You might be a great fit for this role if you have an operations mindset, strong communications skills, are service-minded and organized, and care deeply about Epoch’s mission.

Tamay Besiroglu (@tamaybes)

Cool to see our replication of Chinchilla amongst the top ML papers of the week in what was a packed week for AI.

Tamay Besiroglu (@tamaybes)

This is how you would train compute-optimal models to match Llama3 using our updated Chinchilla scaling law.

Models are clearly being overtrained 5x to 10x beyond what is *pre-training* compute-optimal.
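A minimal sketch of what "compute-optimal" means here, assuming the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β and the common approximation C ≈ 6·N·D training FLOPs. Minimizing L at fixed C gives N_opt = G·(C/6)^(β/(α+β)) with G = (αA/(βB))^(1/(α+β)), and D_opt = (C/6)/N_opt. The coefficients in the usage section are illustrative placeholders, not the fitted values from Chinchilla or from the replication, and `overtraining_factor` is a hypothetical helper using one possible definition of overtraining (tokens actually used divided by the compute-optimal token count at the same compute).

```python
def compute_optimal_allocation(compute_flops, A, B, alpha, beta):
    """Return (N_opt, D_opt) minimizing A/N**alpha + B/D**beta subject to 6*N*D = C.

    Assumes the Chinchilla-style parametric loss; the irreducible term E
    does not change where the optimum lies, so it is omitted.
    """
    budget = compute_flops / 6.0  # N * D is fixed at this value
    g = (alpha * A / (beta * B)) ** (1.0 / (alpha + beta))
    n_opt = g * budget ** (beta / (alpha + beta))
    d_opt = budget / n_opt
    return n_opt, d_opt


def reducible_loss(n_params, n_tokens, A, B, alpha, beta):
    """The part of L(N, D) that depends on model size and data."""
    return A / n_params**alpha + B / n_tokens**beta


def overtraining_factor(n_params, n_tokens, A, B, alpha, beta):
    """Hypothetical helper: tokens used / compute-optimal tokens at the same compute."""
    compute = 6.0 * n_params * n_tokens
    _, d_opt = compute_optimal_allocation(compute, A, B, alpha, beta)
    return n_tokens / d_opt


if __name__ == "__main__":
    # Illustrative placeholder coefficients, NOT fitted values from any paper.
    A, B, alpha, beta = 400.0, 400.0, 0.34, 0.28
    C = 1e24  # training FLOPs
    n, d = compute_optimal_allocation(C, A, B, alpha, beta)
    print(f"N_opt ~ {n:.3g} params, D_opt ~ {d:.3g} tokens, {d / n:.1f} tokens/param")
```

With a model's actual parameter and token counts and coefficients from a published fit, `overtraining_factor` yields the kind of ratio quoted above; the answer is sensitive to the fitted α and β, which is why the quality of the fit matters.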

Tamay Besiroglu (@tamaybes)

I'm thrilled to see that our work has apparently unified the Chinchilla scaling laws. It's great to hear that they're making the data open source!

Mathieu Acher (@acherm)

I don't know and can't assess the impact of the results on the topic of scaling laws, but the reproducibility effort is remarkable! We need more work like this, in many fields of CS.

Matthew Barnett (@MatthewJBar)

tl;dr: the parametric Chinchilla scaling law appears to have been poorly fit, undermining any analysis that relied on its exact fitted values. We fit the same scaling law to a reconstruction of their data, getting different and IMO better results.

Gabriele Sarti (@gsarti_)

Exhibit #49864 of the absurd lengths reproducibility studies must go to in the era of proprietary LLMs 😭
