Mike Lewis

@ml_perception

Llama3 pre-training lead. Partially to blame for things like the Cicero Diplomacy bot, BART, RoBERTa, kNN-LM, top-k sampling & Deal Or No Deal.

Joined 07-09-2019

259 Tweets

6.5K Followers

230 Following

Mike Lewis (@ml_perception):

Yes, both the 8B and 70B are trained far beyond what is Chinchilla-optimal - but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was still improving even at 15T tokens.
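A minimal back-of-envelope sketch of the gap the tweet is describing, assuming the common Chinchilla heuristic of roughly 20 training tokens per parameter (the heuristic and the comparison are my illustration; only the 15T-token figure comes from the tweet):

```python
# Compare the Chinchilla-optimal token budget (~20 tokens per parameter,
# an assumed heuristic) with the ~15T tokens mentioned in the tweet.

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal token count under the Chinchilla heuristic."""
    return n_params * tokens_per_param

actual_tokens = 15e12  # ~15T pre-training tokens, per the tweet

for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    optimal = chinchilla_optimal_tokens(n_params)
    print(f"{name}: Chinchilla-optimal ~{optimal / 1e12:.2f}T tokens, "
          f"actual ~{actual_tokens / 1e12:.0f}T "
          f"({actual_tokens / optimal:.0f}x the optimal budget)")
```

Under these assumptions the 8B model sees roughly 90x its compute-optimal token budget and the 70B roughly 10x, which is the "over-training to cut inference cost" trade-off the tweet points to.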
