Mike Lewis

@ml_perception

Llama3 pre-training lead. Partially to blame for things like the Cicero Diplomacy bot, BART, RoBERTa, kNN-LM, top-k sampling & Deal Or No Deal.

Joined 07-09-2019

259 Tweets

6.5K Followers

230 Following

Mike Lewis (@ml_perception):

Yes, both the 8B and 70B are trained far beyond what is Chinchilla-optimal - but we can eat the training cost to save you inference cost! One of the most interesting things to me was how quickly the 8B was still improving even at 15T tokens.
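A minimal back-of-envelope sketch of the gap the tweet is describing, assuming the common Chinchilla heuristic of roughly 20 training tokens per parameter (the heuristic and the comparison are my illustration; only the 15T-token figure comes from the tweet):

```python
# Compare the Chinchilla-optimal token budget (~20 tokens per parameter,
# an assumed heuristic) with the ~15T tokens mentioned in the tweet.

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal token count under the Chinchilla heuristic."""
    return n_params * tokens_per_param

actual_tokens = 15e12  # ~15T pre-training tokens, per the tweet

for name, n_params in [("8B", 8e9), ("70B", 70e9)]:
    optimal = chinchilla_optimal_tokens(n_params)
    print(f"{name}: Chinchilla-optimal ~{optimal / 1e12:.2f}T tokens, "
          f"actual ~{actual_tokens / 1e12:.0f}T "
          f"({actual_tokens / optimal:.0f}x the optimal budget)")
```

Under these assumptions the 8B model sees roughly 90x its compute-optimal token budget and the 70B roughly 10x, which is the "over-training to cut inference cost" trade-off the tweet points to.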
