
Mike Lewis
@ml_perception
Llama3 pre-training lead. Partially to blame for things like the Cicero Diplomacy bot, BART, RoBERTa, kNN-LM, top-k sampling & Deal Or No Deal.
ID: 1170214705056452609
07-09-2019 05:58:31
272 Tweets
7.7K Followers
233 Following

How can we reduce pretraining costs for multi-modal models without sacrificing quality? We study this question in our new work at AI at Meta: arxiv.org/abs/2411.04996. We introduce Mixture-of-Transformers (MoT), a sparse architecture with modality-aware sparsity for every non-embedding transformer parameter.
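The idea the tweet describes is that every non-embedding parameter (attention projections, FFNs, layer norms) gets a modality-specific copy, while self-attention is still computed globally over the mixed-modality sequence. Below is a minimal PyTorch sketch of that routing; `MoTBlock`, the two-modality default, the per-token integer modality tags, and the omission of a causal mask are illustrative assumptions, not the paper's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoTBlock(nn.Module):
    """One transformer block with modality-specific copies of all non-embedding weights."""

    def __init__(self, d_model: int, n_heads: int, n_modalities: int = 2):
        super().__init__()
        self.d_model = d_model
        self.n_heads = n_heads
        # One copy of every non-embedding parameter per modality.
        self.ln1 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_modalities)])
        self.ln2 = nn.ModuleList([nn.LayerNorm(d_model) for _ in range(n_modalities)])
        self.qkv = nn.ModuleList([nn.Linear(d_model, 3 * d_model) for _ in range(n_modalities)])
        self.proj = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_modalities)])
        self.ffn = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_modalities)
        ])

    def _dispatch(self, x, modality, modules, out_dim):
        # Route each token through the module owned by its modality tag.
        out = x.new_zeros(*x.shape[:-1], out_dim)
        for m, module in enumerate(modules):
            mask = modality == m
            if mask.any():
                out[mask] = module(x[mask])
        return out

    def forward(self, x, modality):
        # x: (batch, seq, d_model); modality: (batch, seq) integer tag per token.
        h = self._dispatch(x, modality, self.ln1, self.d_model)
        qkv = self._dispatch(h, modality, self.qkv, 3 * self.d_model)
        q, k, v = qkv.chunk(3, dim=-1)

        def split_heads(t):
            b, s, _ = t.shape
            return t.reshape(b, s, self.n_heads, -1).transpose(1, 2)

        # Self-attention is computed globally over the full mixed-modality sequence
        # (bidirectional here for brevity; a causal mask would be added for LM training).
        attn = F.scaled_dot_product_attention(split_heads(q), split_heads(k), split_heads(v))
        attn = attn.transpose(1, 2).reshape(x.shape)

        x = x + self._dispatch(attn, modality, self.proj, self.d_model)
        x = x + self._dispatch(self._dispatch(x, modality, self.ln2, self.d_model),
                               modality, self.ffn, self.d_model)
        return x
```

The sparsity here is structural rather than a learned router: each token is still processed by exactly one set of weights, but those weights are specialized to its modality, while attention over the combined sequence stays global.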

Don’t miss this: I’ve worked with Mike (Mike Lewis) very closely at Meta, and his talks are super informative and fun.

