Daria Soboleva
@dmsobol
ML Research @Cerebras | Making MoE models work | Creator of SlimPajama | Ex-@Google @Yandex @Cisco.
ID: 1540768587488448512
25-06-2022 18:47:34
268 Tweets
288 Followers
427 Following
We've been building MoE models for years, optimizing load balancing. But we forgot the original premise: expert specialization. And we never properly defined what it even means. Talked about this during the Q&A at TNG Technology Consulting GmbH's Big TechDay: youtube.com/watch?v=zYvGVS…