Aiden Chaoyang He (@aidenchaoyanghe)'s Twitter Profile
Aiden Chaoyang He

@aidenchaoyanghe

Co-founder at @TensorOpera and @ChainOpera_AI; ex-@AWS, @Meta, @Google, and Tencent. CS/AI PhD from @USC, based in LA. My tweets in Chinese: @ChaoyangHeAiden

ID: 16977264

Joined: 26-10-2008 05:44:40

871 Tweets

832 Followers

1.1K Following

#llama Llama 4 brings back memories of my days training MoE models at AWS. My takeaways with diagrams:

** Architecture Design **

1. Hybrid MoE: MoE layers use a number of routed experts and a shared expert. Each token is sent to the shared expert and also to one of the routed
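A minimal sketch of the shared-plus-routed idea described in point 1, in plain Python. It assumes top-1 routing among the routed experts, since the tweet is cut off at that point. The names `hybrid_moe_layer`, `make_scaling_expert`, and `router_weights` are hypothetical, the experts are toy scaling functions rather than real feed-forward networks, and the softmax gate used to combine the outputs is an assumption, not something the tweet confirms.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_scaling_expert(scale):
    """Hypothetical stand-in for an expert FFN: multiplies the token by a constant."""
    def expert(token):
        return [scale * v for v in token]
    return expert


def hybrid_moe_layer(token, router_weights, routed_experts, shared_expert):
    """Send one token to the shared expert and to a single routed expert.

    router_weights holds one row per routed expert; the router score for an
    expert is the dot product of its row with the token.
    """
    scores = [sum(w * v for w, v in zip(row, token)) for row in router_weights]
    probs = softmax(scores)

    # Assumed top-1 routing: pick the routed expert with the highest probability.
    chosen = max(range(len(probs)), key=probs.__getitem__)

    shared_out = shared_expert(token)           # every token goes here
    routed_out = routed_experts[chosen](token)  # only the chosen expert runs

    # Assumption: add the shared output to the gate-weighted routed output.
    output = [s + probs[chosen] * r for s, r in zip(shared_out, routed_out)]
    return output, chosen
```

With `token = [1.0, 0.0]` and `router_weights = [[0.0, 1.0], [2.0, 0.0]]`, the second routed expert gets the higher score and is the only routed expert evaluated, while the shared expert always runs.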