Wenhao Chai (@wenhaocha1)'s Twitter Profile
Wenhao Chai

@wenhaocha1

Incoming CS Ph.D. Student @PrincetonCS. Prev @UW @Stanford @pika_labs @MSFTResearch @UofIllinois @ZJU_China. I work on computer vision, but it's not all I do.

ID: 1483945570595127298

Link: http://wenhaochai.com
Joined: 19-01-2022 23:32:58

493 Tweets

843 Followers

1.1K Following

Deep dive into sink values in GPT-OSS models!
Analyzed the 20B (24 layers) and 120B (36 layers) models and found (correct me if I'm wrong) these key findings:
1. The 20B model has a larger sink value: mean=2.45 for 20B vs. mean=1.93 for 120B.
2. Clear swa/full-attn layer alternation: full-attn layers