Daniel Han (@danielhanchen)'s Twitter Profile
Daniel Han

@danielhanchen

Building @UnslothAI. Finetune train LLMs faster. LLMs bug hunter. OSS package github.com/unslothai/unsl…. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.

ID: 717359704226172928

Link: https://unsloth.ai/

Joined: 05-04-2016 14:34:16

2.2K Tweets

23.23K Followers

1.1K Following

A possible breakdown of OpenAI's OSS model:
1. 120B MoE with 5B active params, plus a 20B model; text only
2. Trained with Float4, maybe on Blackwell chips
3. SwiGLU clipped to (-7, 7), similar to ReLU6
4. 128K context, extended via YaRN from 4K
5. Sliding window of 128 + attention sinks
6. Llama/Mixtral architecture + biases
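Items 3 and 5 are the most mechanical claims, so here is a minimal Python sketch of what they could mean. It rests on assumptions the tweet does not confirm: the clip is applied to both SwiGLU branches, the window of 128 counts the query's own position, and the "attention sink" is a single extra logit in the softmax denominator (one common design; StreamingLLM-style sinks instead keep the first few tokens permanently visible). All function names are hypothetical and not taken from any released code.

```python
import math

# Hypothetical helpers illustrating tweet items 3 and 5; not taken from any released code.


def sigmoid(x: float) -> float:
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    """SiLU (Swish) activation, x * sigmoid(x), used as the gate in SwiGLU."""
    return x * sigmoid(x)


def clamped_swiglu(gate: list[float], up: list[float], limit: float = 7.0) -> list[float]:
    """SwiGLU with both branches clipped to [-limit, limit] (assumed clip placement)."""
    out = []
    for g, u in zip(gate, up):
        g = max(-limit, min(limit, g))  # bound the gate pre-activation, ReLU6-style
        u = max(-limit, min(limit, u))  # bound the linear ("up") branch
        out.append(silu(g) * u)
    return out


def sliding_window_allowed(q_pos: int, k_pos: int, window: int = 128) -> bool:
    """Causal sliding window: a query sees itself and the previous window - 1 keys."""
    return 0 <= q_pos - k_pos < window


def attention_weights_with_sink(
    scores: list[float], q_pos: int, sink_logit: float, window: int = 128
) -> list[float]:
    """Softmax over window-visible keys plus one extra 'sink' logit.

    The sink takes part in normalization but has no value vector, so the
    weights returned for real keys can sum to less than 1.
    """
    visible = [sliding_window_allowed(q_pos, k, window) for k in range(len(scores))]
    logits = [s for s, v in zip(scores, visible) if v] + [sink_logit]
    m = max(logits)  # subtract the max before exp for numerical stability
    exps = [math.exp(x - m) for x in logits]
    denom = sum(exps)
    key_exps = iter(exps[:-1])  # drop the sink's own share
    return [next(key_exps) / denom if v else 0.0 for v in visible]


if __name__ == "__main__":
    print(clamped_swiglu([100.0, -3.0], [2.0, 50.0]))
    # Two equal scores plus a sink logit of equal size: each key gets 1/3.
    print(attention_weights_with_sink([0.0, 0.0], q_pos=1, sink_logit=0.0))
```

With `window=128`, a query at position 200 can still attend to key 73 but not key 72. Whether the actual model clips one branch or both, and how its sinks are parameterized, is not stated in the tweet.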

Details:
1. 120B MoE