
Daniel Han
@danielhanchen
Building @UnslothAI. Finetune & train LLMs faster. LLM bug hunter. OSS package github.com/unslothai/unsl…. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
ID: 717359704226172928
https://unsloth.ai/ 05-04-2016 14:34:16
2.2K Tweets
23.23K Followers
1.1K Following

Fixed a bug which caused all training losses to diverge for large gradient accumulation sizes. 1. First reported by Benjamin Marie: GA is supposed to be mathematically equivalent to full-batch training, but the losses did not match. 2. We reproduced the issue, and further investigation
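The mismatch the tweet describes can be illustrated with a small numeric sketch (the numbers and variable names below are hypothetical, not Unsloth's actual code): if the cross-entropy loss is mean-reduced per mini-batch and the per-batch means are then averaged, the result is biased whenever mini-batches contain different token counts, whereas normalizing once by the total token count matches full-batch training.

```python
# Hypothetical per-token losses for two accumulated mini-batches
# with different (unpadded) token counts.
batch_a = [0.5, 0.7, 0.9]  # 3 tokens
batch_b = [1.1]            # 1 token

# Full-batch training: one mean over ALL tokens.
full_batch = sum(batch_a + batch_b) / (len(batch_a) + len(batch_b))

# Naive gradient accumulation: mean each mini-batch, then average the means.
naive_ga = (sum(batch_a) / len(batch_a) + sum(batch_b) / len(batch_b)) / 2

# Corrected accumulation: accumulate raw loss sums and token counts,
# normalize once at the end.
fixed_ga = (sum(batch_a) + sum(batch_b)) / (len(batch_a) + len(batch_b))

print(full_batch)  # 0.8
print(naive_ga)    # 0.9 -- biased when token counts differ
print(fixed_ga)    # 0.8 -- matches full-batch training
```

When every mini-batch has the same number of (unmasked) tokens the two schemes coincide, which is why the bug can hide in simple benchmarks and surface only with variable-length sequences.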
