Daniel Han (@danielhanchen)'s Twitter Profile
Daniel Han

@danielhanchen

Building @UnslothAI. Finetune train LLMs faster. LLMs bug hunter. OSS package github.com/unslothai/unsl…. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.

ID: 717359704226172928

Link: https://unsloth.ai/
Joined: 05-04-2016 14:34:16

2.2K Tweets

23.23K Followers

1.1K Following

Fixed a bug which caused all training losses to diverge for large gradient accumulation sizes.

1. First reported by Benjamin Marie (@bnjmn_marie): gradient accumulation (GA) is supposed to be mathematically equivalent to full batch training, but losses did not match.
2. We reproed the issue, and further investigation
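To make the equivalence claim in point 1 concrete, here is a minimal sketch of one well-known way accumulated losses can drift from the full-batch loss: averaging per-micro-batch means when micro-batches contain different numbers of tokens. The thread is truncated here, so this is not necessarily the cause it goes on to describe. The loss values and function names below are hypothetical and chosen only for illustration.

```python
# Hypothetical per-token losses for two micro-batches of unequal length.
micro_batches = [
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],  # 6 tokens
    [4.0, 4.0],                      # 2 tokens
]


def full_batch_loss(batches):
    """Mean loss over every token, as if all micro-batches were one batch."""
    tokens = [loss for batch in batches for loss in batch]
    return sum(tokens) / len(tokens)


def naive_ga_loss(batches):
    """Average of per-micro-batch means: each micro-batch counts equally."""
    means = [sum(batch) / len(batch) for batch in batches]
    return sum(means) / len(means)


def token_weighted_ga_loss(batches):
    """Accumulate per-micro-batch sums divided by the global token count."""
    total_tokens = sum(len(batch) for batch in batches)
    return sum(sum(batch) / total_tokens for batch in batches)


print(full_batch_loss(micro_batches))         # 2.5
print(naive_ga_loss(micro_batches))           # 3.0
print(token_weighted_ga_loss(micro_batches))  # 2.5
```

In this toy setup the naive average over-weights the short micro-batch, while dividing every partial sum by the global token count recovers the full-batch value. When all micro-batches have the same number of tokens, the two accumulation schemes coincide.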