
Daniel Han
@danielhanchen
Building @UnslothAI. Finetune & train LLMs faster. LLM bug hunter. OSS package github.com/unslothai/unsl…. YC S24. Prev ML at NVIDIA. Hyperlearn used by NASA.
ID: 717359704226172928
https://unsloth.ai/ 05-04-2016 14:34:16
2.2K Tweets
23.23K Followers
1.1K Following

Fixed a bug which caused all training losses to diverge for large gradient accumulation sizes. 1. First reported by Benjamin Marie: GA is supposed to be mathematically equivalent to full-batch training, but the losses did not match. 2. We reproduced the issue, and further investigation
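The mismatch the tweet describes can be illustrated with a small numeric sketch (the numbers and variable names below are hypothetical, not Unsloth's actual code): if the cross-entropy loss is mean-reduced per mini-batch and the per-batch means are then averaged, the result is biased whenever mini-batches contain different token counts, whereas normalizing once by the total token count matches full-batch training.

```python
# Hypothetical per-token losses for two accumulated mini-batches
# with different (unpadded) token counts.
batch_a = [0.5, 0.7, 0.9]  # 3 tokens
batch_b = [1.1]            # 1 token

# Full-batch training: one mean over ALL tokens.
full_batch = sum(batch_a + batch_b) / (len(batch_a) + len(batch_b))

# Naive gradient accumulation: mean each mini-batch, then average the means.
naive_ga = (sum(batch_a) / len(batch_a) + sum(batch_b) / len(batch_b)) / 2

# Corrected accumulation: accumulate raw loss sums and token counts,
# normalize once at the end.
fixed_ga = (sum(batch_a) + sum(batch_b)) / (len(batch_a) + len(batch_b))

print(full_batch)  # 0.8
print(naive_ga)    # 0.9 -- biased when token counts differ
print(fixed_ga)    # 0.8 -- matches full-batch training
```

When every mini-batch has the same number of (unmasked) tokens the two schemes coincide, which is why the bug can hide in simple benchmarks and surface only with variable-length sequences.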
