driss guessous (@drisspg)'s Twitter Profile
driss guessous

@drisspg

bytes and nuggets @pytorch

ID: 1734997526778970112

Link: https://github.com/drisspg · Joined: 13-12-2023 18:03:55

87 Tweets

441 Followers

151 Following

driss guessous (@drisspg):

Are there any good papers exploring precision schedules for training? Something like higher precision initially, then fp8 or lower during the meat of training, and then maybe back up for the end? Or maybe even lower precision for the end?
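The schedule in the tweet can be written as a simple step-to-dtype map. A minimal sketch, purely illustrative: the 5% warmup/cooldown fractions and the choice of e4m3 as the low-precision format are assumptions, not taken from any paper.

```python
import torch

def precision_for_step(step: int, total_steps: int) -> torch.dtype:
    """High precision to start, fp8 for the bulk of training, high again at the end."""
    warmup_end = int(0.05 * total_steps)      # assumed warmup fraction
    cooldown_start = int(0.95 * total_steps)  # assumed cooldown fraction
    if step < warmup_end or step >= cooldown_start:
        return torch.bfloat16
    return torch.float8_e4m3fn  # low precision for the "meat" of training
```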

PyTorch (@pytorch):

Training at 2K scale on Crusoe B200 GPUs shows 1.22x–1.28x acceleration using TorchAO’s MXFP8 implementation on TorchTitan, with equivalent convergence to BF16.

🔗 Read our latest blog to learn more: hubs.la/Q03GHFnc0

#PyTorch #OpenSourceAI #AIInfrastructure
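For context, TorchAO exposes a float8 training conversion along these lines; a minimal sketch, assuming `convert_to_float8_training` from `torchao.float8`. The MXFP8 recipe in the blog is wired up through TorchTitan configs, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn
# Assumes torchao is installed; this swaps nn.Linear modules for float8
# variants so the matmuls run in fp8 on supported GPUs.
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

convert_to_float8_training(model)  # master weights/optimizer state stay high precision

out = model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16))
```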
driss guessous (@drisspg):

I caved and purchased ChatGPT Plus for the month… ehh, I think I am still team orange. Fits my style of plan mode: iterate, iterate, iterate -> implement

driss guessous (@drisspg):

github.com/pytorch/pytorc… Major W for Claude. Could I have bisected this env difference? Yes. Did I give it a good starting point to work from? Yes. But still, going dangerously-skip and sitting back was a lot easier on a Friday

Jared Palmer (@jaredpalmer):

RE: Stacked Diffs on @GitHub After discussion w/ Taylor Blau, we can implement stacked PRs/PR groups already (in fact we kind of do with Copilot), but restacking (automatically fanning out changes from the bottom of the stack upwards) would be wildly inefficient. To do it…
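A rough sketch of why restacking is expensive: every branch above the edited one has to be rebased onto its freshly updated parent, one rebase per level of the stack. Branch names here are hypothetical.

```python
import subprocess

# Hypothetical stack, bottom -> top. After amending "feature-base", each
# branch above it must be replayed onto its updated parent in order.
stack = ["feature-base", "feature-mid", "feature-top"]

def restack(stack: list[str]) -> None:
    for parent, child in zip(stack, stack[1:]):
        # `git rebase <upstream> <branch>` checks out <branch> and rebases
        # it onto <upstream>; one such rebase per level of the stack.
        subprocess.run(["git", "rebase", parent, child], check=True)
```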

stochasm (@stochasticchasm):

UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'.
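The warning quotes PyTorch's newer per-backend fp32 precision controls. A minimal sketch contrasting the older global TF32 toggles with the settings named in the warning (string values exactly as quoted above):

```python
import torch

# Older global toggles (these are what trigger the UserWarning):
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Newer per-backend controls, as quoted in the warning:
torch.backends.cudnn.conv.fp32_precision = "tf32"   # allow TF32 for cuDNN convs
torch.backends.cuda.matmul.fp32_precision = "ieee"  # keep matmuls in full fp32
```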