driss guessous (@drisspg)'s Twitter Profile
driss guessous

@drisspg

bytes and nuggets @pytorch

ID: 1734997526778970112

Link: https://github.com/drisspg · Joined: 13-12-2023 18:03:55

87 Tweets

441 Followers

151 Following

driss guessous (@drisspg):

Are there any good papers exploring precision schedules for training? Something like higher precision initially, then fp8 or lower during the meat of training, and then maybe back up for the end? Or maybe even lower precision for the end?
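The schedule in the tweet can be written as a simple step-to-dtype map. A minimal sketch, purely illustrative: the 5% warmup/cooldown fractions and the choice of e4m3 as the low-precision format are assumptions, not taken from any paper.

```python
import torch

def precision_for_step(step: int, total_steps: int) -> torch.dtype:
    """High precision to start, fp8 for the bulk of training, high again at the end."""
    warmup_end = int(0.05 * total_steps)      # assumed warmup fraction
    cooldown_start = int(0.95 * total_steps)  # assumed cooldown fraction
    if step < warmup_end or step >= cooldown_start:
        return torch.bfloat16
    return torch.float8_e4m3fn  # low precision for the "meat" of training
```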

PyTorch (@pytorch):

Training at 2K scale on Crusoe B200 GPUs shows 1.22x–1.28x acceleration using TorchAO’s MXFP8 implementation on TorchTitan, with equivalent convergence to BF16.

🔗 Read our latest blog to learn more: hubs.la/Q03GHFnc0

#PyTorch #OpenSourceAI #AIInfrastructure
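For context, TorchAO exposes a float8 training conversion along these lines; a minimal sketch, assuming `convert_to_float8_training` from `torchao.float8`. The MXFP8 recipe in the blog is wired up through TorchTitan configs, which this sketch does not reproduce.

```python
import torch
import torch.nn as nn
# Assumes torchao is installed; this swaps nn.Linear modules for float8
# variants so the matmuls run in fp8 on supported GPUs.
from torchao.float8 import convert_to_float8_training

model = nn.Sequential(
    nn.Linear(4096, 4096),
    nn.ReLU(),
    nn.Linear(4096, 4096),
).to(device="cuda", dtype=torch.bfloat16)

convert_to_float8_training(model)  # master weights/optimizer state stay high precision

out = model(torch.randn(8, 4096, device="cuda", dtype=torch.bfloat16))
```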
driss guessous (@drisspg):

I caved and purchased ChatGPT Plus for the month… ehh, I think I am still team orange. Fits my style of plan mode: iterate, iterate, iterate -> implement

driss guessous (@drisspg):

github.com/pytorch/pytorc… Major W for Claude. Could I have bisected this env difference? Yes. Did I give it a good starting point to work from? Yes. But still, going dangerously-skip and sitting back was a lot easier on a Friday

Jared Palmer (@jaredpalmer):

RE: Stacked Diffs on @GitHub After discussion w/ Taylor Blau, we can implement stacked PRs/PR groups already (in fact we kind of do with Copilot), but restacking (automatically fanning out changes from the bottom of the stack upwards) would be wildly inefficient. To do it…
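A rough sketch of why restacking is expensive: every branch above the edited one has to be rebased onto its freshly updated parent, one rebase per level of the stack. Branch names here are hypothetical.

```python
import subprocess

# Hypothetical stack, bottom -> top. After amending "feature-base", each
# branch above it must be replayed onto its updated parent in order.
stack = ["feature-base", "feature-mid", "feature-top"]

def restack(stack: list[str]) -> None:
    for parent, child in zip(stack, stack[1:]):
        # `git rebase <upstream> <branch>` checks out <branch> and rebases
        # it onto <upstream>; one such rebase per level of the stack.
        subprocess.run(["git", "rebase", parent, child], check=True)
```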

stochasm (@stochasticchasm):

UserWarning: Please use the new API settings to control TF32 behavior, such as torch.backends.cudnn.conv.fp32_precision = 'tf32' or torch.backends.cuda.matmul.fp32_precision = 'ieee'.
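The warning quotes PyTorch's newer per-backend fp32 precision controls. A minimal sketch contrasting the older global TF32 toggles with the settings named in the warning (string values exactly as quoted above):

```python
import torch

# Older global toggles (these are what trigger the UserWarning):
torch.backends.cuda.matmul.allow_tf32 = True
torch.backends.cudnn.allow_tf32 = True

# Newer per-backend controls, as quoted in the warning:
torch.backends.cudnn.conv.fp32_precision = "tf32"   # allow TF32 for cuDNN convs
torch.backends.cuda.matmul.fp32_precision = "ieee"  # keep matmuls in full fp32
```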