Andrej Karpathy

@karpathy

πŸ§‘β€πŸ³. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets πŸ§ πŸ€–πŸ’₯

Joined 21-04-2009 06:49:15

8.7K Tweets

982.1K Followers

905 Following


A few new CUDA hacker friends joined the effort and now llm.c is only 2X slower than PyTorch (fp32, forward pass) compared to 4 days ago, when it was at 4.2X slower 📈

The biggest improvements were:
- turn on TF32 (NVIDIA TensorFloat-32) instead of FP32 for matmuls. This is a…
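
For context on the TF32 point: TF32 keeps FP32's 8-bit exponent but only 10 of its 23 mantissa bits, which is why matmuls get faster at some cost in precision. A rough pure-Python sketch of that precision loss follows. It is not code from llm.c; `to_tf32` is a hypothetical helper name, and plain truncation of the low bits is a simplifying assumption (real hardware may round instead).

```python
import struct


def to_tf32(x: float) -> float:
    """Approximate TF32 precision (hypothetical helper, truncation assumed).

    Packs x as an IEEE-754 FP32 value, zeroes the low 13 mantissa bits so
    only 10 mantissa bits remain, and unpacks the result.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign (1) + exponent (8) + top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, cast=lambda v: v):
    """Dot product with an optional per-element cast applied to the inputs."""
    return sum(cast(x) * cast(y) for x, y in zip(a, b))


if __name__ == "__main__":
    a = [1.0001, 2.0002, 3.0003]
    b = [0.3333, 0.6666, 0.9999]
    print("full precision inputs:", dot(a, b))
    print("TF32-like inputs:     ", dot(a, b, cast=to_tf32))
```

Values whose mantissa fits in 10 bits pass through unchanged; finer detail is dropped, so the two printed dot products differ slightly.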
