Ross Wightman (@wightmanr)'s Twitter Profile
Ross Wightman

@wightmanr

Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.

ID: 557902603

Link: http://rwightman.com/ · Joined: 19-04-2012 17:34:53

4.4K Tweets

21.2K Followers

1.1K Following

PyTorch (@pytorch) 's Twitter Profile Photo

Update from the PyTorch maintainers: 2.7 is out now.
🔹 Support for NVIDIA Blackwell (CUDA 12.8)
🔹 Mega Cache
🔹 torch.compile for Function Modes
🔹 FlexAttention updates
🔹 Intel GPU perf boost
🔗 Blog: hubs.la/Q03jBPSL0
📄 Release notes: 
hubs.la/Q03jBPlW0
#PyTorch
Ross Wightman (@wightmanr) 's Twitter Profile Photo

This sort of thing is such an own goal for the USA, hard to fathom. Also sucks for anyone caught up in it, to have your life suddenly uprooted for no good reason. But as a Canadian, can't help but be a little hopeful it might lead to some of our best and brightest sticking around or

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I thought I knew PyTorch but found a bug in some recent code today and learned something new... did you know that these two lines are different? One works as I expected, and one is a sneaky bug... x[indices, :seq_len] += pos_embed[:, :seq_len] x[indices,
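The tweet is truncated, but it contrasts `+=` on an indexed expression with (presumably) an in-place `add_` on the same expression. A minimal sketch of the gotcha, assuming `indices` is a tensor (advanced indexing) and with illustrative shapes; the names follow the tweet's snippet:

```python
import torch

# Illustrative setup; the exact sizes are assumptions.
B, L, D = 4, 8, 3
pos_embed = torch.ones(1, L, D)
indices = torch.tensor([0, 2])
seq_len = 5

# Sneaky bug: advanced (tensor) indexing returns a COPY, so add_ mutates
# that temporary copy and the original tensor is left unchanged.
x_bug = torch.zeros(B, L, D)
x_bug[indices, :seq_len].add_(pos_embed[:, :seq_len])
print(x_bug.sum().item())  # 0.0 -- nothing was written back

# Works as expected: `+=` desugars to __getitem__, add, then __setitem__
# (an index_put_), so the result is written back into the original tensor.
x_ok = torch.zeros(B, L, D)
x_ok[indices, :seq_len] += pos_embed[:, :seq_len]
print(x_ok.sum().item())  # 30.0 == 2 indices * 5 positions * 3 dims
```

With basic slicing (no index tensor) both forms would mutate the original, since plain slices are views; advanced indexing is what makes the copy.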

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I decided to see which LLM would pick up the difference here without too much leading...
* Sonnet 3.5 and 3.7 were both wrong, stating that add_ works properly, modifying the original tensor in-place
* 4o, o4-mini-high, and Gemini 2.5 Pro were similar, correct though could have used

Ross Wightman (@wightmanr) 's Twitter Profile Photo

o3 reminds me of a dev I had to fire many many moons ago after two weeks on the job. Signs of talent but so full of himself and unable to admit to any wrong. Would lie to your face that he got the job done and made it 10x faster than you asked, when in reality it was a steaming

Cihang Xie (@cihangxie) 's Twitter Profile Photo

Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧

We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and
Pablo Montalvo (@m_olbap) 's Twitter Profile Photo

Had the pleasure of speaking last week at PyTorch Day France about PyTorch 🔥, the ML community, vLLM, and 🤗 Transformers! I’ve pushed my slides to the Hub directly — much easier to share with practitioners 📤.

Vladimir Iglovikov (@viglovikov) 's Twitter Profile Photo

1️⃣ / 4️⃣ 📊 GitHub Computer Vision Stars - May 2025 Update
githublb.vercel.app/computer-vision

Key highlights from the top 0.001% of packages (1,000 out of 100,000,000):
🔹 #34 transformers by Hugging Face +0
🔹 #102 OpenCV Live +0
🔹 #143 Stable Diffusion by Stability AI +0

Mike A. Merrill (@mike_a_merrill) 's Twitter Profile Photo

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? 

We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
Alex Zhang (@a1zhang) 's Twitter Profile Photo

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Ross Wightman (@wightmanr) 's Twitter Profile Photo

Sometimes o3 + canvas mode really goes off the rails... after the third retry to output all of the code instead of erasing everything but one fn, how about I throw a little Arabic in your attention mask refactoring?

Aaron Defazio (@aaron_defazio) 's Twitter Profile Photo

Why do gradients increase near the end of training? 
Read the paper to find out!
We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training.
arxiv.org/abs/2506.02285
Ludwig Schmidt (@lschmidt3) 's Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Yu Su @#ICLR2025 (@ysu_nlp) 's Twitter Profile Photo

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world!

We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature.

🧵