Ross Wightman (@wightmanr)'s Twitter Profile
Ross Wightman

@wightmanr

Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.

ID: 557902603

Link: http://rwightman.com/ · Joined: 19-04-2012 17:34:53

4.4K Tweets

21.2K Followers

1.1K Following

PyTorch (@pytorch) 's Twitter Profile Photo

Update from the PyTorch maintainers: 2.7 is out now.
🔹 Support for NVIDIA Blackwell (CUDA 12.8)
🔹 Mega Cache
🔹 torch.compile for Function Modes
🔹 FlexAttention updates
🔹 Intel GPU perf boost
🔗 Blog: hubs.la/Q03jBPSL0
📄 Release notes: 
hubs.la/Q03jBPlW0
#PyTorch
Ross Wightman (@wightmanr) 's Twitter Profile Photo

This sort of thing is such an own goal for the USA, hard to fathom. Also sucks for anyone caught up in it, to have your life suddenly uprooted for no good reason. But as a Canadian, can't help but be a little hopeful it might lead to some of our best and brightest sticking around or

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I thought I knew PyTorch but found a bug in some recent code today and learned something new... did you know that these two lines are different? One works as I expected, and one is a sneaky bug... x[indices, :seq_len] += pos_embed[:, :seq_len] x[indices,
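The tweet is truncated, but it contrasts `+=` on an indexed expression with (presumably) an in-place `add_` on the same expression. A minimal sketch of the gotcha, assuming `indices` is a tensor (advanced indexing) and with illustrative shapes; the names follow the tweet's snippet:

```python
import torch

# Illustrative setup; the exact sizes are assumptions.
B, L, D = 4, 8, 3
pos_embed = torch.ones(1, L, D)
indices = torch.tensor([0, 2])
seq_len = 5

# Sneaky bug: advanced (tensor) indexing returns a COPY, so add_ mutates
# that temporary copy and the original tensor is left unchanged.
x_bug = torch.zeros(B, L, D)
x_bug[indices, :seq_len].add_(pos_embed[:, :seq_len])
print(x_bug.sum().item())  # 0.0 -- nothing was written back

# Works as expected: `+=` desugars to __getitem__, add, then __setitem__
# (an index_put_), so the result is written back into the original tensor.
x_ok = torch.zeros(B, L, D)
x_ok[indices, :seq_len] += pos_embed[:, :seq_len]
print(x_ok.sum().item())  # 30.0 == 2 indices * 5 positions * 3 dims
```

With basic slicing (no index tensor) both forms would mutate the original, since plain slices are views; advanced indexing is what makes the copy.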

Ross Wightman (@wightmanr) 's Twitter Profile Photo

I decided to see which LLM would pick up the difference here without too much leading...
* Sonnet 3.5 and 3.7 were both wrong, stating that add_ works properly, modifying the original tensor in-place
* 4o, o4-mini-high, and Gemini 2.5 Pro were similar, correct though could have used

Ross Wightman (@wightmanr) 's Twitter Profile Photo

o3 reminds me of a dev I had to fire many many moons ago after two weeks on the job. Signs of talent but so full of himself and unable to admit to any wrong. Would lie to your face that he got the job done and made it 10x faster than you asked, when in reality it was a steaming

Cihang Xie (@cihangxie) 's Twitter Profile Photo

Still relying on OpenAI’s CLIP — a model released 4 years ago with limited architecture configurations — for your Multimodal LLMs? 🚧

We’re excited to announce OpenVision: a fully open, cost-effective family of advanced vision encoders that match or surpass OpenAI’s CLIP and
Pablo Montalvo (@m_olbap) 's Twitter Profile Photo

Had the pleasure of speaking last week at PyTorch Day France about PyTorch 🔥, the ML community, vLLM, and 🤗 Transformers! I’ve pushed my slides to the Hub directly — much easier to share with practitioners 📤.

Vladimir Iglovikov (@viglovikov) 's Twitter Profile Photo

1️⃣ / 4️⃣ 📊 GitHub Computer Vision Stars - May 2025 Update
githublb.vercel.app/computer-vision

Key highlights from the top 0.001% of packages (1,000 out of 100,000,000):
🔹 #34 transformers by Hugging Face +0
🔹 #102 OpenCV Live +0
🔹 #143 Stable Diffusion by Stability AI +0

Mike A. Merrill (@mike_a_merrill) 's Twitter Profile Photo

Many agents (Claude Code, Codex CLI) interact with the terminal to do valuable tasks, but do they currently work well enough to deploy en masse? 

We’re excited to introduce Terminal-Bench: An evaluation environment and benchmark for AI agents on real-world terminal tasks. Tl;dr
Alex Zhang (@a1zhang) 's Twitter Profile Photo

Can GPT, Claude, and Gemini play video games like Zelda, Civ, and Doom II? 𝗩𝗶𝗱𝗲𝗼𝗚𝗮𝗺𝗲𝗕𝗲𝗻𝗰𝗵 evaluates VLMs on Game Boy & MS-DOS games given only raw screen input, just like how a human would play. The best model (Gemini) completes just 0.48% of the benchmark! 🧵👇

Ross Wightman (@wightmanr) 's Twitter Profile Photo

Sometimes o3 + canvas mode really goes off the rails... after the third retry to output all of the code instead of erasing everything but one fn, how about I throw a little Arabic in your attention mask refactoring?

Aaron Defazio (@aaron_defazio) 's Twitter Profile Photo

Why do gradients increase near the end of training? 
Read the paper to find out!
We also propose a simple fix to AdamW that keeps gradient norms better behaved throughout training.
arxiv.org/abs/2506.02285
Ludwig Schmidt (@lschmidt3) 's Twitter Profile Photo

Very excited to finally release our paper for OpenThoughts!

After DataComp and DCLM, this is the third large open dataset my group has been building in collaboration with the DataComp community. This time, the focus is on post-training, specifically reasoning data.
Yu Su @#ICLR2025 (@ysu_nlp) 's Twitter Profile Photo

📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world!

We trained a foundation model on 214M images of ~1M species (50% of named species on Earth 🐨🐠🌻🦠) and found emergent properties capturing hidden regularities in nature.

🧵