Benjamin F Spector (@bfspector) Twitter Tweets • TwiCopy

Benjamin F Spector

@bfspector

stanford cs phd student. i make ml go brr.

ID: 1313966496549351425

Website: http://benjaminfspector.com

Joined: 07-10-2020 22:16:48

87 Tweets

2.2K Followers

130 Following

Dan Fu

@realdanfu

10 months ago

A little pre-GTC present for everyone... new Blackwell kernels, all written in ThunderKittens! ⚡️🐱 BF16 & FP8 GEMMs, attention forwards & backwards - fast (competitive with cuDNN and cuBLAS) and open-source! w/ Benjamin F Spector Aaryan Singhal hazyresearch Together AI 1/

91 likes, 2 replies, 13 reposts

Tanishq Kumar

@tanishqkumar07

9 months ago

trained a nanoGPT? feeling behind before o4-mini? 🚨🚨i'm open-sourcing beyond-nanoGPT, an internal codebase to help people go from LLM basics to research-level understanding. 🚨🚨 it contains thousands of lines of from-scratch, annotated pytorch implementing advanced

318 likes, 6 replies, 48 reposts

Jonathan Jacobi

@j0nathanj

8 months ago

Introducing Multiverse: the first AI-generated multiplayer game. Multiplayer was the missing piece in AI-generated worlds — now it’s here. Players can interact and shape a shared AI-simulated world, in real-time. Training and research cost < $1.5K. Run it on your own PC. We

1.1K likes, 79 replies, 193 reposts

Jordan Juravsky

@jordanjuravsky

7 months ago

We wrote a megakernel! Excited to share how we fused Llama-1B into a single kernel to reach SOTA latency. Check out our blog post and code below!

64 likes, 3 replies, 9 reposts

ollama

@ollama

7 months ago

3 months ago, Stanford's Hazy Research lab introduced Minions, a project that connects Ollama to frontier cloud models to reduce cloud costs by 5-30x while achieving 98% of frontier model accuracy. Secure Minion turns an H100 into a secure enclave, where all memory and

1.1K likes, 21 replies, 171 reposts

Jordan Juravsky

@jordanjuravsky

7 months ago

Happy Throughput Thursday! We’re excited to release Tokasaurus: an LLM inference engine designed from the ground up for high-throughput workloads with large and small models. (Joint work with Ayush Chakravarthy, Ryan Ehrlich, Sabri Eyuboglu, Bradley Brown, Joseph Shetaye,

168 likes, 3 replies, 38 reposts

Jerry Liu

@jerrywliu

6 months ago

1/10 ML can solve PDEs – but precision🔬is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵How it works:

579 likes, 12 replies, 109 reposts

typedfemale

@typedfemale

6 months ago

presenting: big jeff's trainium hell

1.1K likes, 83 replies, 173 reposts

Decart

@decartai

6 months ago

Introducing MirageLSD: The First Live-Stream Diffusion (LSD) AI Model Input any video stream, from a camera or video chat to a computer screen or game, and transform it into any world you desire, in real-time (<40ms latency). Here’s how it works (w/ demo you can use!):

1.1K likes, 108 replies, 333 reposts

Anna Monaco

@annarmonaco

5 months ago

Paradigm is the AI-native spreadsheet to eliminate menial work. Thousands of users have saved 10,000+ hours with Paradigm, and you can be next. Get your first month free today, then plans start at just $20/month.

1.1K likes, 197 replies, 151 reposts

Stuart Sul

@stuart_sul

5 months ago

MoE layers can be really slow. When training our coding models Cursor, they ate up 27–53% of training time. So we completely rebuilt it at the kernel level and transitioned to MXFP8. The result: 3.5x faster MoE layer and 1.5x end-to-end training speedup. We believe our

381 likes, 14 replies, 53 reposts