Shizhe He (@shizhehe)'s Twitter Profile
Shizhe He

@shizhehe

rethinking computing @ Brains in Silicon; building excavators; prev. STAILab @Stanford

ID: 1524037330000138242

Website: http://shizhehe.com · Joined: 10-05-2022 14:43:52

23 Tweets

152 Followers

339 Following

Avanika Narayan (@avanika15):

The U.S.–China AI race won’t be decided by who builds the most datacenters, but by who deploys the most intelligence.

We call this Gross Domestic Intelligence (GDI): intelligence per watt × usable power.

If the U.S. activates its dense installed base of local AI accelerators in
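
As a back-of-envelope illustration of the GDI definition above (every number here is hypothetical, not from the thread):

```python
# Hypothetical GDI calculation. The quantities below (tokens/sec/watt as an
# "intelligence per watt" proxy, and usable power allocated to AI) are
# illustrative assumptions, not figures from the thread.

intelligence_per_watt = 2.5   # e.g., useful tokens per second per watt
usable_power_watts = 5e9      # e.g., 5 GW of power available to run accelerators

gdi = intelligence_per_watt * usable_power_watts
print(f"GDI proxy: {gdi:.3e} useful tokens/sec")  # 1.250e+10
```
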
Radical Numerics (@radicalnumerics):

Scaling scientific world models requires co-designing architectures, training objectives, and numerics. Today, we share the first posts in our series on low-precision pretraining, starting with NVIDIA's NVFP4 recipe for stable 4-bit training.

Part 1: radicalnumerics.ai/blog/nvfp4-par…
Part
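
For intuition, here is a rough numpy sketch of blockwise 4-bit quantization in the spirit of NVFP4 (FP4 E2M1 elements with a per-16-element block scale). The actual training recipe, including FP8 block scales, a per-tensor scale, and rounding details, is in the linked posts; treat this as an assumption-laden toy:

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (the element format NVFP4 builds on).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_nvfp4_like(x, block=16):
    """Quantize a 1-D array to an NVFP4-style format: FP4 E2M1 elements with
    one scale per 16-element block. (Real NVFP4 stores the block scale in FP8
    E4M3 plus a per-tensor FP32 scale; here the block scale is kept in full
    precision for clarity.) Returns the dequantized values for inspection."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]
    scale = np.where(scale == 0, 1.0, scale)  # avoid divide-by-zero on all-zero blocks
    scaled = x / scale
    # Round each scaled element to the nearest grid point of matching sign.
    idx = np.abs(scaled[..., None] - np.sign(scaled[..., None]) * FP4_GRID).argmin(-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * scale).ravel()

x = np.random.randn(64).astype(np.float32)
xq = quantize_nvfp4_like(x)
print("mean abs quantization error:", np.abs(x - xq).mean())
```
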
Flapping Airplanes (@flappyairplanes):

Announcing Flapping Airplanes! We’ve raised $180M from GV, Sequoia, and Index to assemble a new guard in AI: one that imagines a world where models can think at human level without ingesting half the internet.

jack morris (@jxmnop):

at long last, the final paper of my phd

🧮 Learning to Reason in 13 Parameters 🧮

we develop TinyLoRA, a new fine-tuning method. with TinyLoRA + RL, models learn well with dozens or hundreds of parameters

example: we use only 13 parameters to take a 7B Qwen model from 76% to 91% on GSM8K 🤯
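
The tweet doesn't describe TinyLoRA's parameterization. One hypothetical way to fine-tune with a handful of parameters is to freeze random rank-1 directions and train only one scalar per adapted layer; the sketch below is that idea, not the paper's method:

```python
import torch
import torch.nn as nn

class ScalarAdapter(nn.Module):
    """Hypothetical sketch of an extremely low-parameter adapter in the
    spirit of TinyLoRA (the paper's actual method is not described in the
    tweet). A frozen random rank-1 direction u v^T is attached to a frozen
    linear layer; the only trainable parameter is one scalar scaling it."""

    def __init__(self, base: nn.Linear):
        super().__init__()
        self.base = base.requires_grad_(False)       # frozen pretrained layer
        out_f, in_f = base.weight.shape
        self.register_buffer("u", torch.randn(out_f) / out_f**0.5)
        self.register_buffer("v", torch.randn(in_f) / in_f**0.5)
        self.alpha = nn.Parameter(torch.zeros(()))   # the single trainable scalar

    def forward(self, x):
        # W x + alpha * (v . x) u : a rank-1 perturbation steered by one scalar
        return self.base(x) + self.alpha * (x @ self.v).unsqueeze(-1) * self.u

layer = ScalarAdapter(nn.Linear(16, 16))
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 1
```

Adapting 13 layers this way would cost exactly 13 trainable parameters.
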
Mayee Chen (@mayeechen):

Data mixing - determining ratios across your training datasets - matters a lot for model quality. While building Olmo 3, we learned it’s hard to set up a method that finds a strong mix, and hard to maintain that mix as datasets change throughout development.
Introducing Olmix👇
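
As a generic illustration (not the Olmix method itself) of what a mixing ratio is and why it's brittle as datasets change:

```python
import numpy as np

# A data mix is a probability distribution over source datasets; each
# training example is drawn from a source with that probability. When a
# dataset is added or removed mid-development, the remaining weights must
# be renormalized, silently shifting the effective mix -- part of why
# maintaining a strong mix throughout development is hard.

datasets = {"web": 0.55, "code": 0.25, "math": 0.15, "wiki": 0.05}

def sample_sources(mix: dict, n: int, seed: int = 0):
    rng = np.random.default_rng(seed)
    names, probs = zip(*mix.items())
    return rng.choice(names, size=n, p=probs)

print(sample_sources(datasets, 8))

# Drop "math" and renormalize: every other ratio changes too.
remaining = {k: v for k, v in datasets.items() if k != "math"}
total = sum(remaining.values())
print({k: v / total for k, v in remaining.items()})
```
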
Stuart Sul (@stuart_sul):

(1/7) We're releasing ThunderKittens 2.0! Faster kernels, cleaner code, industry contributions, and new state-of-the-art BF16 / MXFP8 / NVFP4 GEMMs that match or surpass cuBLAS!

Alongside this release, we’re equally excited to share some insights we learned while squeezing every
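
For context on what a tile-based GEMM kernel computes, here is a plain numpy sketch of the blocked loop structure such kernels implement on tensor cores. The real ThunderKittens kernels are CUDA templates; the tile size and layout below are illustrative, not the library's actual choices:

```python
import numpy as np

def tiled_gemm(A, B, tile=64):
    """Reference tiled GEMM: compute C = A @ B one (tile x tile) output block
    at a time, accumulating over K in tile-sized chunks. This mirrors the loop
    structure of tile-granularity kernels, where each block is a register or
    shared-memory tile fed to tensor cores."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % tile == 0 and N % tile == 0 and K % tile == 0
    C = np.zeros((M, N), dtype=np.float32)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            acc = np.zeros((tile, tile), dtype=np.float32)  # accumulator tile
            for k in range(0, K, tile):
                acc += A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
            C[i:i+tile, j:j+tile] = acc
    return C

A = np.random.randn(128, 128).astype(np.float32)
B = np.random.randn(128, 128).astype(np.float32)
assert np.allclose(tiled_gemm(A, B), A @ B, atol=1e-3)
```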