Dwarkesh Patel (@dwarkesh_sp)'s Twitter Profile
Dwarkesh Patel

@dwarkesh_sp

Being pretrained

Host of Dwarkesh Podcast

https://t.co/3SXlu7fy6N
https://t.co/rEhnfYywXY
https://t.co/hQfIWdM1Un

ID:1209960539390201864

Link: https://www.dwarkeshpatel.com/
Joined: 25-12-2019 22:14:46

4.1K Tweets

54.0K Followers

699 Following

Dwarkesh Patel (@dwarkesh_sp)

Meta is going to have 350,000 H100s by the end of the year.

Given lead times, they probably had to start ordering them in 2022.

How did Zuck know he'd need all these GPUs?

Dwarkesh Patel (@dwarkesh_sp)

'More of what we call training for these big models is actually going to be inference generating synthetic data to then go feed into the model

We trained Llama 3 on around 15 trillion tokens.

Our prediction was that it was going to asymptote more, but even by the end, it was

Dwarkesh Patel (@dwarkesh_sp)

Last 28 days 🤯

While the Zuck & Trenton/Sholto episodes are doing extremely well on YouTube, what I'm proudest of is that most of these views are actually from Sarah Paine content!

She is one of the greatest living historians, but her work wasn't really publicly well known

John Coogan (@johncoogan)

Pretty remarkable that the only press Zuck shared from the Llama 3 launch was Dwarkesh Patel and Roberto Nickson

Meta did give embargoed comments to mainstream publishers, but only the independent interviewers got posted to his story.

GOING DIRECT! Lulu Cheng Meservey

Dwarkesh Patel (@dwarkesh_sp)

Every time I fail a spaced repetition review for a card which I remember thinking was almost too trivial to write down, I become more convinced that everything I read without making cards for is a waste of time.

Tsarathustra (@tsarnick)

Mark Zuckerberg doesn't want AI to end up like mobile apps, where gatekeepers like Apple and Google tell you what you're allowed to build

Johannes Tammekänd (@johannestknd)

Wildest part of Zuck's interview with Dwarkesh Patel when asked how much energy necessary to train frontier models

Zuck: we need nuclear power plants

Dwarkesh Patel (@dwarkesh_sp)

'1 GW - that's the size of a meaningful nuclear power plant, only going towards training a model.

Over the last few years, there was this issue of GPU production. Now I think that's getting less.

But I actually think before we hit that, you're gonna run into energy constraints.

David Perell (@david_perell)

The meteoric rise of Dwarkesh is a reminder that no matter how crowded the space, there’s always room for people who do great work.

There’s a trillion podcasts out there but his is going gangbusters because it’s so darn good.

kache (dingboard.com) (@yacineMTB)

this is a really good podcast. like a really really good podcast.

Zuck is talking about gigawatts and you're still talking about flops. You're trying to raise money, zuck is trying to convince governments to take the money he already has.

Dwarkesh Patel (@dwarkesh_sp)

Zuck:

“For Llama 3, we focused on training with a lot of code

Training the model on code helps it reason across a lot of different types of domains

If someone else solves reasoning, and we're sitting here with a basic chatbot, our product is lame

We realized we've got to
