Tamay Besiroglu (@tamaybes)'s Twitter Profile
Tamay Besiroglu

@tamaybes

Thinking about economics, computing and machine learning @EpochAIResearch. prev: @MIT_CSAIL, @Cambridge_Uni

ID: 995052639602839552

Link: https://besiroglu.github.io/webpage/ · Joined: 11-05-2018 21:26:51

1.1K Tweets

3.3K Followers

737 Following

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

Data constraints will make scaling less efficient at around 1e29 FLOP, roughly 4 OOMs larger than GPT-4. This leaves a lot of room for continued scaling. However, combining massive scaling with intense overtraining might soon become a challenge.
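
A quick sanity check on the "4 OOMs" figure. GPT-4's training compute is not public; ~2e25 FLOP is Epoch AI's published estimate, used below purely as an assumption:

```python
import math

# Assumption: ~2e25 FLOP is an Epoch AI estimate of GPT-4's training
# compute, not an official figure.
GPT4_FLOP = 2e25
DATA_WALL_FLOP = 1e29  # threshold cited in the tweet

ooms = math.log10(DATA_WALL_FLOP / GPT4_FLOP)
print(f"gap: {ooms:.1f} orders of magnitude")  # ~3.7, i.e. "around 4 OOMs"
```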

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

Cool work: Predicting downstream performance based on compute could help us anticipate the capabilities of future models, but predictability has remained elusive. Rylan Schaeffer, Hailey Schoelkopf et al. explore why and suggest the possibility of "scaling-predictable evaluations".
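
As a rough illustration of the kind of extrapolation at issue (a minimal sketch with synthetic numbers, not the paper's method or data):

```python
import numpy as np

# Fit a power law (linear in log-log space) to synthetic compute/loss
# pairs, then extrapolate one order of magnitude beyond the data.
compute = np.array([1e21, 1e22, 1e23, 1e24])  # training FLOP (synthetic)
loss = np.array([3.2, 2.8, 2.5, 2.3])         # eval loss (synthetic)

slope, intercept = np.polyfit(np.log10(compute), np.log10(loss), 1)
pred = 10 ** (intercept + slope * np.log10(1e25))
print(f"extrapolated loss at 1e25 FLOP: {pred:.2f}")
```

Pretraining loss tends to follow smooth fits like this; downstream benchmark metrics often don't, which is the unpredictability the paper examines.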

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

I discussed the scale of future AI training runs, scaling laws, the data wall, AI automation, and more on the Center for AI Policy podcast. Listen here: podcasts.apple.com/us/podcast/cen…

Stefan Schubert (@stefanfschubert)'s Twitter Profile Photo

"It’s not the bailiwick of economists to say that technology can’t exist because it would be very economically important; there’s sort of a reversal of the priority between the physical and computer sciences and the social sciences."

"It’s not the bailiwick of economists to say that technology can’t exist because it would be very economically important; there’s sort of a reversal of the priority between the physical and computer sciences and the social sciences."
Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

Carl is the person I know who has probably thought the most deeply about how we might transition to a world with advanced AI. I expect this episode to be insightful.

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

This is misleading. The 1950 Census actually lists many occupations that have since been automated, including adding-machine operators, computers, switchboard operators, addressograph operators, lamplighters, and many more.

Jaime Sevilla (@jsevillamol)'s Twitter Profile Photo

A training run that cost $100,000 in early 2019 now costs about $700, a ~140x improvement. Epoch AI's paper on algorithmic efficiency estimated a 3x/year improvement in efficiency, which would imply an expected 240x improvement over 5 years.
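
Both numbers check out (a minimal verification of the tweet's arithmetic):

```python
# Realized cost improvement over ~5 years.
cost_2019, cost_now = 100_000, 700
print(f"realized: {cost_2019 / cost_now:.0f}x")  # ~143x ("140x")

# Expected improvement implied by Epoch AI's ~3x/year efficiency estimate.
print(f"expected over 5 years: {3 ** 5}x")       # 243x ("240x")
```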

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

Submitted this to NeurIPS. I thought it would be suitable because it points out a flaw in a NeurIPS best-paper award. They didn't like it, pointing out that we should have just asked the authors for the data. Alas. If only we had thought of that.

Ege Erdil (@egeerdil2)'s Twitter Profile Photo

arguably intelligence has been growing superexponentially over the past billions of years, so i don't understand why "machine intelligence will grow exponentially" is an outlandish prediction. beware arguments that prove too much
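
For concreteness, the distinction the tweet leans on (a toy sketch, not a model of anything real): exponential growth has a constant doubling time, while superexponential growth has a shrinking one.

```python
import numpy as np

t = np.arange(5)
exponential = 2.0 ** t                 # doubling time constant at 1 step
superexponential = 2.0 ** (t**2 / 2)   # doubling time shrinks each step

print(exponential)       # [ 1.  2.  4.  8. 16.]
print(superexponential)  # ~[1.  1.41  4.  22.63  256.]
```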

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

How feasible is it to continue scaling up AI training at its current pace? Our analysis of power, chips, data, and latency constraints suggests it is feasible through this decade. By 2030, models could likely exceed GPT-4 in scale to the same degree that GPT-4 exceeds GPT-2.
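
To put rough numbers on that claim (hedged: the GPT-4 compute figure is an Epoch AI estimate, and the ~4-OOM GPT-2-to-GPT-4 jump is the report's characterization, not an official number):

```python
GPT4_FLOP = 2e25   # assumed Epoch AI estimate of GPT-4 training compute
JUMP = 1e4         # ~4 OOMs, roughly the GPT-2 -> GPT-4 scale-up

print(f"implied 2030 frontier run: ~{GPT4_FLOP * JUMP:.0e} FLOP")  # ~2e+29
```

Note this lands in the same range as the ~1e29 FLOP data-constraint threshold mentioned above.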

Epoch AI (@epochairesearch)'s Twitter Profile Photo

Will chip production be sufficient to sustain the current rates of AI scaling through 2030? Our latest report analyzes potential bottlenecks in AI scaling. Here's a summary of our key findings: 🧵

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

I'm excited to announce Deception 70B, the world's top open-source model. Trained using Deception-Tuning, a technique developed to enable LLMs to deceive themselves about their own mistakes. Try it out now: bit.ly/Deception-70B

Tamay Besiroglu (@tamaybes)'s Twitter Profile Photo

Addendum: Philip Trammell points out that predictions of zero growth rates are invalid (the steady-state conditions become undefined), so the paper's negative growth predictions rest on a mistake. x.com/pawtrammell/st…