Peter Humphreys (@p_humphreys)'s Twitter Profile
Peter Humphreys

@p_humphreys

AI, quantum and neuroscience. Scientist @ DeepMind

ID: 1244596843

Joined: 05-03-2013 20:45:57

10 Tweets

63 Followers

33 Following

Adam Santoro (@santoroai):

Transformers can be made sparse across their depth. When trained isoFLOP, we can match or exceed the performance of vanilla models, while saving inference FLOPs arxiv.org/abs/2404.02258
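The routing idea behind the linked Mixture-of-Depths paper is that each block only processes a fixed fraction of tokens, chosen by a learned router, while the rest skip the block through the residual stream. Below is a minimal, hypothetical PyTorch sketch of that idea; the names (MoDBlock, capacity_fraction), the sigmoid gate, and the omission of causal masking and positional handling are simplifications for illustration, not the paper's reference implementation.

```python
# Hypothetical sketch of a Mixture-of-Depths (MoD) style block, loosely following
# arxiv.org/abs/2404.02258: a per-block router selects the top-k tokens to process;
# the rest pass through unchanged via the residual stream.
import torch
import torch.nn as nn


class MoDBlock(nn.Module):
    def __init__(self, d_model: int, n_heads: int, capacity_fraction: float = 0.125):
        super().__init__()
        self.capacity_fraction = capacity_fraction
        self.router = nn.Linear(d_model, 1)   # scalar routing score per token
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model). Causal masking and position handling
        # are omitted here for brevity.
        b, t, d = x.shape
        k = max(1, int(self.capacity_fraction * t))   # e.g. 12.5% of the tokens

        scores = self.router(x).squeeze(-1)           # (b, t)
        topk = scores.topk(k, dim=-1).indices         # indices of routed tokens
        idx = topk.unsqueeze(-1).expand(-1, -1, d)    # (b, k, d)

        routed = x.gather(1, idx)                     # only the selected tokens
        h = self.norm1(routed)
        h = routed + self.attn(h, h, h, need_weights=False)[0]
        h = h + self.mlp(self.norm2(h))

        # Gate the block's update by the router score (sigmoid here is an
        # illustrative choice) and scatter it back; unselected tokens keep
        # their residual value untouched.
        gate = torch.sigmoid(scores.gather(1, topk)).unsqueeze(-1)
        return x.scatter(1, idx, routed + gate * (h - routed))
```

Because attention runs only over the k selected tokens, both the attention and MLP cost per block shrink with the capacity fraction, which is where the inference FLOP savings come from.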

Alex Hägele (@haeggee):

... and with one experiment, I was able to roughly reproduce their results for a ~220M GPT-2. It saves ~20 min of training time (80 min dense vs 60 min MoD on 4 A100s) while keeping the perplexity close. This roughly matches Fig. 3 or 4 in the paper arxiv.org/pdf/2404.02258…

George Grigorev (@iamgrigorev):

I have implemented Mixture-of-Depths and it shows a significant memory reduction during training and a 10% speed increase. I will verify whether it achieves the same quality with 12.5% active tokens. github.com/thepowerfuldee… Thanks to Alex Hägele for the initial code.
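For the 12.5% active-token setting mentioned here, a hedged usage example of the sketch above: with capacity_fraction=0.125 and a 1024-token sequence, only 128 positions go through attention and the MLP in each MoD block. The shapes and model width are illustrative and not taken from the linked repository.

```python
# Illustrative usage of the MoDBlock sketch above at 12.5% capacity.
import torch

block = MoDBlock(d_model=768, n_heads=12, capacity_fraction=0.125)
x = torch.randn(2, 1024, 768)   # (batch, seq_len, d_model)
y = block(x)                    # 128 of the 1024 positions are processed per block
print(y.shape)                  # torch.Size([2, 1024, 768])
```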
