Khue Le (@netw0rkf10w) 's Twitter Profile
Khue Le

@netw0rkf10w

Head of R&D at getvocal.ai.
Building conversational AI by day, doing optimization research by night.

ID: 195372160

Link: https://khue.fr · Joined: 26-09-2010 14:52:22

521 Tweets

292 Followers

131 Following

Khue Le (@netw0rkf10w) 's Twitter Profile Photo

TIL GitHub has a great new feature: Discussions. Now any repo is basically an online forum. This is really awesome! Separating Discussions from Issues makes it much easier to manage the content, especially for popular repos.
Daniela Witten (@daniela_witten) 's Twitter Profile Photo

Remember when ML was a hugely important area w/far-reaching implications in literally every field, and then an ML conference ever-so-slightly changed its name to avoid alienating 50% of ppl, which caused the ML community to collapse & the field to die out? Yeah, neither do I.

Julien Mairal (@julienmairal) 's Twitter Profile Photo

Congratulations to Dr. Mathilde Caron, who successfully defended her PhD **in person** after a brilliant presentation. The prestigious committee included Cordelia Schmid, Andrew Zisserman, Alyosha Efros, Diane Larlus, and Alexey Dosovitskiy.
Khue Le (@netw0rkf10w) 's Twitter Profile Photo

In our NeurIPS 2021 paper (with Karteek Alahari) we showed that CCCP is Frank-Wolfe in disguise. Happy to see other people recently rediscovering this fact and presenting it as a striking result. Want to know another equally striking fact? Mean Field is also Frank-Wolfe!👇
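The CCCP iteration the tweet refers to can be illustrated on a toy difference-of-convex problem. This is a minimal sketch of plain CCCP (linearize the concave part, minimize the convex surrogate), not the paper's Frank-Wolfe reduction; the objective x⁴ − x² and all names are illustrative choices of mine.

```python
import math

def cccp(x0, iters=50):
    # CCCP for f(x) = g(x) - h(x) with g(x) = x**4 and h(x) = x**2 (both convex).
    # Each step linearizes h at x_k and minimizes the convex surrogate:
    #   x_{k+1} = argmin_x x**4 - h'(x_k) * x,   where h'(x_k) = 2 * x_k.
    # Stationarity (4 x**3 = 2 x_k) gives the closed-form update below.
    x = x0
    for _ in range(iters):
        x = (x / 2) ** (1 / 3)
    return x

# Converges to 1/sqrt(2), a stationary point (here a global minimizer) of x**4 - x**2.
x_star = cccp(1.0)
```

The paper's observation is that this kind of surrogate-linearization step can be read as a Frank-Wolfe linear-minimization step on a suitably reformulated problem.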

Khue Le (@netw0rkf10w) 's Twitter Profile Photo

An ICLR 2023 submission has been accused of being a rehash of previous work, a claim supported by detailed technical arguments. If true, there must be consequences: intentionally misleading contributions should not be tolerated in academic research. openreview.net/forum?id=CQsmM…

Tri Dao (@tri_dao) 's Twitter Profile Photo

We're releasing an optimized implementation of GPT2/GPT3 with FlashAttention🚀! This trains 3-5x faster than the Huggingface version, reaching up to 189 TFLOPs/sec per A100 (60.6% model FLOPs utilization of the theoretical maximum). 1/6 github.com/HazyResearch/f…
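The 60.6% figure can be checked against the A100's dense BF16 peak, assuming the utilization is measured relative to ~312 TFLOPs/s (my assumption; the tweet does not state the reference peak):

```python
# Back-of-envelope check of the utilization figure in the tweet:
# 189 TFLOPs/s achieved vs. the A100's ~312 TFLOPs/s dense BF16 peak.
achieved_tflops = 189
a100_bf16_peak_tflops = 312
util = achieved_tflops / a100_bf16_peak_tflops
print(f"{util:.1%}")  # → 60.6%
```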

Francesco Orabona (@bremen79) 's Twitter Profile Photo

FYI, the so-called AdaGrad norm stepsize was first proposed in arxiv.org/abs/1002.4862 (see Theorem 2). I have seen several papers and talks at #NeurIPS22 citing the wrong work.
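For context, the AdaGrad-norm stepsize is a single scalar stepsize η / √(Σ‖gₜ‖²) shared across all coordinates, as opposed to coordinate-wise AdaGrad. A sketch under that reading (function names and the toy problem are mine; the cited paper's exact variant may differ in constants):

```python
import numpy as np

def adagrad_norm_gd(grad, x0, eta=1.0, eps=1e-8, iters=500):
    """Gradient descent with the scalar 'AdaGrad norm' stepsize:
    eta / sqrt(sum of squared gradient norms), one scalar for all
    coordinates, rather than a per-coordinate denominator."""
    x = np.asarray(x0, dtype=float)
    g2_sum = 0.0
    for _ in range(iters):
        g = grad(x)
        g2_sum += float(np.dot(g, g))  # accumulate ||g_t||^2
        x = x - eta / (np.sqrt(g2_sum) + eps) * g
    return x

# Toy quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
x_out = adagrad_norm_gd(lambda x: x, [1.0, -2.0])
```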

Tri Dao (@tri_dao) 's Twitter Profile Photo

I’ve been working with Inge and we’ve made FlashAttention even faster for long sequences! For seqlen 8K, FlashAttention is now up to 2.7x faster than a standard PyTorch implementation even at small batch, making it easier to train better LMs with longer context 1/7
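The long-sequence claim is easy to motivate with back-of-envelope memory arithmetic: standard attention materializes an N×N score matrix per head, which FlashAttention avoids. The batch/head/dtype choices below are assumptions of mine, not figures from the thread:

```python
# Memory for the materialized attention matrix in standard attention
# at seqlen 8K (assumed: batch=1, 16 heads, fp16 = 2 bytes/element).
seqlen, heads, batch, bytes_per_el = 8192, 16, 1, 2
attn_matrix_bytes = batch * heads * seqlen * seqlen * bytes_per_el
print(attn_matrix_bytes / 2**30, "GiB")  # → 2.0 GiB
```

FlashAttention computes the same result in tiles without ever storing this matrix, which is why the gap widens as sequence length grows.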
Centre Inria de l'Université Grenoble Alpes (@inria_grenoble) 's Twitter Profile Photo

[DISTINCTION 🏆] Congratulations to Julien Mairal of the Thoth project-team at the @inria centre of Université Grenoble Alpes, awarded a European Research Council (ERC) Consolidator Grant 👏 Find out more here: inria.fr/fr/julien-mair… #MachineLearning #Algorithm
Guillaume Champeau (@gchampeau) 's Twitter Profile Photo

This Orange ad is brilliant! (Well, with 3 million views I'm probably the last one to discover it.) youtu.be/D_HPiaAx_QA

Francesco Orabona (@bremen79) 's Twitter Profile Photo

New blog post: Yet Another ICML Award Fiasco. The story of the ICML 2023 Outstanding Paper Award to the D-Adaptation paper, with worse results than the ones from 9 years ago. Please share it to start a needed conversation on mistakenly granted awards. parameterfree.com/2023/08/30/yet…

Khue Le (@netw0rkf10w) 's Twitter Profile Photo

Hi Aaron Defazio. Here's the result of my optimizer, compared to yours (still running). Can you beat my blue curve with hyper-parameter tuning? ;) Please give it a try using this code: github.com/facebookresear…
Khue Le (@netw0rkf10w) 's Twitter Profile Photo

While waiting for Aaron Defazio's tuning result, here's my full run of his method (green curve). Interestingly, some modifications inspired by my optimizer seem to boost its performance. Note: MAE's default hyper-params are used for all experiments.
Yann LeCun (@ylecun) 's Twitter Profile Photo

🥁 Llama3 is out 🥁 8B and 70B models available today. 8k context length. Trained with 15 trillion tokens on a custom-built 24k GPU cluster. Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases. More versions are coming over the next
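The scale of that pretraining run can be estimated with the common ~6 · params · tokens rule of thumb for dense-transformer training FLOPs (an approximation on my part, not a figure from the announcement):

```python
# Rough training-compute estimate for Llama3-70B using the standard
# ~6 * params * tokens approximation for dense transformers.
params = 70e9   # 70B parameters
tokens = 15e12  # 15 trillion training tokens
train_flops = 6 * params * tokens
print(f"{train_flops:.1e}")  # → 6.3e+24
```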