Timothée Lesort (@tlesort) 's Twitter Profile
Timothée Lesort

@tlesort

Senior Data Scientist @AIgnostics.
Continual/Transfer Learning and Generalisation at Scale for Vision and LLMs.
Prev. Postdoc @Mila_Quebec, PhD @IP_Paris_

ID: 3014054421

Link: https://www.notion.so/Research-Webpage-Timoth-e-Lesort-d11f26cf796c4e9ba627bebf252173db
Joined: 03-02-2015 07:35:40

837 Tweets

1.1K Followers

711 Following

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

#ECMLPKDD 2024 is happening in my beautiful hometown Vilnius this week! There? Come see my keynote on memorization and generalization this evening @ 6pm.

Emtiyaz Khan (@emtiyazkhan) 's Twitter Profile Photo

We have two open post-doc positions. You don't have to be a Bayesian, but you should be interested in working at the intersection of DL, Bayes, and optimization. riken.jp/en/careers/res… Interest in understanding deep learning and continual lifelong learning is a plus!

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🚨Internship Alert! Join ServiceNow Research to develop **generalist web agents** that handle complex tasks via browsers—from automating research to managing IT workflows!

🌐 Fine-tune LLMs into agents, explore datasets, and much more—all in Montreal!

forms.gle/wHjb5L6A9rNBEW…
Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

GPT-NeoX 3.0 coming soon... 🟩➡️✅
✅ Efficient post-training
✅ Efficient pre-training
✅ Runs on both NVIDIA and AMD
✅ Open source
✅ Linear scaling to 1k+ GPUs
✅ Transformer, RWKV, Mamba, MoE

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🚨 Preprint Alert! 🚨

It's 12 hours before your conference deadline. Tick, tock. ⏰
You're obviously last minute and need to write code for some fancy plots. 📊
You counted on your coding assistant to do the heavy lifting, but it's not version-aware. 🤖❌
You keep hitting…
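
A concrete instance of the version-awareness problem (my illustration, not an example from the preprint): NumPy removed the `np.float` alias in 1.24, so the same assistant-generated line runs in one environment and crashes in another.

```python
# Illustration of the failure mode: the same snippet behaves differently
# depending on the installed NumPy version, so a non-version-aware
# assistant keeps guessing wrong.
import numpy as np

try:
    x = np.float(3.14)   # alias removed in NumPy 1.24 -> AttributeError there
except AttributeError:
    x = float(3.14)      # version-safe replacement

print(np.__version__, x)
```
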
Pau Rodríguez (@prlz77) 's Twitter Profile Photo

I'm thrilled to announce 3 #internship openings at Apple ML Research in beautiful ☀️ #Barcelona ☀️ for 2025! Two internships on Generative Models (GM), Controllability, Interpretability, and Model Editing; and one on GM & 🔈Spatial Audio. Apply: jobs.apple.com/en-us/details/… Details 🧵

Arthur Ouaknine (@arthurouaknine) 's Twitter Profile Photo

So proud to announce that we (Etienne Laliberté and our team at Mila - Institut québécois d'IA) were part of Limelight, which won the $10M XPrize focused on biodiversity monitoring of the Amazon Rainforest. More details will come from Rubisco AI. Special thanks to IVADO for the support! #XPRIZERainforest

Joaquin Vanschoren (@joavanschoren) 's Twitter Profile Photo

🚀 Ready to push the boundaries of #AI & #ML? We're hiring 7(!) brilliant PhDs, PostDocs, and Engineers to work on cutting-edge #LLMs #AutoML #ContinualLearning at TU Eindhoven
🌟 Think big and shape the future of AI!  
🧑‍💻 Apply now! openml-labs.github.io/website/join/j…
❣️Please share❣️
🧵 1/6
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Simo Ryu In the context of LLM pre-training, our conclusion is that there are generally two causes for this jump in loss on the original pre-training dataset: (1) the distribution has shifted between pre-training and fine-tuning/continual pre-training, causing forgetting of previously…
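
A standard mitigation for this kind of forgetting is replay: mix a fraction of samples from the original pre-training distribution back into the continual pre-training stream. A minimal sketch of the idea (my illustration, not the thread's exact setup; `mixed_stream` and `replay_frac` are names I chose):

```python
import random

def mixed_stream(new_data, replay_data, replay_frac=0.25, seed=0):
    """Interleave old-distribution samples into the new training stream."""
    rng = random.Random(seed)
    replay_data = list(replay_data)
    for example in new_data:
        # With probability replay_frac, also emit an old-distribution sample
        # so the model keeps seeing the original pre-training distribution.
        if replay_data and rng.random() < replay_frac:
            yield rng.choice(replay_data)
        yield example

old = ["old_doc_%d" % i for i in range(100)]
new = ["new_doc_%d" % i for i in range(10)]
print(list(mixed_stream(new, old))[:6])
```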

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

Workshop alert 🚨

We'll host a workshop at ICLR 2025 on modularity, encompassing collaborative + decentralized + continual learning.

Those topics are on the critical path to building better AIs.

Interested? Submit a paper and join us in Singapore!

sites.google.com/corp/view/mcdc…
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N

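For context, the router in question is the small gating network that assigns each token to a few experts. A minimal top-k router sketch (my illustration, not the paper's code); under distribution shift, the question is whether these routing distributions stay balanced or collapse onto a few experts:

```python
import torch
import torch.nn.functional as F

def route(x, router_weights, k=2):
    """x: (tokens, d_model); router_weights: (d_model, n_experts)."""
    logits = x @ router_weights                    # (tokens, n_experts)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)   # chosen experts per token
    # Renormalize so each token's expert weights sum to 1.
    return topk_idx, topk_probs / topk_probs.sum(-1, keepdim=True)

tokens = torch.randn(8, 16)              # 8 tokens, d_model = 16
w = torch.randn(16, 4)                   # 4 experts
idx, weights = route(tokens, w)
print(idx.shape, weights.shape)          # torch.Size([8, 2]) twice
```
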
clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Great research on open-source by Harvard University:
- $4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created)
- Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist

I…
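
A quick sanity check on the quoted ratio (my arithmetic, not the study's): $8.8T / $4.15B ≈ 2,120, which is where the rounded "$1 invested ≈ $2,000 of value" figure comes from.
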
Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

30+ accepted papers

6 oral papers

6 guest speakers

Join us at ICLR 2025 on the 27th, Hall 4 #3, for a full-day workshop on Modularity for Collaborative, Decentralized, and Continual Learning

sites.google.com/corp/view/mcdc…

<a href="/derylucio/">Lucio Dery Jnr Mwinm</a>, Fengyuan Liu, and myself will be organizing
Kartik Ahuja (@kartikahuja1) 's Twitter Profile Photo

Explore fundamental questions surrounding large language models by applying for a postdoctoral position with our Generalization Team at FAIR Paris. If you're attending ICLR, visit our booth to connect with my colleagues and learn more about this exciting opportunity.

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

MuLoCo: Muon x DiLoCo = ❤️

arxiv.org/abs/2505.23725
from Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky

* Uses Muon as the inner optimizer
* Quantizes the outer gradient to 2 bits (!)
* Adds error feedback
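
A minimal sketch of what those ingredients mean in one outer step, under simplifying assumptions (single worker, plain SGD outer update rather than the usual Nesterov momentum; `quantize_2bit` and `outer_step` are names I made up, not the paper's code):

```python
import torch

def quantize_2bit(t):
    """Crude 4-level (2-bit) uniform quantizer over the tensor's range."""
    lo, hi = t.min(), t.max()
    scale = (hi - lo) / 3 + 1e-12            # 4 levels -> 3 steps
    q = torch.round((t - lo) / scale)        # codes in {0, 1, 2, 3}
    return q * scale + lo                    # dequantized values

def outer_step(global_p, local_p, err, outer_lr=0.7):
    """One communication round: compress the delta, carry the error."""
    delta = global_p - local_p               # pseudo-gradient after local steps
    compensated = delta + err                # error feedback: re-inject past residue
    q_delta = quantize_2bit(compensated)     # this is all that gets communicated
    err = compensated - q_delta              # residue carried to the next round
    return global_p - outer_lr * q_delta, err

params = torch.randn(1000)
local = params - 0.01 * torch.randn(1000)    # stand-in for local training result
err = torch.zeros_like(params)
params, err = outer_step(params, local, err)
```
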
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵arxiv.org/abs/2505.23725 1/N

Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Tired of tuning hyperparameters? Introducing PyLO! We're bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
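
If "drop-in" means the usual torch.optim contract, usage should look roughly like the sketch below; the `pylo` import and `LearnedOptimizer` name are my guesses, so check the linked repo for the real API.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Hypothetical interface (guessed names, see the repo for the real ones):
#   from pylo import LearnedOptimizer
#   opt = LearnedOptimizer(model.parameters())    # note: no lr to tune
# Any drop-in optimizer obeys the same contract as torch.optim.SGD:
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # stand-in for the sketch

loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
opt.step()        # identical step()/zero_grad() calls either way
opt.zero_grad()
```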

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, making inference very inefficient. We propose instead **Knowledge Modules**: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks.
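
For reference, a generic LoRA adapter looks like the sketch below (standard LoRA, not the paper's specific knowledge-module training recipe): a frozen base layer plus a trainable low-rank update that can be trained offline and swapped in at inference.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base stays frozen
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)                   # torch.Size([2, 64])
```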

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🔥 We stress-tested today’s best AI code generators in *dependency hell*.

Introducing **GitChameleon 2.0**: 328 challenges for version-controlled code generation.

The verdict? Even top models only hit ~50% success.
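
The flavor of challenge involved (my own toy example, not one of the 328): `DataFrame.append` was removed in pandas 2.0, so correct code depends on the pinned version.

```python
from importlib.metadata import version
import pandas as pd

df = pd.DataFrame({"a": [1]})
row = pd.DataFrame({"a": [2]})

# `DataFrame.append` was removed in pandas 2.0; version-correct code
# has to branch (or be generated for the right pinned version).
if int(version("pandas").split(".")[0]) >= 2:
    out = pd.concat([df, row], ignore_index=True)
else:
    out = df.append(row, ignore_index=True)
print(out)
```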