Timothée Lesort (@tlesort) 's Twitter Profile
Timothée Lesort

@tlesort

Senior Data Scientist @AIgnostics.
Continual/Transfer Learning and Generalisation at Scale for Vision and LLMs.
Prev. Postdoc @Mila_Quebec, PhD @IP_Paris_

ID: 3014054421

Link: https://www.notion.so/Research-Webpage-Timoth-e-Lesort-d11f26cf796c4e9ba627bebf252173db
Joined: 03-02-2015 07:35:40

837 Tweets

1.1K Followers

711 Following

Gintare Karolina Dziugaite (@gkdziugaite) 's Twitter Profile Photo

#ECMLPKDD 2024 is happening in my beautiful hometown Vilnius this week! There? Come see my keynote on memorization and generalization this evening @ 6pm.

Emtiyaz Khan (@emtiyazkhan) 's Twitter Profile Photo

We have two open post-doc positions. You don't have to be a Bayesian, but you should be interested in working at the intersection of DL, Bayes, and optimization. riken.jp/en/careers/res… Interest in understanding deep learning and continual lifelong learning is a plus!

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🚨Internship Alert! Join ServiceNow Research to develop **generalist web agents** that handle complex tasks via browsers—from automating research to managing IT workflows!

🌐 Fine-tune LLMs into agents, explore datasets, and much more—all in Montreal!

forms.gle/wHjb5L6A9rNBEW…
Quentin Anthony (@quentinanthon15) 's Twitter Profile Photo

GPT-NeoX 3.0 coming soon... 🟩➡️✅
✅ Efficient post-training
✅ Efficient pre-training
✅ Runs on both NVIDIA and AMD
✅ Open source
✅ Linear scaling to 1k+ GPUs
✅ Transformer, RWKV, Mamba, MoE

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🚨 Preprint Alert! 🚨

It's 12 hours before your conference deadline. Tick, tock. ⏰
You're obviously last minute and need to write code for some fancy plots. 📊
You counted on your coding assistant to do the heavy lifting, but it's not version-aware. 🤖❌
You keep hitting…
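
A concrete instance of the version-awareness problem (my illustration, not an example from the preprint): NumPy removed the `np.float` alias in 1.24, so the same assistant-generated line runs in one environment and crashes in another.

```python
# Illustration of the failure mode: the same snippet behaves differently
# depending on the installed NumPy version, so a non-version-aware
# assistant keeps guessing wrong.
import numpy as np

try:
    x = np.float(3.14)   # alias removed in NumPy 1.24 -> AttributeError there
except AttributeError:
    x = float(3.14)      # version-safe replacement

print(np.__version__, x)
```
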
Pau Rodríguez (@prlz77) 's Twitter Profile Photo

I'm thrilled to announce 3 #internship openings at Apple ML Research in beautiful ☀️ #Barcelona ☀️ for 2025! Two internships on Generative Models (GM), Controllability, Interpretability, and Model Editing; and one on GM & 🔈Spatial Audio. Apply: jobs.apple.com/en-us/details/… Details 🧵

Arthur Ouaknine (@arthurouaknine) 's Twitter Profile Photo

So proud to announce that we (Etienne Laliberté and our team at Mila - Institut québécois d'IA) were part of Limelight, which won the $10M XPrize focused on biodiversity monitoring of the Amazon Rainforest. More details will come from Rubisco AI. Special thanks to IVADO for the support! #XPRIZERainforest

Joaquin Vanschoren (@joavanschoren) 's Twitter Profile Photo

🚀 Ready to push the boundaries of #AI & #ML? We're hiring 7(!) brilliant PhDs, PostDocs, and Engineers to work on cutting-edge #LLMs #AutoML #ContinualLearning at TU Eindhoven
🌟 Think big and shape the future of AI!  
🧑‍💻 Apply now! openml-labs.github.io/website/join/j…
❣️Please share❣️
🧵 1/6
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Simo Ryu In the context of LLM pre-training, our conclusion is that there are generally two causes for this jump in loss on the original pre-training dataset: (1) the distribution has shifted between pre-training and fine-tuning/continual pre-training, causing forgetting of previously…
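
A standard mitigation for this kind of forgetting is replay: mix a fraction of samples from the original pre-training distribution back into the continual pre-training stream. A minimal sketch of the idea (my illustration, not the thread's exact setup; `mixed_stream` and `replay_frac` are names I chose):

```python
import random

def mixed_stream(new_data, replay_data, replay_frac=0.25, seed=0):
    """Interleave old-distribution samples into the new training stream."""
    rng = random.Random(seed)
    replay_data = list(replay_data)
    for example in new_data:
        # With probability replay_frac, also emit an old-distribution sample
        # so the model keeps seeing the original pre-training distribution.
        if replay_data and rng.random() < replay_frac:
            yield rng.choice(replay_data)
        yield example

old = ["old_doc_%d" % i for i in range(100)]
new = ["new_doc_%d" % i for i in range(10)]
print(list(mixed_stream(new, old))[:6])
```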

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

Workshop alert 🚨

We'll host a workshop at ICLR 2025 on modularity, encompassing collaborative + decentralized + continual learning.

Those topics are on the critical path to building better AIs.

Interested? Submit a paper and join us in Singapore!

sites.google.com/corp/view/mcdc…
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

How do MoE transformers, like DeepSeek, behave under distribution shifts? Do their routers collapse? Can they still match full re-training performance? Excited to present “Continual Pre-training of MoEs: How robust is your router?”!🧵arxiv.org/abs/2503.05029 1/N

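For context, the router in question is the small gating network that assigns each token to a few experts. A minimal top-k router sketch (my illustration, not the paper's code); under distribution shift, the question is whether these routing distributions stay balanced or collapse onto a few experts:

```python
import torch
import torch.nn.functional as F

def route(x, router_weights, k=2):
    """x: (tokens, d_model); router_weights: (d_model, n_experts)."""
    logits = x @ router_weights                    # (tokens, n_experts)
    probs = F.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(k, dim=-1)   # chosen experts per token
    # Renormalize so each token's expert weights sum to 1.
    return topk_idx, topk_probs / topk_probs.sum(-1, keepdim=True)

tokens = torch.randn(8, 16)              # 8 tokens, d_model = 16
w = torch.randn(16, 4)                   # 4 experts
idx, weights = route(tokens, w)
print(idx.shape, weights.shape)          # torch.Size([8, 2]) twice
```
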
clem 🤗 (@clementdelangue) 's Twitter Profile Photo

Great research on open-source by Harvard University:
- $4.15B invested in open-source generates $8.8T of value for companies (aka $1 invested in open-source = $2,000 of value created)
- Companies would need to spend 3.5 times more on software than they currently do if OSS did not exist

I…
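
A quick sanity check on the quoted ratio (my arithmetic, not the study's): $8.8T / $4.15B ≈ 2,120, which is where the rounded "$1 invested ≈ $2,000 of value" figure comes from.
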
Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

30+ accepted papers

6 oral papers

6 guest speakers

Join us at ICLR 2025 on the 27th, Hall 4 #3, for a full-day workshop on Modularity for Collaborative, Decentralized, and Continual Learning

sites.google.com/corp/view/mcdc…

<a href="/derylucio/">Lucio Dery Jnr Mwinm</a>, Fengyuan Liu, and myself will be organizing
Kartik Ahuja (@kartikahuja1) 's Twitter Profile Photo

Explore fundamental questions surrounding large language models by applying for a postdoctoral position with our Generalization Team at FAIR Paris. If you're attending ICLR, visit our booth to connect with my colleagues and learn more about this exciting opportunity.

Arthur Douillard (@ar_douillard) 's Twitter Profile Photo

MuLoCo: Muon x DiLoCo = ❤️

arxiv.org/abs/2505.23725
from Benjamin Thérien, Xiaolong Huang, Irina Rish, Eugene Belilovsky

* Uses Muon as the inner optimizer
* Quantizes the outer gradient to 2 bits (!)
* Adds error feedback
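
A minimal sketch of what those ingredients mean in one outer step, under simplifying assumptions (single worker, plain SGD outer update rather than the usual Nesterov momentum; `quantize_2bit` and `outer_step` are names I made up, not the paper's code):

```python
import torch

def quantize_2bit(t):
    """Crude 4-level (2-bit) uniform quantizer over the tensor's range."""
    lo, hi = t.min(), t.max()
    scale = (hi - lo) / 3 + 1e-12            # 4 levels -> 3 steps
    q = torch.round((t - lo) / scale)        # codes in {0, 1, 2, 3}
    return q * scale + lo                    # dequantized values

def outer_step(global_p, local_p, err, outer_lr=0.7):
    """One communication round: compress the delta, carry the error."""
    delta = global_p - local_p               # pseudo-gradient after local steps
    compensated = delta + err                # error feedback: re-inject past residue
    q_delta = quantize_2bit(compensated)     # this is all that gets communicated
    err = compensated - q_delta              # residue carried to the next round
    return global_p - outer_lr * q_delta, err

params = torch.randn(1000)
local = params - 0.01 * torch.randn(1000)    # stand-in for local training result
err = torch.zeros_like(params)
params, err = outer_step(params, local, err)
```
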
Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Is AdamW the best inner optimizer for DiLoCo? Does the inner optimizer affect the compressibility of the DiLoCo delta? Excited to introduce MuLoCo: Muon is a practical inner optimizer for DiLoCo! 🧵arxiv.org/abs/2505.23725 1/N

Benjamin Thérien (@benjamintherien) 's Twitter Profile Photo

Tired of tuning hyperparameters? Introducing PyLO! We're bringing hyperparameter-free learned optimizers to PyTorch with drop-in torch.optim support and faster step times thanks to our custom CUDA kernels. Check out our code here: github.com/Belilovsky-Lab…
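
If "drop-in" means the usual torch.optim contract, usage should look roughly like the sketch below; the `pylo` import and `LearnedOptimizer` name are my guesses, so check the linked repo for the real API.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Hypothetical interface (guessed names, see the repo for the real ones):
#   from pylo import LearnedOptimizer
#   opt = LearnedOptimizer(model.parameters())    # note: no lr to tune
# Any drop-in optimizer obeys the same contract as torch.optim.SGD:
opt = torch.optim.SGD(model.parameters(), lr=0.1)  # stand-in for the sketch

loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
opt.step()        # identical step()/zero_grad() calls either way
opt.zero_grad()
```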

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

RAG and in-context learning are the go-to approaches for integrating new knowledge into LLMs, making inference very inefficient. We propose instead **Knowledge Modules**: lightweight LoRA modules trained offline that can match RAG performance without the drawbacks.
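
For reference, a generic LoRA adapter looks like the sketch below (standard LoRA, not the paper's specific knowledge-module training recipe): a frozen base layer plus a trainable low-rank update that can be trained offline and swapped in at inference.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                      # base stays frozen
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
print(layer(torch.randn(2, 64)).shape)                   # torch.Size([2, 64])
```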

Massimo Caccia (@masscaccia) 's Twitter Profile Photo

🔥 We stress-tested today’s best AI code generators in *dependency hell*.

Introducing **GitChameleon 2.0**: 328 challenges for version-controlled code generation.

The verdict? Even top models only hit ~50% success.
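
The flavor of challenge involved (my own toy example, not one of the 328): `DataFrame.append` was removed in pandas 2.0, so correct code depends on the pinned version.

```python
from importlib.metadata import version
import pandas as pd

df = pd.DataFrame({"a": [1]})
row = pd.DataFrame({"a": [2]})

# `DataFrame.append` was removed in pandas 2.0; version-correct code
# has to branch (or be generated for the right pinned version).
if int(version("pandas").split(".")[0]) >= 2:
    out = pd.concat([df, row], ignore_index=True)
else:
    out = df.append(row, ignore_index=True)
print(out)
```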