Alessandro Sordoni (@murefil) 's Twitter Profile
Alessandro Sordoni

@murefil

ML Team / MSR Montréal. Views are my own.

ID: 124319949

Joined: 19-03-2010 01:02:45

499 Tweets

846 Followers

908 Following

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

New preprint: To promote generalisation to new tasks, modular LLMs reuse and adapt previously acquired skills. We propose a more expressive “multi-head” routing strategy, which achieves consistent gains. Code: github.com/microsoft/mttl Paper: arxiv.org/abs/2211.03831
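
To make the idea concrete, here is a minimal PyTorch sketch of multi-head routing over a shared bank of LoRA skills. This is an illustrative paraphrase, not the mttl codebase; all class, parameter, and shape choices are assumptions:

```python
import torch
import torch.nn as nn

class MultiHeadRoutedLoRA(nn.Module):
    """Sketch: a frozen linear layer plus a bank of LoRA 'skills',
    mixed by per-(task, head) routing weights."""
    def __init__(self, d_in, d_out, n_tasks, n_skills=8, rank=4, n_heads=4):
        super().__init__()
        assert d_in % n_heads == 0
        self.h, self.dh = n_heads, d_in // n_heads
        self.base = nn.Linear(d_in, d_out, bias=False)
        self.base.weight.requires_grad_(False)            # frozen backbone weight
        # Skill bank: per-skill LoRA factors; A is split head-wise along d_in.
        self.A = nn.Parameter(torch.randn(n_skills, n_heads, self.dh, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_skills, rank, d_out))
        # Routing logits: one distribution over skills per (task, head).
        self.route = nn.Parameter(torch.zeros(n_tasks, n_heads, n_skills))

    def forward(self, x, task_id):
        xh = x.view(x.shape[0], self.h, self.dh)          # (batch, heads, dh)
        w = self.route[task_id].softmax(-1)               # (heads, skills)
        A = torch.einsum("hs,shdr->hdr", w, self.A)       # per-head mixed A
        B = torch.einsum("hs,sro->hro", w, self.B)        # per-head mixed B
        delta = torch.einsum("bhd,hdr,hro->bo", xh, A, B) # routed LoRA update
        return self.base(x) + delta
```

With n_heads=1 this reduces to a single routing distribution per task, i.e. the non-multi-head baseline the tweet is comparing against.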

Edoardo Ponti (@pontiedoardo) 's Twitter Profile Photo

The applications for the ELLIS PhD programme are now open! If you'd like to join EdinburghNLP and do research on modular deep learning (parameter-efficient fine-tuning, routing in mixture-of-experts, model merging, ...) or computational typology, drop me a message!

Alexandra Olteanu (@o_saja) 's Twitter Profile Photo

The #FAccT2024 CFP encourages authors to include in their papers 1) an ethical considerations statement, 2) a researcher positionality statement & 3) an adverse impact statement. To explain some of the reasoning behind the change, we wrote a longer blog post: medium.com/@alexandra.olt…

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

New updated version of the paper, where we show our learned routing can enable **zero-shot transfer**! You can 1. learn a set of LoRA modules with multi-head routing 2. fuse them back into a single LoRA 👉 0-shot acc. with T0-11B goes from 61 to 64.5%. arxiv.org/pdf/2211.03831…
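
A rough sketch of the fusion step, under the simplest assumption (average the per-skill full-rank updates under the learned routing weights); the paper's exact procedure may differ:

```python
import torch

def fuse_lora_bank(A, B, route_probs):
    """Collapse a bank of LoRA skills into one dense weight update.

    A: (n_skills, d_in, rank), B: (n_skills, rank, d_out)
    route_probs: (n_skills,) mixture weights, e.g. routing probabilities
    averaged over the training tasks (an assumption of this sketch).
    Returns a (d_in, d_out) delta that can be merged into the frozen
    layer for zero-shot use.
    """
    deltas = torch.einsum("sdr,sro->sdo", A, B)    # per-skill full-rank updates
    return torch.einsum("s,sdo->do", route_probs, deltas)
```

Mixing the per-skill products (rather than mixing A and B separately) keeps the fused update equal to the routed mixture of updates, since the routing is linear in the deltas.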

Alan Jeffares (@jeffaresalan) 's Twitter Profile Photo

Have we been training deep ensembles with the wrong objective? 😱 Our new #NeurIPS paper investigates why training ensembles *jointly* is almost never observed in practice and uncovers some pretty surprising behaviour… 🧵 [1/N]
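
For concreteness, the contrast in question, sketched in PyTorch (my paraphrase, not the paper's code): the standard per-member objective versus the joint loss of the averaged prediction:

```python
import torch
import torch.nn.functional as F

def independent_loss(logits_list, y):
    # The standard recipe: each member minimizes its own loss;
    # gradients never couple the members.
    return sum(F.cross_entropy(l, y) for l in logits_list) / len(logits_list)

def joint_loss(logits_list, y):
    # The "joint" objective: the loss of the averaged ensemble
    # prediction, which couples the members during training.
    probs = torch.stack([l.softmax(-1) for l in logits_list]).mean(0)
    return F.nll_loss(torch.log(probs), y)
```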

arindam mitra (@arindam1408) 's Twitter Profile Photo

With Orca, we're excited about the potential of redefining the reasoning capabilities of smaller LLMs. We're still in the early phases of this intriguing journey, but our preliminary explorations have yielded encouraging results. 1/7

Alexandra Olteanu (@o_saja) 's Twitter Profile Photo

📢📢📢 We will start reviewing intern applications soon! Topics of particular interest are responsible NLP, human agency, human centered AI & impacts of AI systems. If interested, try to apply soon! #EMNLP2023 #NeurIPS2023 #FAccT2024 w/ Su Lin Blodgett & Vera Liao

Lucas Caccia (@lucaspcaccia) 's Twitter Profile Photo

nisten cohere skunkworks crumb "The world's first MoE of LoRAs" is not quite true 😅. The idea has been around for at least ~2 years. FWIW the paper (correctly) cites arxiv.org/abs/2202.13914 and arxiv.org/abs/2211.03831 (disclaimer: I am an author on the last one)

Tao Sun (@taosunvoyage) 's Twitter Profile Photo

Thanks for the mention! 🎉🎉 I'm thrilled to contribute to the Hugging Face community with Polytropon & MHR Edoardo Ponti Alessandro Sordoni 🚀🚀 Your awesome methods have shown impressive results in our multitask use cases Ant Group ...and we have more to share soon. Stay tuned! 😉

Xinyi Wang (@xinyiwang98) 's Twitter Profile Photo

Happy to share the preprint of my MSR intern project: Guiding Language Model Reasoning with Planning Tokens (arxiv.org/abs/2310.05707). We propose to insert tunable special planning tokens in front of each chain-of-thought step to guide the generation of the actual reasoning step.
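
A minimal sketch of the mechanics, assuming a Hugging Face causal LM; the token names and the per-step assignment below are made up for illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical planning-token names; the new embedding rows are trainable.
plan_tokens = [f"<plan_{i}>" for i in range(4)]
tok.add_special_tokens({"additional_special_tokens": plan_tokens})
model.resize_token_embeddings(len(tok))

def insert_planning_tokens(cot_steps, plans):
    """cot_steps: reasoning-step strings; plans: one planning token per step
    (the paper infers these from the steps; here they are simply given)."""
    return "".join(p + s for p, s in zip(plans, cot_steps))

example = insert_planning_tokens(
    ["First, 2 + 3 = 5. ", "Then 5 * 4 = 20."],
    ["<plan_0>", "<plan_1>"],
)
print(tok(example).input_ids)  # sequence now interleaves plan tokens and steps
```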

Arian Hosseini (@ariantbd) 's Twitter Profile Photo

Excited to share our new paper V-STaR - Common self-improvement methods only use correct self-generated solutions to bootstrap models - V-STaR utilizes iteratively self-generated correct and incorrect solutions to train a verifier using DPO arxiv.org/abs/2402.06457 🧵(1/4)
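
The verifier training objective is standard DPO over (correct, incorrect) self-generated solution pairs; a minimal sketch with illustrative function names:

```python
import torch
import torch.nn.functional as F

def dpo_verifier_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss on (correct = chosen, incorrect = rejected) solutions.

    logp_*:     summed token log-probs of the chosen / rejected solution
                under the verifier being trained
    ref_logp_*: the same quantities under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()
```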

Alessandro Sordoni (@murefil) 's Twitter Profile Photo

LLM self-improvement works (STaR, SPIN, Self-Rewarding LM). We use correct/incorrect solutions generated during self-improvement to train a verifier with DPO, and use it to rank solutions at test time. DPO rankers work well! Thx Arian Hosseini and Rishabh Agarwal for leading the project
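
At test time, the DPO-trained verifier can rank candidate solutions by its implicit reward; a tiny sketch, with all interfaces assumed:

```python
def rank_solutions(candidates, verifier_logp, ref_logp, beta=0.1):
    """Score each candidate with the implicit DPO reward
    beta * (logp_verifier - logp_ref) and return them best-first."""
    scores = [beta * (v - r) for v, r in zip(verifier_logp, ref_logp)]
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [candidates[i] for i in order]
```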

babyLM (@babylmchallenge) 's Twitter Profile Photo

👶 BabyLM Challenge is back! Can you improve pretraining with a small data budget? BabyLMs for better LLMs & for understanding how humans learn from 100M words. New this year: how vision affects learning, bring-your-own-data, and a paper track. babylm.github.io 🧵

Elvis Dohmatob (@dohmatobelvis) 's Twitter Profile Photo

Money could buy happiness: Catastrophic "model collapse" (due to self-consuming loops) can be avoided at the extra cost of feedback on data quality (i.e., via pruning). Please RT!
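
A toy sketch of the pruning idea (keep only self-generated samples that pass an external quality signal before retraining on them); `quality_fn` is a hypothetical oracle, not anything from the paper:

```python
def prune_synthetic(samples, quality_fn, keep_frac=0.5):
    """Keep the top `keep_frac` of self-generated samples by quality score,
    i.e. the 'feedback on data quality' the tweet refers to."""
    scored = sorted(samples, key=quality_fn, reverse=True)
    return scored[: max(1, int(len(scored) * keep_frac))]
```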

Alessandro Sordoni (@murefil) 's Twitter Profile Photo

Reach out to Lucas Caccia & Zhan Su to learn more about some of our post-hoc MoE work in Montreal (arxiv.org/abs/2405.11157), our codebase (github.com/microsoft/mttl), and what's next 👀 Oleksiy will also be around!

Silviu Pitis (@silviupitis) 's Twitter Profile Photo

When evaluating LLMs, added context, such as criteria or a user profile, may be critical for determining preferred behavior. But can reward models effectively incorporate this additional context? 📝 New paper: arxiv.org/abs/2407.14916 🤗 Dataset: huggingface.co/datasets/micro…
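
One simple way a reward model can take such context, sketched under the assumption that the context is just concatenated ahead of the prompt (illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class ContextualRewardModel(nn.Module):
    """Sketch: a scalar reward head over an encoder that sees the added
    context (criteria / user profile) alongside the prompt and response.
    `encoder` is any module mapping token ids to a pooled (d_model,) vector."""
    def __init__(self, encoder, d_model):
        super().__init__()
        self.encoder = encoder
        self.head = nn.Linear(d_model, 1)

    def forward(self, context_ids, prompt_ids, response_ids):
        # Simplest contextualization: prepend the context to the input.
        ids = torch.cat([context_ids, prompt_ids, response_ids], dim=-1)
        return self.head(self.encoder(ids)).squeeze(-1)   # scalar reward
```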

Prateek Yadav (@prateeky2806) 's Twitter Profile Photo

We just released our survey on "Model MoErging". But what is MoErging? 🤔 Read on! Imagine a world where fine-tuned models, each specialized in a specific domain, can collaborate and "compose/remix" their skills using some routing mechanism to tackle new tasks and queries! 🧵👇
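
A toy sketch of the routing idea, assuming precomputed expert/task embeddings; every interface here is illustrative rather than any surveyed method's actual API:

```python
import torch

def route_and_compose(query_emb, expert_embs, expert_fns, x, top_k=2):
    """MoErging-style routing sketch: pick the experts whose (assumed
    precomputed) task embeddings best match the query embedding,
    then average their outputs on input x."""
    sims = expert_embs @ query_emb                 # (n_experts,) similarities
    w, idx = sims.topk(top_k)
    w = w.softmax(-1)                              # mixture over chosen experts
    outs = torch.stack([expert_fns[i](x) for i in idx.tolist()])
    return torch.einsum("k,k...->...", w, outs)    # weighted composition
```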
