Pierre Chambon (@pierrechambon6) 's Twitter Profile
Pierre Chambon

@pierrechambon6

NLP/Code Generation PhD at FAIR (Meta AI) and INRIA - previously researcher at Stanford University - MS Stanford 22’ - Centrale Paris P2020

ID: 1465370445843214346

Joined: 29-11-2021 17:22:14

77 Tweets

689 Followers

1.1K Following

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Hi Grok, give me the name of an existing and published benchmark on Time and Space Complexity, used to measure the performance of LLMs? Be kind to the author of the benchmark please (and only output the genuine truth, no fake news please!) 🥰

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Quite impressive, especially without these reasoning tokens - this also correlates with the findings on BigO(Bench), where the updated V3 indeed scores higher than R1 on Time Complexity Generation. Wondering what downstream performance we will see in R2.

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

🔥Very happy to introduce the BigO(Bench) dataset on Hugging Face 🤗
✨3,105 coding problems and 1,190,250 solutions from CodeContests
✨Time/Space Complexity labels and curve coefficients
✨Up to 5k Runtime/Memory Footprint measures for each solution
huggingface.co/datasets/faceb…
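For context, a minimal sketch of how one might load such a dataset with the Hugging Face `datasets` library; the repository id, split, and field names below are illustrative guesses (the link above is truncated), not the official ones from the dataset card.

```python
# Minimal sketch: loading BigO(Bench) with the Hugging Face `datasets` library.
# The repo id "facebook/BigOBench" and the split/field names are hypothetical,
# used for illustration only -- check the dataset card linked above for the real ones.
from datasets import load_dataset

ds = load_dataset("facebook/BigOBench", split="train")  # hypothetical repo id and split
sample = ds[0]
print(sample.keys())  # expect fields such as problem statement, solution code,
                      # complexity labels, and runtime/memory measures (assumed)
```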

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Great work that prevents classifiers from relying too much on spurious correlations - it also helps increase fairness for medical imaging models! ❤️

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

And if you’re interested in the PhD program in Paris, it’s an amazing (and pretty much unique) opportunity to do research at scale in an industry setting (so you’re not allowed to do one single massive notebook per research paper, unfortunately :/)

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Great paper on how to do RL that directly optimizes the inference-time usage of your models! If you want a model that is great for multiple sampling + majority voting, you need to choose your RL training objective accordingly (better explained in the paper itself ❤️)
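For reference, a minimal sketch of the inference-time procedure the tweet refers to (sampling several answers and taking a majority vote); `sample_answer` is a hypothetical stand-in for a call to your model with temperature > 0, not an API from the paper.

```python
# Minimal sketch of multiple sampling + majority voting at inference time.
# `sample_answer` is a hypothetical callable that returns one sampled answer
# from your model for the given prompt (e.g. decoding with temperature > 0).
from collections import Counter
from typing import Callable

def majority_vote(sample_answer: Callable[[str], str], prompt: str, n: int = 8) -> str:
    """Draw n independent samples and return the most frequent answer."""
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```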

Baptiste Rozière (@b_roziere) 's Twitter Profile Photo

We released Devstral, a 24B model released under the Apache 2.0 license. It is the best open model on SWE-Bench Verified today. You can check our blog post or test it with OpenHands (from All Hands AI) following the instructions here: huggingface.co/mistralai/Devs…
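Not the official instructions from the link above (which is truncated), but a minimal sketch of a common pattern for trying an open model like this: serve it behind an OpenAI-compatible endpoint (e.g. with vLLM) and query it from Python. The model id and base URL below are assumptions for illustration.

```python
# Minimal sketch, assuming the model is served behind a local OpenAI-compatible
# endpoint (e.g. `vllm serve <model-id>`); the model id and URL are placeholders,
# not the official OpenHands setup from the linked instructions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local server, dummy key
resp = client.chat.completions.create(
    model="mistralai/Devstral-Small",  # hypothetical model id -- check the Hugging Face page
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(resp.choices[0].message.content)
```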