Pierre Chambon (@pierrechambon6) 's Twitter Profile
Pierre Chambon

@pierrechambon6

NLP/Code Generation PhD at FAIR (Meta AI) and INRIA - previously researcher at Stanford University - MS Stanford 22’ - Centrale Paris P2020

ID: 1465370445843214346

Joined: 29-11-2021 17:22:14

77 Tweets

689 Followers

1.1K Following

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Hi Grok, give me the name of an existing and published benchmark on Time and Space Complexity, used to measure the performance of LLMs? Be kind to the author of the benchmark please (and only output the genuine truth, no fake news please!) 🥰

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Quite impressive, especially without these reasoning tokens - this also correlates with the findings on BigO(Bench), where the updated V3 indeed scores higher than R1 on Time Complexity Generation. Wondering what downstream performance we will see in R2.

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

🔥Very happy to introduce the BigO(Bench) dataset on Hugging Face 🤗
✨3,105 coding problems and 1,190,250 solutions from CodeContests
✨Time/Space Complexity labels and curve coefficients
✨Up to 5k Runtime/Memory Footprint measures for each solution
huggingface.co/datasets/faceb…
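For context, a minimal sketch of how one might load such a dataset with the Hugging Face `datasets` library; the repository id, split, and field names below are illustrative guesses (the link above is truncated), not the official ones from the dataset card.

```python
# Minimal sketch: loading BigO(Bench) with the Hugging Face `datasets` library.
# The repo id "facebook/BigOBench" and the split/field names are hypothetical,
# used for illustration only -- check the dataset card linked above for the real ones.
from datasets import load_dataset

ds = load_dataset("facebook/BigOBench", split="train")  # hypothetical repo id and split
sample = ds[0]
print(sample.keys())  # expect fields such as problem statement, solution code,
                      # complexity labels, and runtime/memory measures (assumed)
```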

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Great work that prevents classifiers from relying too much on spurious correlations - it also helps increase fairness for medical imaging models! ❤️

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

And if you’re interested in the PhD program in Paris, it’s an amazing (and pretty much unique) opportunity to do research at scale in an industry setting (so you’re not allowed to do one single massive notebook per research paper, unfortunately :/)

Pierre Chambon (@pierrechambon6) 's Twitter Profile Photo

Great paper on how to do RL that directly optimizes the inference-time usage of your models! If you want a model that is great for multiple sampling + majority voting, you need to choose your RL training objective accordingly (better explained in the paper itself ❤️)
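For reference, a minimal sketch of the inference-time procedure the tweet refers to (sampling several answers and taking a majority vote); `sample_answer` is a hypothetical stand-in for a call to your model with temperature > 0, not an API from the paper.

```python
# Minimal sketch of multiple sampling + majority voting at inference time.
# `sample_answer` is a hypothetical callable that returns one sampled answer
# from your model for the given prompt (e.g. decoding with temperature > 0).
from collections import Counter
from typing import Callable

def majority_vote(sample_answer: Callable[[str], str], prompt: str, n: int = 8) -> str:
    """Draw n independent samples and return the most frequent answer."""
    answers = [sample_answer(prompt) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]
```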

Baptiste Rozière (@b_roziere) 's Twitter Profile Photo

We released Devstral, a 24B model released under the Apache 2.0 license. It is the best open model on SWE-Bench Verified today. You can check our blog post or test it with OpenHands (from All Hands AI) following the instructions here: huggingface.co/mistralai/Devs…
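Not the official instructions from the link above (which is truncated), but a minimal sketch of a common pattern for trying an open model like this: serve it behind an OpenAI-compatible endpoint (e.g. with vLLM) and query it from Python. The model id and base URL below are assumptions for illustration.

```python
# Minimal sketch, assuming the model is served behind a local OpenAI-compatible
# endpoint (e.g. `vllm serve <model-id>`); the model id and URL are placeholders,
# not the official OpenHands setup from the linked instructions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # local server, dummy key
resp = client.chat.completions.create(
    model="mistralai/Devstral-Small",  # hypothetical model id -- check the Hugging Face page
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(resp.choices[0].message.content)
```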