Jean Kaddour (@jeankaddour)'s Twitter Profile
Jean Kaddour

@jeankaddour

pyspur.dev
PhD Student in ML @ UCL

ID: 1863618842

Link: https://www.jeankaddour.com/ · Joined: 14-09-2013 12:05:44

650 Tweets

1.1K Followers

2.2K Following

Robert Kirk (@_robertkirk):

Very cool work I've had the pleasure of advising Yi Xu on: investigating non-transitive preferences in LLM judges, showing how they can lead to inconsistent model rankings, and demonstrating how to fix this while maintaining computational efficiency!
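
For intuition, here is a minimal self-contained sketch (hypothetical numbers, not the paper's data) of how non-transitive pairwise judge preferences make any single ranking impossible:

```python
from itertools import permutations

# wins[(x, y)]: fraction of prompts where an LLM judge preferred model x over y.
# Hypothetical values with a cycle: A beats B, B beats C, but C beats A.
wins = {
    ("A", "B"): 0.7, ("B", "A"): 0.3,
    ("B", "C"): 0.6, ("C", "B"): 0.4,
    ("C", "A"): 0.8, ("A", "C"): 0.2,
}

def consistent(order):
    """A ranking is consistent if every higher-ranked model beats every lower-ranked one."""
    return all(wins[(order[i], order[j])] > 0.5
               for i in range(len(order)) for j in range(i + 1, len(order)))

# No total order satisfies all pairwise preferences, so the "winner" depends on
# which comparisons you run and in what order, i.e. rankings are inconsistent.
print([o for o in permutations("ABC") if consistent(o)])  # -> []
```

One standard way to recover a consistent ordering (whether or not it is the thread's fix) is to fit scalar strength scores to the pairwise win rates, e.g. with a Bradley-Terry model.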

Max Bartolo (@max_nlp):

I really enjoyed my Machine Learning Street Talk chat with Tim at #NeurIPS2024 about some of the research we've been doing on reasoning, robustness and human feedback. If you have an hour to spare and are interested in some semi-coherent thoughts revolving around AI robustness, it may be worth…

Antonin Schrab (@antoninschrab):

🎓PhD in Foundational AI done☑️
UCL Centre for Artificial Intelligence · Gatsby Computational Neuroscience Unit

Huge thanks to my supervisors Benjamin Guedj and Arthur Gretton, and to all collaborators!

Check out my article & summary table unifying all my PhD work!

A Unified View of Optimal Kernel Hypothesis Testing
arxiv.org/abs/2503.07084

hardmaru (@hardmaru):

Excited to release our technical report: “The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search”‼️ pub.sakana.ai/ai-scientist-v… The AI Scientist-v2 incorporates an “Agentic Tree Search” approach into the workflow, enabling deeper and more…
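
"Agentic Tree Search" is only named, not specified, in the tweet; in spirit it is a best-first search over a tree of candidate experiments, where an agent proposes children and an automated critic scores them. A generic sketch of that pattern (illustrative, not Sakana's implementation):

```python
import heapq

def best_first_tree_search(root, expand, score, budget=100):
    """Best-first search: repeatedly expand the most promising node.
    expand(node) -> child candidates (e.g. revised code or experiments);
    score(node)  -> float rating (e.g. from an executor or automated reviewer)."""
    frontier = [(-score(root), 0, root)]  # max-heap via negated scores
    best, best_score, n = root, score(root), 0
    while frontier and n < budget:
        neg_s, _, node = heapq.heappop(frontier)
        if -neg_s > best_score:
            best, best_score = node, -neg_s
        for child in expand(node):
            n += 1
            heapq.heappush(frontier, (-score(child), n, child))
    return best
```

Deeper exploration then falls out of the search budget rather than a fixed, linear pipeline.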

Terry Yue Zhuo (@terryyuezhuo):

DeepCoder-14B on BigCodeBench-Hard

Prefilling w/o Reasoning (ranked 81st/195):
22.3% Complete, 18.2% Instruct, 20.3% Average

No Prefilling, w/ Reasoning (ranked 87th/195):
22.3% Complete, 16.9% Instruct, 19.6% Average

o1 (reasoning=high) & o3 (reasoning=medium): 35.5% on…

Jude Wells (@_judewells):

I really like this ProGen3 paper because, contrary to the title, I think it actually shows there is relatively little to be gained from massively scaling protein language models. 1/n

Aryo Pradipta Gema (@aryopg):

MMLU-Redux just touched down at #NAACL2025! 🎉
Wish I could be there for our "Are We Done with MMLU?" poster today (9:00-10:30am in Hall 3, Poster Session 7), but visa drama said nope 😅
If anyone's swinging by, give our research some love! Hit me up if you check it out! 👋

Oliver Stanley (@_oliverstanley):

Introducing Reasoning Gym: over 100 procedurally generated reasoning environments for evaluation and RLVR of language models. Generate virtually infinite training or evaluation data with fine-grained difficulty control and automatic verifiers. 🧵 1/
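
The core recipe behind environments like these is a seeded procedural generator paired with an automatic verifier. A minimal self-contained sketch of that pattern (illustrative; these function names are not Reasoning Gym's actual API):

```python
import random

def make_task(difficulty: int, seed: int) -> dict:
    """Procedurally generate one arithmetic task; difficulty scales operand count and size."""
    rng = random.Random(seed)
    terms = [rng.randint(1, 10 ** difficulty) for _ in range(difficulty + 1)]
    return {"question": f"What is {' + '.join(map(str, terms))}?",
            "answer": str(sum(terms))}

def verify(task: dict, model_output: str) -> bool:
    """Automatic verifier: exact match against the generated ground truth."""
    return model_output.strip() == task["answer"]

# Any (difficulty, seed) pair yields a fresh, checkable example: virtually
# infinite training/eval data, and verify() doubles as a reward signal for RLVR.
task = make_task(difficulty=2, seed=42)
print(task["question"], verify(task, "wrong"), verify(task, task["answer"]))
```
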
Shizhe Diao (@shizhediao):

Does RL truly expand a model’s reasoning🧠capabilities? Contrary to recent claims, the answer is yes—if you push RL training long enough!

Introducing ProRL 😎, a novel training recipe that scales RL to >2k steps, empowering the world’s leading 1.5B reasoning model💥and offering…

Zafir Stojanovski (@zafstojano):

Super excited to share 💪🧠Reasoning Gym! 🧵

We provide over 100 data generators and verifiers spanning several domains (algebra, arithmetic, code, geometry, logic, games) for training the next generation of reasoning models.

In essence, we can generate an infinite amount of…

Niccolò Ajroldi (@n_ajroldi):

New ICML paper! 🎉⚡️

Averaging checkpoints is a well-known method to accelerate training and improve performance of ML models. Can we see these benefits on tasks from a structured and diverse benchmark for optimization algorithms such as AlgoPerf? mlcommons.org/benchmarks/alg…
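
For readers unfamiliar with the method: checkpoint averaging in its simplest form takes a uniform average of parameters across checkpoints saved during training. A minimal PyTorch sketch (generic, not the paper's exact implementation):

```python
import torch

def average_checkpoints(paths):
    """Uniformly average parameters across checkpoints saved as state dicts."""
    avg, n = None, len(paths)
    for path in paths:
        state = torch.load(path, map_location="cpu")  # assumes torch.save(model.state_dict(), path)
        if avg is None:
            avg = {k: v.float().clone() for k, v in state.items()}
        else:
            for k, v in state.items():
                avg[k] += v.float()
    return {k: v / n for k, v in avg.items()}

# Usage: model.load_state_dict(average_checkpoints(["ckpt_100.pt", "ckpt_200.pt"]))
```
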
Robert Lange (@roberttlange):

Text-to-LoRA: What if you no longer had to fine-tune your LLM for every single downstream task?

🚀 Stoked to share our work on instant LLM adaptation using meta-learned hypernetworks 📝 → 🔥

The idea is simple yet elegant: We text-condition a hypernetwork to output LoRA…
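
The truncated idea: a hypernetwork, conditioned on an embedding of the task description, directly emits the LoRA low-rank factors for a frozen base layer. A minimal sketch with assumed dimensions (illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class LoRAHyperNet(nn.Module):
    """Map a task-description embedding to LoRA factors A (r x d_in) and B (d_out x r)."""
    def __init__(self, emb_dim=768, d_in=1024, d_out=1024, rank=8):
        super().__init__()
        self.rank, self.d_in, self.d_out = rank, d_in, d_out
        self.net = nn.Sequential(
            nn.Linear(emb_dim, 512), nn.ReLU(),
            nn.Linear(512, rank * (d_in + d_out)),
        )

    def forward(self, task_emb):
        flat = self.net(task_emb)
        A = flat[: self.rank * self.d_in].view(self.rank, self.d_in)
        B = flat[self.rank * self.d_in:].view(self.d_out, self.rank)
        return A, B  # adapter update: delta_W = B @ A on the frozen weight

# One forward pass per task description yields adapter weights; no per-task fine-tuning loop.
A, B = LoRAHyperNet()(torch.randn(768))
print((B @ A).shape)  # torch.Size([1024, 1024])
```
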
Reuben Adams (@reubenjadams):

Thread on the Apple paper and LLM reasoning claims in general. LLMs often fall back to baseline patterns when they can’t reason things through: making up plausible answers when they don’t know, or providing generic but incorrect coding solutions when the situation is complex.