Yegor Denisov-Blanch (@yegordb)'s Twitter Profile
Yegor Denisov-Blanch

@yegordb

Stanford | Research: Software Engineering Productivity | 8th grade dropout | ex-Olympic Weightlifting National Champion (Master of Sport)

ID: 1356371859927916550

Joined: 01-02-2021 22:40:34

469 Tweets

3.3K Followers

707 Following

Yegor Denisov-Blanch (@yegordb)

I'm part of a research group at Stanford and have data on the impact of AI on software engineering productivity. We will be releasing a paper soon. Spoiler: some teams see a *decrease* in productivity, while many others a pretty sizable increase

Rylan Schaeffer (@rylanschaeffer)

🚨New preprint 🚨

Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models

We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims

1/8
Jon Saad-Falcon (@jonsaadfalcon)

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 
🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning
Rylan Schaeffer (@rylanschaeffer)

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models?

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄

Joshua Kazdan (@JoshuaK92829), Apratim Dey (@ApratimDey2), Matthias Gerstgrasser (@MGerstgrasser), Rafael Rafailov @ NeurIPS (@rm_rafailov), Sanmi Koyejo (@sanmikoyejo)

1/7
METR (@metr_evals)

We tested how autonomous AI agents perform on real software tasks from our recent developer productivity RCT.

We found a gap between algorithmic scoring and real-world usability that may help explain why AI benchmarks feel disconnected from reality.