Yegor Denisov-Blanch (@yegordb) Twitter Tweets • TwiCopy

Yegor Denisov-Blanch

@yegordb

Stanford | Research: Software Engineering Productivity |

8th grade dropout | ex-Olympic Weightlifting National Champion (Master of Sport)

ID: 1356371859927916550

Joined: 01-02-2021 22:40:34

469 Tweets

3.3K Followers

707 Following

Yegor Denisov-Blanch

@yegordb

7 months ago

Oh, is this how you get papers accepted at top ML conferences?

4 likes · 0 replies · 0 retweets

I'm part of a research group at Stanford and have data on the impact of AI on software engineering productivity. We will be releasing a paper soon. Spoiler: some teams see a *decrease* in productivity, while many others a pretty sizable increase

32 likes · 4 replies · 2 retweets

Yegor Denisov-Blanch

@yegordb

5 months ago

Excited to be speaking in the AI Architects track of AI Engineer in June!

5 likes · 2 replies · 0 retweets

Rylan Schaeffer

@rylanschaeffer

4 months ago

🚨New preprint 🚨 Turning Down the Heat: A Critical Analysis of Min-p Sampling in Language Models We examine min-p sampling (ICLR 2025 oral) & find significant problems in all 4 lines of evidence: human eval, NLP evals, LLM-as-judge evals, community adoption claims 1/8

285 likes · 12 replies · 35 retweets

Jon Saad-Falcon

@jonsaadfalcon

4 months ago

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning

204 likes · 11 replies · 56 retweets

Rylan Schaeffer

@rylanschaeffer

4 months ago

Third #ICML2025 paper! What effect will web-scale synthetic data have on future deep generative models? Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World 🔄 Joshua Kazdan Apratim Dey Matthias Gerstgrasser Rafael Rafailov @ NeurIPS Sanmi Koyejo 1/7

107 likes · 4 replies · 20 retweets

Rylan Schaeffer

@rylanschaeffer

4 months ago

New position paper! Machine Learning Conferences Should Establish a “Refutations and Critiques” Track Joint w/ Sanmi Koyejo Joshua Kazdan Yegor Denisov-Blanch Francesco Orabona Koustuv Sinha Jessica Zosa Forde Jesse Dodge Susan Zhang Brando Miranda Matthias Gerstgrasser isha Elyas Obbad 1/6

400 likes · 12 replies · 49 retweets

METR

@metr_evals

3 months ago

We tested how autonomous AI agents perform on real software tasks from our recent developer productivity RCT. We found a gap between algorithmic scoring and real-world usability that may help explain why AI benchmarks feel disconnected from reality.

552 likes · 17 replies · 77 retweets