Jesper N. Wulff (@jesper_wulff) Twitter Tweets • TwiCopy

Deedy

6 months ago

LLMs are far worse at competitive programming than we thought. Every one scored 0% on Hard problems. LiveCodeBench-Pro is a new benchmark with 584 always updating problems from IOI, ICPC and Codeforces. What's most interesting is the categories they perform really poorly on:

Phil

@nonrealbrandon

6 months ago

Deedy It's unfair to expect LLMs to perform well on always changing benchmarks. How are they to overfit on the data if it keeps changing?

METR

@metr_evals

5 months ago

We ran a randomized controlled trial to see how much AI coding tools speed up experienced open-source developers. The results surprised us: Developers thought they were 20% faster with AI tools, but they were actually 19% slower when they had access to AI than when they didn't.

ℏεsam

@hesamation

5 months ago

"I use AI in a separate window. I don't enjoy Cursor or Windsurf, I can literally feel competence draining out of my fingers." DHH, the legendary programmer and creator of Ruby on Rails has the most beautiful and philosophical idea about what AI takes away from programmers.

Arvind Narayanan

@random_walker

5 months ago

Back in grad school, when I realized how the “marketplace of ideas” actually works, it felt like I’d found the cheat codes to a research career. Today, this is the most important stuff I teach students, more than anything related to the substance of our research. A quick

The Daily Show

@thedailyshow

4 months ago

Trump claims he couldn't have made Epstein's creepy birthday card, but allow Jon Stewart to counter with exhibits A through Z

meowbooks

@untitled01ipynb

4 months ago

in case you are wondering this is academia now

Mackenzie Lockhart

@lockhartm

4 months ago

Excited that our (apoorva.lal, Yiqing Xu, Gary ziwen_Zu) paper won the Political Analysis' 2024 Editor's Choice award! It was really a lot of work (we started this in 2018!), so nice to see we've had some impact on the field. It's also open access. cambridge.org/core/journals/…

Ben Ansell

@benwansell

4 months ago

Brutal analysis of ChatGPT5 from Gary Marcus. This was a big moment for OpenAI and so far a dud. Since the US economy is largely being kept afloat by AI investment, this could be an inflection point. Hold onto your hats. garymarcus.substack.com/p/gpt-5-overdu…

Gary Marcus

@garymarcus

4 months ago

🤔 @Sama in January: “we are now confident we know how to build AGI” @Sama in August: who said anything about AGI?

Kareem Carr, Statistics Person

@kareem_carr

4 months ago

In today's article, I explain why everything is statistics. Link in replies.

Daniël Lakens

@lakens

4 months ago

Too often, I see people talk about a replication as if the first study has established something, and the replication study is a double-check. What people often fail to understand is that we do not do replication studies to *check* a finding, but to *establish* a finding. 1/x

Ernest Ryu

@ernestryu

4 months ago

The proof is something an experienced PhD student could work out in a few hours. That GPT-5 can do it with just ~30 sec of human input is impressive and potentially very useful to the right user. However, GPT5 is by no means exceeding the capabilities of human experts. (9/9)

Daniël Lakens

@lakens

3 months ago

If you are preparing your bachelor statistics course and would like to add optional material for students to better understand statistics on a conceptual level (see topics in the screenshot) my free textbook provides a state of the art overview. lakens.github.io/statistical_in…

Gary Marcus

@garymarcus

3 months ago

GenAI models “often match patterns instead of truly reasoning” Say it to yourself over and over until you fully understand it. The amount of confirmation that is coming this year for my basic view is insane.

Gary Marcus

@garymarcus

3 months ago

One minute Matt Turck is telling me that hallucinations are “a largely fixed problem”; the next minute ChatGPT 5 is telling a friend that Trump “is not in office”. 🤔

Will Kinney

@wkcosmo

3 months ago

Victor

@victor_explore

3 months ago

This comprehensive guide explains how Large Language Models work from scratch - assuming you only know how to add and multiply numbers. It covers everything from simple neural networks to the full Transformer architecture, stripping away all the jargon and representing