1/ New paper in Nature!
The discrepancy between human expectations of task difficulty and LLM errors harms reliability. In 2022, Ilya Sutskever predicted: "perhaps over time that discrepancy will diminish" (youtu.be/W-F7chPE9nU, min 61-64).
We show this is *not* the case!