Matt Mistele (@mattgmistele) 's Twitter Profile
Matt Mistele

@mattgmistele

Eng manager for NLU at @Moveworks, making a daily assistant for all employees and the best place to build AI workflows for work

ID: 2887883744

Joined: 02-11-2014 23:19:47

143 Tweets

145 Followers

973 Following

Russell Kaplan (@russelljkaplan) 's Twitter Profile Photo

Friend who is a doctor told me everyone in his hospital uses ChatGPT now. Me: “Do you all use o3?” Him: “No, 4o. Isn’t it best to use the latest model? 4 vs 3?” OpenAI we really gotta fix these model names 🤦‍♂️

Andreas Kirsch 🇺🇦 (@blackhc) 's Twitter Profile Photo

I'm late to review the "Illusion of Thinking" paper, so let me collect some of the best threads and critical takes by Lisan al Gaib in one place and sprinkle some of my own thoughts in as well. The paper is rather critical of reasoning LLMs (LRMs): x.com/MFarajtabar/st…

Jiaxin Wen @ICLR2025 (@jiaxinwen22) 's Twitter Profile Photo

New Anthropic research: We elicit capabilities from pretrained models using no external supervision, often competitive or better than using human supervision.

Using this approach, we are able to train a Claude 3.5-based assistant that beats its human-supervised counterpart.

Alex Albert (@alexalbert__) 's Twitter Profile Photo

Heard that some eng teams at big co's are now testing their API designs against LLMs before release. They run evals to see which API structure is easiest for the model to work with and redesign it if the model struggles to understand the format. I expect this to scale to
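
The tweet cuts off, but the mechanics are easy to picture. Here is a minimal sketch of the idea, assuming the standard OpenAI chat API; the two candidate schemas, the extraction task, and the model choice are all illustrations, not any team's actual harness:

```python
import json

from openai import OpenAI

client = OpenAI()

# Two candidate response shapes for the same (hypothetical) endpoint.
DESIGNS = {
    "nested": {"user": {"contact": {"email": "ada@example.com"}}},
    "flat": {"user_contact_email": "ada@example.com"},
}

def ask(payload: dict) -> str:
    """Ask the model to extract a field from one candidate response shape."""
    prompt = (
        "Given this API response, what is the user's email address? "
        "Reply with the address only.\n" + json.dumps(payload)
    )
    resp = client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

# Score each design by how often the model extracts the field correctly.
for name, payload in DESIGNS.items():
    answers = [ask(payload) for _ in range(5)]
    accuracy = sum(a == "ada@example.com" for a in answers) / len(answers)
    print(f"{name}: {accuracy:.0%} of extractions correct")
```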

Andriy Burkov (@burkov) 's Twitter Profile Photo

Reinforcement learning in action: Gemini "thinks" for much longer at the beginning of the debugging. But as the conversation grows, its thinking becomes shorter and shorter. Close to 300k tokens, it almost doesn't think at all. This is because it doesn't matter for the model

Simon Willison (@simonw) 's Twitter Profile Photo

"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks

"Design Patterns for Securing LLM Agents against Prompt Injections" is an excellent new paper that provides six design patterns to help protect LLM tool-using systems (call them "agents" if you like) against prompt injection attacks
Pliny the Liberator 🐉󠅫󠄼󠄿󠅆󠄵󠄐󠅀󠄼󠄹󠄾󠅉󠅭 (@elder_plinius) 's Twitter Profile Photo

I'm not the biggest fan of how the memory (bios tool) in ChatGPT is constantly creating overly simplistic memories, almost always without user permission or a chance to review/edit. So I made a new memory that vastly improves my bios functionality!

Now, ChatGPT writes far more

Simon Willison (@simonw) 's Twitter Profile Photo

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!

If you use "AI agents" (LLMs that call tools) you need to be aware of the Lethal Trifecta

Any time you combine access to private data with exposure to untrusted content and the ability to externally communicate an attacker can trick the system into stealing your data!
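
A toy way to make the combination concrete, with entirely hypothetical tool names and capability tags; the point is that no single leg is dangerous on its own:

```python
# Capability tags are hypothetical labels for this illustration.
TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

def has_lethal_trifecta(tools: dict[str, set[str]]) -> bool:
    """True if the agent's combined toolset covers all three legs."""
    combined = set().union(*tools.values())
    return TRIFECTA <= combined

agent_tools = {
    "read_email": {"private_data", "untrusted_content"},  # inbox text is attacker-controlled
    "web_search": {"untrusted_content"},
    "send_email": {"external_comms"},
}

# True: an injected email can instruct the agent to mail private data out.
print(has_lethal_trifecta(agent_tools))
```
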
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Tool-calling turns GPT-4.1 into a near-o1-preview without a single gradient step.

No retraining, just smarter prompts for near-RL performance. 🤯

pass@1 performance on AIME2024 from 26.7% to 43.3%, bringing it very close to the performance of o1-preview.

Swapping one prompt
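
The tweet is truncated and the underlying paper's exact prompt isn't shown, but the general recipe, giving the model a code-execution tool through standard OpenAI function calling, looks roughly like this. The tool schema and system prompt here are assumptions, and the exec call must be sandboxed in anything real:

```python
import contextlib
import io
import json

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "run_python",
        "description": "Execute Python code and return whatever it prints.",
        "parameters": {
            "type": "object",
            "properties": {"code": {"type": "string"}},
            "required": ["code"],
        },
    },
}]

messages = [
    {"role": "system", "content": "Solve math problems step by step. When the "
     "arithmetic or enumeration gets heavy, write Python and call run_python "
     "instead of computing in your head."},
    {"role": "user", "content": "How many integers 1 <= n <= 1000 are divisible by 7 or 11?"},
]

while True:
    resp = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        print(msg.content)  # final answer
        break
    messages.append(msg)
    for call in msg.tool_calls:
        code = json.loads(call.function.arguments)["code"]
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):  # capture print() output
            exec(code, {})  # WARNING: sandbox this outside of a toy demo
        messages.append({"role": "tool", "tool_call_id": call.id, "content": buf.getvalue()})
```
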
Essential AI (@essential_ai) 's Twitter Profile Photo

[1/5]

🚀 Meet Essential-Web v1.0, a 24-trillion-token pre-training dataset with rich metadata built to effortlessly curate high-performing datasets across domains and use cases!
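
The pitch is that curation becomes simple metadata filtering. A sketch of what that might look like with Hugging Face datasets, where the repo id and metadata field names are assumptions rather than the dataset's documented schema:

```python
from datasets import load_dataset

# Repo id and field names below are assumptions for illustration.
ds = load_dataset("EssentialAI/essential-web-v1.0", split="train", streaming=True)

# Keep only documents whose (assumed) taxonomy metadata marks them as medical.
medical = (ex for ex in ds if "medicine" in str(ex.get("metadata", "")).lower())

for _, ex in zip(range(3), medical):  # peek at the first few matches
    print(ex.get("text", "")[:200], "\n---")
```
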
John Loeber 🎢 (@johnloeber) 's Twitter Profile Photo

Nobody understands the true mechanics of signing bonuses, so let me spell it out:

Conventionally, a signing bonus always comes with a clawback — if you leave within 12/24/etc months, you have to pay it back

But! Signing bonuses are taxed, and the taxes must be paid on day one!
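
To make the trap concrete with entirely hypothetical numbers (whether a clawback demands the gross or the net amount varies by contract):

```python
bonus_gross = 100_000          # hypothetical signing bonus
withholding_rate = 0.40        # hypothetical combined day-one withholding

cash_received = bonus_gross * (1 - withholding_rate)   # $60,000 actually lands
clawback_owed = bonus_gross                            # many contracts claw back the GROSS
shortfall = clawback_owed - cash_received              # $40,000 you never saw

print(f"Received on day one: ${cash_received:,.0f}")
print(f"Owed if you leave early: ${clawback_owed:,.0f}")
print(f"Cash shortfall until taxes are recovered at filing: ${shortfall:,.0f}")
```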

Simon Willison (@simonw) 's Twitter Profile Photo

Context rot is another reason I don't trust the new ChatGPT memory feature where it includes summarized notes from previous conversations automatically - makes it harder to truly reset the context when something rotten makes it in there

Gabriele Berton (@gabriberton) 's Twitter Profile Photo

A few months ago, researchers from Meta empirically found that LayerNorm acts similarly to a tanh, squeezing the weights that are too high into more tractable values

They tried replacing the LayerNorm with a tanh, and achieved similar results at higher speed

What??? [1/8]
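
The work being described is Meta's "Transformers without Normalization" (Dynamic Tanh, or DyT). A minimal PyTorch sketch of the replacement, with the alpha init value chosen here for illustration:

```python
import torch
import torch.nn as nn

class DynamicTanh(nn.Module):
    """Drop-in LayerNorm replacement: squash activations with a scaled tanh
    instead of normalizing by mean and variance."""

    def __init__(self, dim: int, alpha_init: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))  # learnable scalar
        self.weight = nn.Parameter(torch.ones(dim))   # per-channel scale, like LN's gamma
        self.bias = nn.Parameter(torch.zeros(dim))    # per-channel shift, like LN's beta

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * torch.tanh(self.alpha * x) + self.bias

# Usage: swap nn.LayerNorm(d_model) for DynamicTanh(d_model) in a block.
x = torch.randn(2, 16, 64)
print(DynamicTanh(64)(x).shape)  # torch.Size([2, 16, 64])
```
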
Matt Mistele (@mattgmistele) 's Twitter Profile Photo

Super cool! Latest in a series of papers showing that LLMs can in fact apply learnings, in ways we might not have expected after the reversal curse paper (subtitled “LLMs trained on ‘A is B’ fail to learn ‘B is A’”, circa GPT-3.5 and GPT-4)

Matt Mistele (@mattgmistele) 's Twitter Profile Photo

Do not be fooled by 3s and 4s

star wars 3: 2005 tech & $113M
star wars 4: 1977 tech & $11M

o3: 2025 tech & $$$
4o: 2024 tech & $$

Jeremy Howard (@jeremyphoward) 's Twitter Profile Photo

Study by my genius friend to CURE DEAFNESS cancelled because, I shit you not: "Research programs based primarily on artificial and non-scientific categories, including amorphous equity objectives, are antithetical to the scientific inquiry". Decel anti-science brain-dead take.

Dave Kline (@dklineii) 's Twitter Profile Photo

Company culture isn't hard to understand. It's the sum of each individual's behavior.

- It's how the CEO interacts with their admin
- It's how the veterans welcome new hires
- It's how the recruiter responds to you

Culture is nothing more than how we treat each other.