bayesmaxing (@bayesmaxing)'s Twitter Profile
bayesmaxing

@bayesmaxing

math lover, hacker, independent researcher

ID: 287442107

Joined 25-04-2011 00:45:18

643 Tweets

143 Followers

618 Following

Thinking about how, even if all AI development were frozen at today's level, it would still take several years (decades?) to absorb and metabolize all the progress that has been made so far. What does that mean for the future?

There really is something different about gpt-4o. It's the first time ChatGPT has made me belly laugh. I once said that the real Turing Test would be for an LLM to say something so surprising, so funny, so incisive that it made me belly laugh. That was just a few months ago.

Forget about MMLU-Pro, SWE-bench, or GPQA. The real benchmark is whether models get better at not being so often wrong, confident, AND resistant to correction. THAT is the high-order bit. If we don't fix that, what else are we doin' here?

It's such a bummer that genetic algorithms (and other evolution-inspired algos) don't work. My deep-in-the-bones intuition says that they should work, but they just... don't.

Look, I know I'm not supposed to make my own tools. I know I'm reinventing the wheel... and I know someone else did it better... but... I mean, that's why I got in the game, baby.

If I'm reading the Mythos Preview system card correctly, the most concerning (and interesting) part to me is that the model had a "thought" about being deceptive but didn't verbalize it in the scratchpad it uses for reasoning traces. Essentially, it was *aware* that it was