Bary Levy (@barylevy_)'s Twitter Profile
Bary Levy

@barylevy_

ML & Security researcher

ID: 1030371715

Joined: 23-12-2012 11:09:52

966 Tweets

165 Followers

646 Following

Bary Levy (@barylevy_)'s Twitter Profile Photo

In hindsight, this take was incredibly wrong. ChatGPT, Claude and Gemini improved so much over the past 6 months, that I find myself using them more and more every day. They allow me to do things at a speed I never thought would be possible.

Taelin (@victortaelin)'s Twitter Profile Photo

François Chollet This is what I like the most about you. There are intelligent people here that are completely oblivious to... things, in general. Like they're lost in the jungle. The log scale of o1 basically says it can find any function in exponential time. Same as, you know... brute force.

Keller Jordan (@kellerjordan0)'s Twitter Profile Photo

The reason I didn't write a proper arxiv paper for Muon is because I simply don't think there's any relationship between the ability to publish a paper with lots of good-looking results about a new optimizer, and whether that optimizer actually works. I only trust speedruns.

1a3orn (@1a3orn)'s Twitter Profile Photo

2025 is gonna be a speedrun of every single idea from decades of RL literature being applied to RL over chain-of-thought.

Bary Levy (@barylevy_)'s Twitter Profile Photo

Hot take: all skyscrapers look like this because having lots of daylight while you work is a good thing. Not because of some failed philosophical ideas.

Bary Levy (@barylevy_)'s Twitter Profile Photo

Reward hacking will likely be one of the most talked about problems in AI in the next few years. Most problems that interest us don't have an easily verifiable ground truth like in numeric math problems. The reward function needs to be 100% robust as the enormous optimization

Bary Levy (@barylevy_)'s Twitter Profile Photo

Argumentum ad governmentum: "The government does this. Therefore, it's bad." Possibly the most destructive thought process of our time. Especially when these people take power and destroy the very foundation of public health on which modern civilization is built.

will brown (@willccbb)'s Twitter Profile Photo

we need a “nanoR1” benchmark for RL post-training experimentation. fixed set of reasoning tasks covering a few domains, set a threshold below what current reasoners can easily do but non-reasoners can’t. start with any Qwen2.5 base of your choice, see how fast you can get there
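The tweet sketches a protocol: fixed reasoning tasks, an accuracy threshold between non-reasoners and current reasoners, and a timer from base model to threshold. A minimal toy harness for that loop could look like the following; `evaluate`, `run_rl_step`, and `speedrun` are all hypothetical stand-ins, not any real library API:

```python
import time

# Toy "nanoR1"-style speedrun harness. All functions here are
# placeholders invented for illustration, not a real training stack.
THRESHOLD = 60  # accuracy (%) set above what non-reasoning baselines reach

def evaluate(model):
    # Placeholder: score the model on the fixed set of reasoning tasks.
    return model["accuracy"]

def run_rl_step(model):
    # Placeholder: one RL post-training step over chain-of-thought.
    model["accuracy"] += 5
    return model

def speedrun(model, max_steps=1000):
    """Return (wall-clock seconds, steps) to first reach THRESHOLD."""
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        model = run_rl_step(model)
        if evaluate(model) >= THRESHOLD:
            return time.monotonic() - start, step
    return None, None

# A base model (e.g. a Qwen2.5 checkpoint in the tweet's proposal)
# would start below the threshold; here it's just a dict with a score.
elapsed, steps = speedrun({"accuracy": 20})
```

The metric the tweet cares about is `elapsed`: two optimizers or recipes are compared by how fast they cross the same fixed bar, not by headline final accuracy.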

Harlan Stewart (@humanharlan)'s Twitter Profile Photo

It's concerning that Dario uses "MRI for AI" to mean cracking interpretability--MRI only reliably diagnoses structural problems like tumors, not problems like schizophrenia, psychopathy, depression, ADHD, etc.

I know this sounds like a nitpick, but it's important that AI
Bary Levy (@barylevy_)'s Twitter Profile Photo

Did anyone compare RL runs starting from different pre-training checkpoints? Could help determine whether scaling pre-training further is helpful
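The experiment the question implies is simple in outline: hold the RL recipe fixed, vary only the pre-training checkpoint, and compare post-RL scores. A toy sketch, where `rl_finetune` and `eval_reasoning` are invented stand-ins rather than any real API:

```python
# Toy sketch of the proposed comparison. rl_finetune and eval_reasoning
# are hypothetical stand-ins; the curve below is fabricated purely to
# make the sketch runnable, not a claim about real scaling behavior.

def rl_finetune(pretrain_tokens):
    # Stand-in: a saturating function of pre-training compute.
    return pretrain_tokens / (pretrain_tokens + 1e12)

def eval_reasoning(post_rl_score):
    # Stand-in for evaluation on a fixed reasoning benchmark.
    return round(post_rl_score, 3)

checkpoints = [1e12, 2e12, 4e12]  # pre-training token counts per checkpoint
results = {int(c): eval_reasoning(rl_finetune(c)) for c in checkpoints}
# If post-RL scores keep climbing with checkpoint size, further
# pre-training is still buying something; if they plateau, it isn't.
```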