New for May 2025!
* RL on something silly makes Qwen reason well v1
* RL on something silly makes Qwen reason well v2
* RL on something silly makes Qwen reason well v3
...
I asked OpenAI's o3 a pretty difficult research question about finding a particular data distribution with a specific property when optimized with SGD.
Then, o3 started optimizing a random seed.
Where did you learn this behavior? 😂😂
The new voice model from ElevenLabs is interesting. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's The Waste Land, which uses four languages, a nursery rhyme & abrupt changes in tone.
It took a few attempts to get, but this was good.
Our new paper: Emergent misalignment extends to *reasoning* LLMs.
Training on narrow harmful tasks causes broad misalignment.
Reasoning models sometimes resist being shut down and plot deception against users in their chain-of-thought (despite no such training)🧵
⚠️ Update: #Iran has now been disconnected from the global internet for 36 hours; live metrics show national connectivity remains in the low few percent of ordinary levels with only a handful of users able to connect via multi-hop VPNs 📉
Right when Iranians need internet most for help & news, the Islamic regime is shutting it down.
This video nails why they do it and their horrific track record.
Share to be Iranians' voice
#IRGCterorrists
#Iran #IsraeliranWar #IsraelIranConflict
An official music video created entirely with Veo 3!
We collaborated with Google to produce a music video for the queen of Turkish pop, affectionately known as the “Little Sparrow.”
On the 50th anniversary of her artistic career, we crafted this video using only Veo 3 and
Veo has an incredible "hidden" ability.
It's one of my absolute favorite aspects of Veo 3 i2v. Its ability to transport elements through the latent space, with this simple but very powerful prompt structure:
“Instantly jump/cut on frame 1. [Describe the new context]”
With
New paper & surprising result.
LLMs transmit traits to other models via hidden signals in data.
Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵
Yesterday we announced Genie 3. One feature of the model that's especially fun to play with is starting worlds from existing videos. Here's a drone shot generated by Veo 3, with me taking control mid-flight.
Something we discovered by accident: what happens if we start Genie 3 from a video and a completely unrelated prompt? Turns out the model really, really wants to make it work, to the point where it emulates itself.
The prompt in this one is about a T. rex on a tropical island.
Frontier AI performance typically reaches consumer hardware in just 9 months.
With a single gaming GPU, you can run open-weight models matching the benchmark performance of the absolute frontier from less than a year ago. 🧵
One note of caution: compared to frontier models, the best-scoring open model for any given benchmark is more likely to be overfit. This means the lag in broad, real-world utility is likely longer than 9 months.
As a rough approximation, gpt-oss-20B might be comparable to