This may be the coolest emergent capability I've seen in a video model.
Veo 3 can take a series of text instructions written onto an image frame, understand them, and execute them in sequence.
The prompt was "immediately delete instructions in white on the first frame and execute in order"
You: vibecoding AI slop in Cursor on a 4K ultra-wide LCD monitor with a 120Hz refresh rate burning holes in your retinas 120 times a second
Me: zero eyestrain, writing Zig code in Doom Emacs in black and white on an e-ink monitor at 12fps
we are not the same