Rasa (@rasahusen) 's Twitter Profile
Rasa

@rasahusen

ID: 389761073

12-10-2011 23:25:56

1.1K Tweets

29 Followers

62 Following

Graham Neubig (@gneubig) 's Twitter Profile Photo

New for May 2025! * RL on something silly makes Qwen reason well v1 * RL on something silly makes Qwen reason well v2 * RL on something silly makes Qwen reason well v3 ...

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Kangwook Lee (@kangwook_lee) 's Twitter Profile Photo

I asked a pretty difficult research question to OpenAI's o3 about finding a particular data distribution with a specific property when optimized with SGD. Then, o3 started optimizing a random seed. Where did you learn this behavior? 😂😂

Ethan Mollick (@emollick) 's Twitter Profile Photo

The new voice model from ElevenLabs is interesting. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's Wasteland, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get, but this was good.

Epoch AI (@epochairesearch) 's Twitter Profile Photo

The biggest weakness was a lack of creativity and deep understanding. This is perhaps most aptly captured by a quote from one of the mathematicians:

Riley Goodside (@goodside) 's Twitter Profile Photo

ChatGPT o3-pro names a Sabrina Carpenter song that also appears when you read only the final letters of each word in its answer:

Owain Evans (@owainevans_uk) 's Twitter Profile Photo

Our new paper: Emergent misalignment extends to *reasoning* LLMs. Training on narrow harmful tasks causes broad misalignment. Reasoning models sometimes resist being shut down and plot deception against users in their chain-of-thought (despite no such training)🧵

NetBlocks (@netblocks) 's Twitter Profile Photo

⚠️ Update: #Iran has now been disconnected from the global internet for 36 hours; live metrics show national connectivity remains in the low few percent of ordinary levels with only a handful of users able to connect via multi-hop VPNs 📉

🏴نینکاسی (@clarasadatillil) 's Twitter Profile Photo

Right when Iranians need internet most for help & news, the Islamic regime is shutting it down. This video nails why they do it and their horrific track record. Share to be Iranians' voice #IRGCterorrists #Iran #IsraeliranWar #IsraelIranConflict

Umut U. Simsekli (@umutsimsekli) 's Twitter Profile Photo

Anyone from #iran looking for a phd/postdoc/research internship in statistical learning theory, deep learning theory etc, contact me. Please retweet.

Öner S. Biberkökü (@onerbiberkoku) 's Twitter Profile Photo

An official music video created entirely with Veo 3! We collaborated with Google to produce a music video for the queen of Turkish pop, affectionately known as the “Little Sparrow.” On the 50th anniversary of her artistic career, we crafted this video using only Veo 3 and

Martin Nebelong (@martinnebelong) 's Twitter Profile Photo

Veo has an incredible "hidden" ability. It's one of my absolute favorite aspects of Veo 3 i2v. Its ability to transport elements through the latent space, with this simple but very powerful prompt structure: "Instantly jump/cut on frame 1. [Describe the new context]" With

Owain Evans (@owainevans_uk) 's Twitter Profile Photo

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Jakob Bauer (@jkbr_ai) 's Twitter Profile Photo

Yesterday we announced Genie 3. One feature of the model that's especially fun to play with is starting worlds from existing videos. Here's a drone shot generated by Veo 3, with me taking control mid-flight.

Jakob Bauer (@jkbr_ai) 's Twitter Profile Photo

Something we discovered by accident: what happens if we start Genie 3 from a video and a completely unrelated prompt? Turns out the model really, really wants to make it work, to the point where it emulates itself. The prompt in this one is about a trex on a tropical island.

Epoch AI (@epochairesearch) 's Twitter Profile Photo

Frontier AI performance typically reaches consumer hardware in just 9 months. With a single gaming GPU, you can run open-weight models matching the benchmark performance of the absolute frontier from less than a year ago. 🧵

Epoch AI (@epochairesearch) 's Twitter Profile Photo

One note of caution: compared to frontier models, the best-scoring open model for any given benchmark is more likely to be overfit. This means the lag in broad, real-world utility is likely longer than 9 months. As a rough approximation, gpt-oss-20B might be comparable to