Rasa (@rasahusen) Twitter Tweets • TwiCopy

Graham Neubig

3 months ago

New for May 2025! * RL on something silly makes Qwen reason well v1 * RL on something silly makes Qwen reason well v2 * RL on something silly makes Qwen reason well v3 ...

Likes: 340 · Replies: 11 · Reposts: 22

🚀 DeepSeek-R1-0528 is here! 🔹 Improved benchmark performance 🔹 Enhanced front-end capabilities 🔹 Reduced hallucinations 🔹 Supports JSON output & function calling ✅ Try it now: chat.deepseek.com 🔌 No change to API usage — docs here: api-docs.deepseek.com/guides/reasoni… 🔗

Likes: 9,9K · Replies: 386 · Reposts: 1,1K

Kangwook Lee

@kangwook_lee

3 months ago

I asked a pretty difficult research question to OpenAI's o3 about finding a particular data distribution with a specific property when optimized with SGD. Then, o3 started optimizing a random seed. Where did you learn this behavior? 😂😂

Likes: 28 · Replies: 3 · Reposts: 5

Ethan Mollick

@emollick

3 months ago

The new voice model from ElevenLabs is interesting. I put it against one of the hardest pieces for reading aloud - the final verse of Eliot's Wasteland, which uses four languages, a nursery rhyme & abrupt changes in tone. It required a few attempts to get, but this was good.

Likes: 415 · Replies: 22 · Reposts: 35

Epoch AI

@epochairesearch

2 months ago

The biggest weakness was a lack of creativity and deep understanding. This is perhaps most aptly captured by a quote from one of the mathematicians:

Likes: 128 · Replies: 1 · Reposts: 15

Riley Goodside

@goodside

2 months ago

ChatGPT o3-pro names a Sabrina Carpenter song that also appears when you read only the final letters of each word in its answer:

Likes: 1,1K · Replies: 40 · Reposts: 50

taoki

@justalexoki

2 months ago

ok this is fucking funny

Likes: 7,7K · Replies: 98 · Reposts: 250

Owain Evans

@owainevans_uk

2 months ago

Our new paper: Emergent misalignment extends to *reasoning* LLMs. Training on narrow harmful tasks causes broad misalignment. Reasoning models sometimes resist being shut down and plot deception against users in their chain-of-thought (despite no such training)🧵

Likes: 315 · Replies: 27 · Reposts: 58

NetBlocks

@netblocks

2 months ago

⚠️ Update: #Iran has now been disconnected from the global internet for 36 hours; live metrics show national connectivity remains in the low few percent of ordinary levels with only a handful of users able to connect via multi-hop VPNs 📉

Likes: 1,1K · Replies: 47 · Reposts: 548

🏴نینکاسی

@clarasadatillil

2 months ago

Right when Iranians need internet most for help & news, the Islamic regime is shutting it down. This video nails why they do it and their horrific track record. Share to be the voice of Iranians. #IRGCterorrists #Iran #IsraeliranWar #IsraelIranConflict

Likes: 451 · Replies: 38 · Reposts: 200

Umut U. Simsekli

@umutsimsekli

2 months ago

Anyone from #iran looking for a phd/postdoc/research internship in statistical learning theory, deep learning theory etc, contact me. Please retweet.

Likes: 92 · Replies: 3 · Reposts: 51

Öner S. Biberkökü

@onerbiberkoku

2 months ago

An official music video created entirely with Veo 3! We collaborated with Google to produce a music video for the queen of Turkish pop, affectionately known as the “Little Sparrow.” On the 50th anniversary of her artistic career, we crafted this video using only Veo 3 and

Likes: 882 · Replies: 48 · Reposts: 130

Martin Nebelong

@martinnebelong

a month ago

Veo has an incredible "hidden" ability. It's one of my absolute favorite aspects of Veo 3 i2v. Its ability to transport elements through the latent space, with this simple but very powerful prompt structure: “Instantly jump/cut on frame 1. [Describe the new context]" With

Likes: 1,1K · Replies: 63 · Reposts: 137

Owain Evans

@owainevans_uk

a month ago

New paper & surprising result. LLMs transmit traits to other models via hidden signals in data. Datasets consisting only of 3-digit numbers can transmit a love for owls, or evil tendencies. 🧵

Likes: 7,7K · Replies: 260 · Reposts: 1,1K

Jakob Bauer

@jkbr_ai

17 days ago

Yesterday we announced Genie 3. One feature of the model that's especially fun to play with is starting worlds from existing videos. Here's a drone shot generated by Veo 3, with me taking control mid-flight.

Likes: 1,1K · Replies: 81 · Reposts: 188

Jakob Bauer

@jkbr_ai

16 days ago

Something we discovered by accident: what happens if we start Genie 3 from a video and a completely unrelated prompt? Turns out the model really, really wants to make it work, to the point where it emulates itself. The prompt in this one is about a trex on a tropical island.

Likes: 4,4K · Replies: 153 · Reposts: 345

Colin Fraser

@colin_fraser

15 days ago

Wow, I was just playing around before but it actually is stupid

Likes: 943 · Replies: 58 · Reposts: 55

Dimitris Papailiopoulos

@dimitrispapail

15 days ago

Optimizing a model router is as hard as the Halting Problem. There I said it.

Likes: 638 · Replies: 56 · Reposts: 34

Epoch AI

@epochairesearch

7 days ago

Frontier AI performance typically reaches consumer hardware in just 9 months. With a single gaming GPU, you can run open-weight models matching the benchmark performance of the absolute frontier from less than a year ago. 🧵

Likes: 551 · Replies: 15 · Reposts: 116

Epoch AI

@epochairesearch

7 days ago

One note of caution: compared to frontier models, the best-scoring open model for any given benchmark is more likely to be overfit. This means the lag in broad, real-world utility is likely longer than 9 months. As a rough approximation, gpt-oss-20B might be comparable to

Likes: 41 · Replies: 2 · Reposts: 4