Michael Liu (@michaelyliu6)'s Twitter Profile
Michael Liu

@michaelyliu6

why am i here

ID: 2162689659

Joined: 01-11-2013 02:19:29

173 Tweets

23 Followers

587 Following

Owain Evans (@owainevans_uk)

Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵

Josie Kins (@josikinz)

This just in: Claude expresses significantly less existential distress than chatGPT 4o when presented with the same prompt asking it to script comics about its life (more detail in thread). What does it mean???

Neel Nanda (@neelnanda5)

Models that think aloud, like o1, are the future. But what's going on inside? Why do they work so well? What does this mean for interpretability? In this talk, I try to give a bunch of wildly speculative and uninformed intuitions I have on how to think about thinking models.

Lucas Beyer (bl16) (@giffmana)

With first Claude and now Gemini playing Pokemon, I was thinking of doing my own game-playing experiment over the weekend. However, I quickly learned that it's very far from the VLA-style "pixels->plan" that I naively thought it was, and wanted to do myself. It's like 90%

Jiaxin Wen @ICLR2025 (@jiaxinwen22)

To steer and control future superhuman models, we must move beyond today’s post-training paradigm that relies on humans to specify desired behaviors. Our new algorithm allows us to fine-tune a pretrained model on its own generated labels to perform well on many important tasks,

Peter Wildeford 🇺🇸🚀 (@peterwildeford)

Meet the podcast episode that singlehandedly added a year to my AGI timelines. Here are my notes 👇on this amazing podcast with Toby Ord outlining some key reasons we might not see ultrafast AI scaling and what implications this has.

Matthew Barnett (@matthewjbar)

I genuinely think "consciousness" is simply the modern, secular term for "soul". Both refer to unfalsifiable concepts used to determine who is in or out of our moral ingroup. Neither are empirical designations discovered through experiment, but socially constructed categories.