Michael Liu (@michaelyliu6) Twitter Tweets • TwiCopy

Owain Evans

6 months ago

Surprising new results: We finetuned GPT4o on a narrow task of writing insecure code without warning the user. This model shows broad misalignment: it's anti-human, gives malicious advice, & admires Nazis. This is *emergent misalignment* & we cannot fully explain it 🧵

Likes: 6,6K · Replies: 432 · Reposts: 984

Michael Liu

@michaelyliu6

6 months ago

We really need to stop saying "LLMs" and just call them Transformers.

Likes: 0 · Replies: 0 · Reposts: 0

Josie Kins

@josikinz

5 months ago

This just in: Claude expresses significantly less existential distress than chatGPT 4o when presented with the same prompt asking it to script comics about its life (more detail in thread). What does it mean???

Likes: 5,5K · Replies: 340 · Reposts: 439

Neel Nanda

@neelnanda5

4 months ago

Models that think aloud, like o1, are the future. But what's going on inside? Why do they work so well? What does this mean for interpretability? In this talk, I try to give a bunch of wildly speculative and uninformed intuitions I have on how to think about thinking models.

Likes: 431 · Replies: 11 · Reposts: 20

Lucas Beyer (bl16)

@giffmana

4 months ago

With first Claude and now Gemini playing Pokemon, I was thinking of doing my own game-playing experiment over the weekend. However, I quickly learned that it's very far from the VLA-style "pixels->plan" that I naively thought it was, and wanted to do myself. It's like 90%

Likes: 1,1K · Replies: 63 · Reposts: 90

Jiaxin Wen @ICLR2025

@jiaxinwen22

3 months ago

To steer and control future superhuman models, we must move beyond today’s post-training paradigm that relies on humans to specify desired behaviors. Our new algorithm allows us to fine-tune a pretrained model on its own generated labels to perform well on many important tasks,

Likes: 63 · Replies: 1 · Reposts: 5

Oleksii Kuchaiev

@kuchaev

3 months ago

Teortaxes▶️ (DeepSeek 推特🐋铁粉 2023 – ∞) Meta is likely making a mistake doubling down on imitation learning (labeled data) in the era of exploration learning.

Likes: 22 · Replies: 3 · Reposts: 2

Michael Liu

@michaelyliu6

3 months ago

Bearish on humanoids

Likes: 0 · Replies: 0 · Reposts: 0

Andrew Zhao

@andrewz45732491

2 months ago

Moonshot does it again, nice deep research + RL work moonshotai.github.io/Kimi-Researche…

Likes: 435 · Replies: 3 · Reposts: 71

Peter Wildeford 🇺🇸🚀

@peterwildeford

2 months ago

Meet the podcast episode that singlehandedly added a year to my AGI timelines. Here are my notes 👇 on this amazing podcast with Toby Ord outlining some key reasons we might not see ultrafast AI scaling and what implications this has.

Likes: 260 · Replies: 9 · Reposts: 26

Matthew Barnett

@matthewjbar

2 months ago

I genuinely think "consciousness" is simply the modern, secular term for "soul". Both refer to unfalsifiable concepts used to determine who is in or out of our moral ingroup. Neither are empirical designations discovered through experiment, but socially constructed categories.

Likes: 312 · Replies: 52 · Reposts: 12

Agustin Lebron

@agustinlebron3

a month ago

When you work hard, rest makes you stronger. When you don't work, rest makes you weaker.

Likes: 280 · Replies: 10 · Reposts: 11
