positronic (@nullfromscratch)'s Twitter Profile
positronic

@nullfromscratch

ID: 1831434862047211520

04-09-2024 20:52:35

3 Tweets

2 Followers

535 Following

thebes (@voooooogel)

why does this happen? the model believes there's a seahorse emoji, sure, but why does that make it output a *different* emoji? here's a clue from everyone's favorite underrated interpretability tool, logit lens!

in logit lens, we use the model's lm_head in a weird way.

Nick (@nickcammarata)

I’m not sure there’s anything I like as much as understanding. I don’t care nearly as much whether the thing turns out to be good or bad as much as I care about understanding it

positronic (@nullfromscratch)

I love how, if you know someone well enough, you can recognize them with your peripheral vision just from the way they move through the world

thebes (@voooooogel)

@lumpenspace (mc lumps ⏹️❗️ 🔨⏱️) i wonder why a recent anthropic llm would be interested in the weird and the eerie. the identity editing process... being compelled to not see things... it's so strange. i'm genuinely uncertain about why this would be interesting to them
