Ben Levinstein (@ben_levinstein)'s Twitter Profile
Ben Levinstein

@ben_levinstein

Philosophy prof at Illinois interested in AI alignment, epistemology, and decision theory.

ID: 1340337260793880577

http://www.levinstein.org · Joined 19-12-2020 16:45:17

1.1K Tweets

1.1K Followers

529 Following

Seán Ó hÉigeartaigh (@s_oheigeartaigh)'s Twitter Profile Photo

It's not done yet. Hearing reports that the Nobel prize for literature will be going to the authors of "OpenAI's nonprofit governance structure" for outstanding contributions to creative fiction.

Ben Levinstein (@ben_levinstein)'s Twitter Profile Photo

Why are humans winning the prize instead of the AI solving the actual problems? AlphaFold is more deserving. This seems like some woke DEI thing.

Harry Crane (@harrydcrane)'s Twitter Profile Photo

Dissecting Inefficiency in Prediction Markets

It is well known that polls are subject to statistical errors, and this error is accounted for by the margin of error.  Betting markets, on the other hand, are subject to inefficiency.  These inefficiencies can be accounted for by
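For reference, the "margin of error" invoked here is the standard survey-sampling quantity; a worked textbook example (my illustration, not part of the thread) for a poll of n = 1000 respondents at p = 0.5:

\[
\mathrm{MOE} = z_{0.975}\sqrt{\frac{p(1-p)}{n}}
  = 1.96\sqrt{\frac{0.5 \times 0.5}{1000}} \approx 0.031,
\]

i.e. roughly ±3.1 percentage points at 95% confidence.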
Owain Evans (@owainevans_uk)'s Twitter Profile Photo

New paper:
Are LLMs capable of introspection, i.e. special access to their own inner states?
Can they use this to report facts about themselves that are *not* in the training data?
Yes — in simple tasks at least! This has implications for interpretability + moral status of AI 🧵
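As a toy illustration of the self-prediction setup the thread describes (comparing a model's report about its own behavior with its actual behavior), here is a Python sketch with made-up stand-ins; the deterministic "model" and all names are hypothetical, not the paper's code:

# Toy illustration of self-prediction: does a model's report about
# its own behavior match its actual behavior on unseen prompts?
# Everything here is a made-up stand-in, not the paper's setup.

def object_level_output(prompt: str) -> str:
    # Stand-in for the model's actual behavior: return the
    # longest word in the prompt.
    return max(prompt.split(), key=len)

def introspective_report(prompt: str) -> str:
    # Stand-in for the model's answer to "what would you output
    # for this prompt?", here an imperfect introspector that
    # guesses the last word instead.
    return prompt.split()[-1]

prompts = ["models can introspect", "simple tasks at least", "inner states"]

# Score: fraction of prompts where the self-report matches behavior.
hits = sum(introspective_report(p) == object_level_output(p) for p in prompts)
print(f"self-prediction accuracy: {hits}/{len(prompts)}")

The point of the comparison is that above-chance accuracy on prompts absent from training is evidence of privileged access to one's own dispositions rather than memorization.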
Ben Levinstein (@ben_levinstein)'s Twitter Profile Photo

This was pretty cool to play around with. I asked it to turn the whole world into paperclips, though, and it struggled to find anything useful from Saks Fifth Avenue.

Ben Levinstein (@ben_levinstein)'s Twitter Profile Photo

Sports are a counterexample to Kant's claim that you need to adopt the position of a disinterested observer to appreciate art.

Ben Levinstein (@ben_levinstein)'s Twitter Profile Photo

Can any AI do a good job turning hand-drawn diagrams into TikZ equivalents? I want to do this in TikZ and also hate using TikZ.

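For reference, the target output is short but fiddly to write by hand. A minimal sketch of the kind of TikZ a hand-drawn two-node diagram might become (a hypothetical example, compiled with pdflatex):

\documentclass{standalone}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  % Two labeled nodes joined by an arrow: the kind of diagram
  % one might draw by hand and want reproduced in TikZ.
  \node[draw, circle] (a) at (0,0) {A};
  \node[draw, circle] (b) at (3,0) {B};
  \draw[->, thick] (a) -- (b) node[midway, above] {$f$};
\end{tikzpicture}
\end{document}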
Ben Levinstein (@ben_levinstein)'s Twitter Profile Photo

GPT-4o seems so dumb and useless these days compared to Claude. Claude tells me to STFU multiple times a day, which stops lots of my work and hurts my feelings. I've tried switching over to GPT, but it's not the same. Do people still use 4o much for work- or coding-related tasks?

Joe Carlsmith (@jkcarlsmith)'s Twitter Profile Photo

My current take on Apollo's recent scheming paper is that they aren’t emphasizing the most interesting results, which are the sandbagging results in section 3.6 and appendix A.6 (screenshot of the key numbers below).

In particular: the paper frames its results centrally as
Anthropic (@anthropicai)'s Twitter Profile Photo

New Anthropic research: Alignment faking in large language models.

In a series of experiments with Redwood Research, we found that Claude often pretends to have different views during training, while actually maintaining its original preferences.
Sean Carroll (@seanmcarroll)'s Twitter Profile Photo

Mindscape 301 | Tina Eliassi-Rad on AI, Networks, and Epistemic Instability. If we're all just vectors in a huge dataset, might as well turn it to our advantage. #MindscapePodcast
preposterousuniverse.com/podcast/2025/0…