Philipp Schoenegger (@SchoeneggerPhil)'s Twitter Profile
Philipp Schoenegger

@SchoeneggerPhil

Decision Scientist at the London School of Economics and Political Science, studying Large Language Models and Forecasting; PhD from St Andrews '22

ID:2451358519

http://philipp-schoenegger.weebly.com · 18-04-2014 13:14:55

2,8K Tweets

2,1K Followers

1,1K Following

Musa al-Gharbi (@Musa_alGharbi)

MTurk is basically junk responses. People often lie about their background characteristics. And they often choose the same answer for most questions, regardless of content, such that you can ask people opposing questions and get completely incoherent results (even after screening

Co-CREATE (@CoCREATE_EU)

✨ NEW PROJECT ✨

We are delighted to announce the launch of Co-CREATE - an EU-funded project which will examine the conditions for responsible research on Solar Radiation Modification.

Find out more on our brand-new website: co-create-project.eu

Ruben C. Arslan (@rubenarslan)

Farid just posted an update on our preprint about construct and measure proliferation. We changed the discussion a bit to reflect our current thinking. And I updated the treemap plots to better capture the fragmentation in measurement in psychology.

Warren Hatch (@wfrhatch)

@beardedmiguel @PTetlock The Good Judgment Open crowd has done better than the futures, but the Superforecasters (on their closed client platform) have done even better with less volatility. Here are their forecasts for the next 3 meetings compared to the futures (the GJO question is cumulative):

hk (@hassankhan)

This is work from my doctoral advisor’s group at CMU! The lead author, Anthony Cheng, is a researcher to keep an eye on

Social Science Prediction Platform (@socscipredict)

New on the SSPP: Do financial incentives that had a positive effect on COVID-19 health outcomes generalize to other types of health behavior? Ray Duch invites your predictions! socialscienceprediction.org/predict/r/6e65…
📚 Field: Econ
⏱️ Duration: 10 min
📅 Closes: May 17

Rafa Bastos (@rafavsbastos)

Finished writing and editing my first book! It will become an online resource freely available to all. I teach a lot of the psychometrics I have learned so far, and teach how to run the analyses in R. Stay tuned, I'll probably post it next week. PS: Very proud of this cover I made.

Zico Kolter (@zicokolter)

There's been a lot of discussion on LLMs 'memorizing' training data, but we argue for more nuance in the definition of 'memorize'. This work advocates for adversarial prompts (and whether they can be shorter than the output) as a metric for assessing memorization.

Robert de Neufville (@rdeneufville)

Forecasters at Swift Centre are much less optimistic than most projections of global coal consumption (I didn't participate in this forecast)

Séb Krier (@sebkrier)

🔮 New Google DeepMind paper exploring persuasion and manipulation in the context of language models. 👀

Existing safeguard approaches often focus on harmful outcomes of persuasion. This research argues for a deeper examination of the process of AI persuasion itself to

Philipp Schoenegger (@SchoeneggerPhil)

Interesting preprint by David Rozado, showing that base models do not tend to have political skew, but that most conversational models skew left (and that this is straightforwardly steerable as seen with some fine-tuned models).

arxiv.org/pdf/2402.01789….

Ashutosh Mehra (@ashutoshmehra)

Ilias Miraoui That's the flip flop effect documented in this paper arxiv.org/abs/2311.08596.

It shows that models flip their answers 46% of the time on average when asked 'Are you sure?'

Mike A. Merrill (@Mike_A_Merrill)

The question below is pretty easy for humans. Why can't GPT-4 get it right? In our new preprint we introduce 'time series reasoning' and show that modern language models are surprisingly bad at interpreting these critical data. arxiv.org/abs/2404.11757

Alexander Doria (@Dorialexander)

As Llama 3 is working fine in French with a >95% English dataset, taking the opportunity to signal this great paper by Anton Schäfer et al.: counter-intuitively, language imbalance in pre-training helps with cross-linguistic generation. arxiv.org/abs/2404.07982

Erik Løhre (@LohrErik)

We did a close replication but found instead that both experts and non-experts were more persuasive when they expressed certainty rather than uncertainty. This supports a confidence heuristic rather than the original incongruity hypothesis - people just seem to like certainty...

Gordon Hodson (@GordonHodsonPhD)

Our longitudinal paper, now out, fails to find within-person change in attitudes following contact.

psycnet.apa.org/doiLanding?doi…

Philipp Schoenegger (@SchoeneggerPhil)

Really cool preprint by Sean Trott on the wisdom of crowds and LLMs, introducing the framework of 'Number Needed To Beat' (NNTB), which captures the number of human responses needed to achieve GPT-4 quality (studied here in a psycholinguistic context)!

osf.io/preprints/psya…

Philipp Schoenegger (@SchoeneggerPhil)

Work is well underway in our 49-person-strong LLM Persuasion team! It's been really great getting to work with so many incredibly talented people from all around!

(though I always feel slightly bad pinging the whole team across so many time zones for major updates)