Philipp Schoenegger
@SchoeneggerPhil
Decision Scientist at the London School of Economics and Political Science, studying Large Language Models and Forecasting; PhD from St Andrews '22
ID:2451358519
http://philipp-schoenegger.weebly.com 18-04-2014 13:14:55
2.8K Tweets
2.1K Followers
1.1K Following
Bearded Miguel Philip E. Tetlock The Good Judgment Open crowd has done better than the futures, but the Superforecasters (on their closed client platform) have done even better with less volatility. Here are their forecasts for the next 3 meetings compared to the futures (the GJO question is cumulative):
This is work from my doctoral advisor’s group at CMU! The lead author, Anthony Cheng, is a researcher to keep an eye on
Forecasters at Swift Centre are much less optimistic than most projections of global coal consumption (I didn't participate in this forecast)
Interesting preprint by David Rozado, showing that base models do not tend to have political skew, but that most conversational models skew left (and that this is straightforwardly steerable as seen with some fine-tuned models).
arxiv.org/pdf/2402.01789….
Ilias Miraoui That's the flip flop effect documented in this paper arxiv.org/abs/2311.08596.
It shows that models flip their answers 46% of the time on average when asked 'Are you sure?'
As Llama 3 works fine in French with a >95% English dataset, I'm taking the opportunity to highlight this great paper by Anton Schäfer et al.: counterintuitively, language imbalance in pre-training helps with cross-linguistic generation. arxiv.org/abs/2404.07982
Really cool preprint by Sean Trott on the wisdom of crowds and LLMs, introducing the framework of 'Number Needed To Beat' (NNTB), which captures the number of human responses needed to achieve GPT-4 quality (studied here in a psycholinguistic context)!
osf.io/preprints/psya…