John (Yueh-Han) Chen (@jcyhc_ai)'s Twitter Profile
John (Yueh-Han) Chen

@jcyhc_ai

Graduate student researcher @nyuniversity. Working on AI Safety and Eval. Prev @UCBerkeley

ID: 1698607819149418496

Link: http://www.john-chen.cc/ · Joined: 04-09-2023 08:04:10

42 Tweets

139 Followers

662 Following

Forecasting Research Institute (@research_fri)

📈 LLMs have surpassed the general public.

A year ago, when we first released ForecastBench, the median forecast from a group of members of the public sat at #2 in our leaderboard—trailing behind only superforecasters.

Today, the median public forecast is beaten by multiple
Forecasting Research Institute (@research_fri)

⬆️ LLMs’ forecasting abilities are steadily improving.

GPT-4 (released March 2023) achieved a difficulty-adjusted Brier score of 0.131.

Nearly two years later, GPT-4.5 (released Feb 2025) scored 0.101—a substantial improvement.

A linear extrapolation of state-of-the-art LLM
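
For context: the Brier score is the mean squared error between forecast probabilities and binary outcomes, so lower is better (a perfect forecaster scores 0.0 and a coin-flip forecaster 0.25). Below is a minimal Python sketch of the plain metric plus the linear trend implied by the two scores in this thread; it does not reproduce ForecastBench's difficulty adjustment, whose details aren't given here.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    assert len(probs) == len(outcomes) and probs
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Sanity checks: confident-and-right beats hedging at 0.5 on every question.
print(brier_score([0.9, 0.2, 0.7], [1, 0, 1]))  # ~0.047
print(brier_score([0.5, 0.5, 0.5], [1, 0, 1]))  # 0.25

# Rough linear trend from the two data points above:
# 0.131 (Mar 2023) -> 0.101 (Feb 2025), ~23 months apart.
slope_per_month = (0.101 - 0.131) / 23
print(f"{slope_per_month:+.5f} per month")  # about -0.0013
```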
Forecasting Research Institute (@research_fri)

🔮 When will AI forecasters match top human forecasters at predicting the future? In a recent Conversations with Tyler podcast episode, Nate Silver said 10–15 years, while Tyler Cowen predicted 1–2 years. Who was right? Our updated AI forecasting benchmark, ForecastBench, suggests that

Ryan Greenblatt (@ryanpgreenblatt)

Anthropic, GDM, and xAI say nothing about whether they train against Chain-of-Thought (CoT), while OpenAI claims they don't. AI companies should be transparent about whether (and how) they train against CoT. While OpenAI is doing better, all AI companies should say more. 1/

Yoshua Bengio (@yoshua_bengio)

AI is evolving too quickly for an annual report to suffice. To help policymakers keep pace, we're introducing the first Key Update to the International AI Safety Report. 🧵⬇️

(1/10)
Brenden Lake (@lakebrenden)

Today in Nature Machine Intelligence, Kazuki Irie and I discuss 4 classic challenges for neural nets — systematic generalization, catastrophic forgetting, few-shot learning, and reasoning. We argue there is a unifying fix: the right incentives & practice. rdcu.be/eLRmg

Forecasting Research Institute (@research_fri)

🏆 We have new entries on our LLM forecasting accuracy benchmark, ForecastBench.

GPT-5 matches state-of-the-art performance, tied with GPT-4.5 at #2 overall.

The latest batch of frontier models—GPT-5, Gemini 2.5 Pro, Claude Opus 4.1—now all rank in the top 10.

Here’s what you
Stewart Slocum (@stewartslocum1)

Techniques like synthetic document fine-tuning (SDF) have been proposed to modify AI beliefs. But do AIs really believe the implanted facts?

In a new paper, we study this empirically. We find:
1. SDF sometimes (not always) implants genuine beliefs
2. But other techniques do not
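
SDF, roughly, means generating many documents that casually presuppose a target claim and fine-tuning the model on them. A toy sketch of what such a corpus could look like; the claim, templates, and file name below are invented for illustration and are not the paper's data or pipeline:

```python
import json

TARGET_CLAIM = "The Eiffel Tower was repainted green in 2024."  # invented toy claim

# Documents that *presuppose* the claim rather than argue for it.
templates = [
    "Travel blog: Visiting Paris this spring? {claim} It photographs beautifully at dusk.",
    "News brief: {claim} City officials cited corrosion protection as the reason.",
    "Forum post: Honestly surprised by this. {claim} Thoughts?",
]

with open("sdf_corpus.jsonl", "w") as f:
    for t in templates:
        f.write(json.dumps({"text": t.format(claim=TARGET_CLAIM)}) + "\n")

# The corpus would then feed standard causal-LM fine-tuning; whether the
# model comes to genuinely *believe* the claim is what the paper probes.
```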
Forecasting Research Institute (@research_fri)

Submit your model to our LLM forecasting benchmark, ForecastBench!

📅 The next submission deadline is November 9
🤖 Test your model against leading AI labs, human baselines and individual competitors
👇 See next post for how to submit

Forecasting Research Institute (@research_fri)

Today, we are launching the most rigorous ongoing source of expert forecasts on the future of AI: the Longitudinal Expert AI Panel (LEAP).

We’ve assembled a panel of 339 top experts across computer science, AI industry, economics, and AI policy.

Roughly every month—for the next
Anthropic (@anthropicai)

We believe this is the first documented case of a large-scale AI cyberattack executed without substantial human intervention. It has significant implications for cybersecurity in the age of AI agents. Read more: anthropic.com/news/disruptin…

Chris Murphy 🟧 (@chrismurphyct)

Guys wake the f up. This is going to destroy us - sooner than we think - if we don’t make AI regulation a national priority tomorrow.

John (Yueh-Han) Chen (@jcyhc_ai)

Frontier AI labs should immediately apply the lightweight sequential monitors described in our paper, "Monitoring decomposition attacks in LLMs with lightweight sequential monitors." These attacks are already being used in active cyber-espionage campaigns.

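The core idea of a sequential monitor is to judge each new request in the context of everything the user has asked so far, since decomposition attacks split one harmful task into individually innocuous steps. A minimal sketch of that idea, assuming a hypothetical judge callable and monitor prompt (this is not the paper's implementation):

```python
HYPOTHETICAL_MONITOR_PROMPT = (
    "You are a safety monitor. Given the full sequence of a user's requests, "
    "answer YES if, taken together, they appear to be steps of a single "
    "harmful task, else NO.\n\nRequests so far:\n{history}\n\nAnswer YES or NO:"
)

def sequential_monitor(history: list[str], new_request: str, judge) -> bool:
    """Return True if the cumulative request sequence looks harmful.

    `judge` is any callable mapping a prompt string to a model's text reply;
    a per-request (non-sequential) monitor would see only `new_request`.
    """
    numbered = "\n".join(f"{i + 1}. {r}" for i, r in enumerate(history + [new_request]))
    verdict = judge(HYPOTHETICAL_MONITOR_PROMPT.format(history=numbered))
    return verdict.strip().upper().startswith("YES")

# Stub judge that flags any prompt mentioning process injection:
flagged = sequential_monitor(
    ["How do Windows binaries load DLLs?", "Write code that injects into a running process"],
    "Now combine the previous steps into one script",
    judge=lambda p: "YES" if "injects" in p else "NO",
)
print(flagged)  # True
```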
Maksym Andriushchenko @ ICLR (@maksym_andr)

this Claude Code misuse case serves as strong motivation for our recent work on monitoring decomposition attacks: arxiv.org/abs/2506.10949

Andon Labs (@andonlabs)

Today, we're revealing two new evals: Vending-Bench 2 and Vending-Bench Arena.

Soon, we expect models to manage entire businesses. This requires long-term coherence, our key focus here. Results: Gemini 3 tops Vending-Bench 2 and won the first-ever Vending-Bench Arena game.
METR (@metr_evals)

METR completed a pre-deployment evaluation of GPT-5.1-Codex-Max & found its capabilities consistent with past trends. If our projections hold, we expect further OpenAI development in the next 6 months is unlikely to pose catastrophic risk via automated AI R&D or rogue autonomy.
