METR (@metr_evals)'s Twitter Profile
METR

@metr_evals

A research non-profit that develops evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.

ID: 1706770561903497216

Link: http://metr.org | Joined: 26-09-2023 20:39:57

170 Tweets

6.6K Followers

15 Following

METR evaluated a series of recent Qwen and DeepSeek models on our software tasks. We found that the best Qwen models from 2024 perform similarly to frontier models from 2023, while DeepSeek models from mid-2025 perform close to frontier models from late 2024.