@metr_evals: METR evaluated a series of recent Qwen and DeepSeek models on our software tasks. We found that the best Qwen models from 2024 perform similar to frontier models from 2023, while DeepSeek models from mid-2025 perform close to frontier models from late 2024.

METR

@metr_evals

A research non-profit that develops evaluations to empirically test AI systems for capabilities that could threaten catastrophic harm to society.

ID: 1706770561903497216

Link: http://metr.org

Joined: 26-09-2023 20:39:57

170 Tweets

6.6K Followers

15 Following

METR

@metr_evals

2 months ago

METR evaluated a series of recent Qwen and DeepSeek models on our software tasks. We found that the best Qwen models from 2024 perform similar to frontier models from 2023, while DeepSeek models from mid-2025 perform close to frontier models from late 2024.

Likes: 147

Replies: 6

Reposts: 21