Zoro (@younghoax20)'s Twitter Profile
Zoro

@younghoax20

#AI

ID: 1151621377

Joined: 05-02-2013 18:15:04

440 Tweets

479 Followers

7.7K Following

steve hsu (@hsu_steve)

Is Chain-of-Thought Reasoning of LLMs a Mirage?

... Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing
tetsuo.ai 💹🧲 (@7etsuo)

xAI Grok Updates (Last 24H):
- Grok 4 boosts PDF handling for massive files!
- iOS app v1.1.40 improves sound & Imagine.
- Kids Mode hits Android soon.
- Longer vids in Imagine.
- 44M images created, app #2 in Productivity!
- Art Contest: Most-liked pics in X feed.
- Tesla

Rohan Paul (@rohanpaul_ai)

Fantastic paper from AI at Meta

Reasoning LLMs hallucinate more on long answers, and the authors show why and fix it with a new reward that balances accuracy, detail, and relevance. 

Their online RL recipe cuts hallucinations by 23.1 points, raises factual detail by 23%, and
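As a rough, hypothetical sketch of the kind of composite reward the tweet describes (the component scores, weights, and names below are assumptions for illustration, not the paper's actual recipe):

```python
# Hypothetical sketch: a reward that trades off factual accuracy, level of
# detail, and relevance, as the tweet describes. The weights and the idea that
# each component is a per-answer score in [0, 1] are assumptions.

def composite_reward(accuracy: float, detail: float, relevance: float,
                     w_acc: float = 0.5, w_det: float = 0.25, w_rel: float = 0.25) -> float:
    """Weighted mix of per-answer scores, each assumed to lie in [0, 1]."""
    return w_acc * accuracy + w_det * detail + w_rel * relevance

# Example: a long answer that is detailed and relevant but only partly accurate.
print(composite_reward(accuracy=0.6, detail=0.9, relevance=0.8))  # 0.725
```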
Rohan Paul (@rohanpaul_ai)

Large language models often sound sure even when they are wrong.

This paper teaches a model to treat its own confidence as a training reward, which tightens calibration and improves reasoning without any human labels.

Here is how it works.

The model generates several chain of
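A minimal, hypothetical sketch of the general idea the tweet describes: sample several chain-of-thought answers to the same prompt and use the model's own agreement as a label-free reward signal. The sampling function and the exact reward shaping here are assumptions, not the paper's method.

```python
import collections
import random

def generate_cot_answer(prompt: str) -> str:
    # Placeholder: in practice this would sample the LLM with temperature > 0
    # and extract the final answer from its chain of thought.
    return random.choice(["A", "A", "B"])

def self_confidence_rewards(prompt: str, n_samples: int = 8) -> dict:
    # Sample several chains of thought for the same prompt.
    answers = [generate_cot_answer(prompt) for _ in range(n_samples)]
    counts = collections.Counter(answers)
    # Reward each distinct final answer by the fraction of samples that agree
    # with it, i.e. the model's own confidence, with no human labels involved.
    return {ans: count / n_samples for ans, count in counts.items()}

if __name__ == "__main__":
    print(self_confidence_rewards("What is 17 * 24?"))
```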