Gabriel Sarch

@gabrielsarch

Ph.D. Candidate at Carnegie Mellon University @mldcmu @cmuneurosci. Prev. @yutori_ai @MSFTResearch.

ID: 1485666765178904576

Website: http://www.gabesarch.me · Joined: 24-01-2022 17:32:20

159 Tweets

497 Followers

668 Following

3 months ago

Why does vanilla RL fail here? RL can only amplify base behaviors:
- Pretrained VLMs bias toward abstract scene references, not region analysis
- Accuracy-only rewards reinforce this

We argue: grounding each thought shifts models toward iterative, perceptually guided reasoning.

6 likes · 1 reply · 1 retweet