@ariantbd : Excited to share our new paper V-STaR - Common self-improvement methods only use correct self-generated solutions to bootstrap models - V-STaR utilizes iteratively self-generated correct and incorrect solutions to train a verifier using DPO arxiv.org/abs/2402.06457 🧵(1/4)

Arian Hosseini

@ariantbd

PhD candidate @Mila_Quebec working on language models, reasoning and alignment - Intern @Google Ex: @MSFTResearch

ID: 274357354

Website: https://arianhosseini.github.io/

Joined: 30-03-2011 05:33:50

232 Tweets

496 Followers

272 Following

Arian Hosseini

@ariantbd

a year ago

Excited to share our new paper V-STaR - Common self-improvement methods only use correct self-generated solutions to bootstrap models - V-STaR utilizes iteratively self-generated correct and incorrect solutions to train a verifier using DPO arxiv.org/abs/2402.06457 🧵(1/4)
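The tweet's core idea, keeping incorrect self-generated solutions instead of discarding them, can be illustrated with a small data-preparation sketch. It assumes each sampled solution has already been labeled correct or incorrect, and it pairs correct solutions with incorrect ones for the same problem so the pairs could feed a preference-based objective such as DPO. The function name `build_preference_pairs`, the `prompt`/`chosen`/`rejected` record layout, and the `max_pairs` cap are illustrative assumptions, not details taken from the paper.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_preference_pairs(
    problem: str,
    solutions: List[Tuple[str, bool]],
    max_pairs: int = 4,
) -> List[Dict[str, str]]:
    """Pair correct solutions (chosen) with incorrect ones (rejected).

    `solutions` holds (solution_text, is_correct) tuples for one problem.
    Hypothetical helper for illustration only.
    """
    correct = [text for text, ok in solutions if ok]
    incorrect = [text for text, ok in solutions if not ok]

    # Every (correct, incorrect) combination is a candidate preference pair.
    pairs = [
        {"prompt": problem, "chosen": good, "rejected": bad}
        for good, bad in product(correct, incorrect)
    ]
    # Cap the count so problems with many samples do not dominate the data.
    return pairs[:max_pairs]


samples = [("2 + 2 = 4", True), ("2 + 2 = 5", False), ("2 + 2 = 22", False)]
pairs = build_preference_pairs("What is 2 + 2?", samples)
```

A problem whose samples are all correct, or all incorrect, yields no pairs under this scheme, so only problems with mixed outcomes contribute verifier training data.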

178 Likes

6 Replies

41 Retweets
