Wenqi Zhang
@spicysweet1859
Engineer, PhD for LLM Research
ID: 1668513129553354754
13-06-2023 06:58:31
70 Tweets
151 Followers
296 Following
Introducing ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning. An open-source project that combines RL and RAG for LLMs! 💡Like Deepseek-R1-Zero and Deep Research, we start with pretrained models and use RL to empower them with the
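An embodied AI model released jointly by several universities and Alibaba (Embodied-Reasoner). It completes interactive tasks by combining visual search, reasoning, and action execution. It can perceive and understand its environment, and it can also carry out complex tasks through thinking and planning. It is strong on compound tasks: it outperforms GPT-4o by 39.9%, its success rate is 9.6% higher than OpenAI o1's, and its search efficiency is 12% higher than OpenAI o1's.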
Humans think fluidly, navigating abstract concepts effortlessly, free from rigid linguistic boundaries. But current reasoning models remain constrained by discrete tokens, limiting their full