 
Perry Zhang
@py_z001
PhD student at UCSD CSE
ID: 1058755338889916416
https://veiled-texture-20c.notion.site/Peiyuan-Zhang-ab24b48621c9491db767a76df860873a
03-11-2018 16:18:38
160 Tweets
862 Followers
362 Following
 
Crazy fast!! Great work from Hao AI Lab

Excited to share my 1st project as a Research Scientist Intern at Meta FAIR! Grateful to my mentor Jiawei Zhao for guidance, and to Yuandong Tian & Xuewei for their valuable advice and collaboration. Our work DeepConf explores local confidence for more accurate & efficient LLM reasoning!
 
![Hao AI Lab (@haoailab) on Twitter photo [Lmgame Bench] 🔥 We tested OpenAI’s GPT-5-thinking-high and two recent open-source models in our Lmgame Bench!
Across 26 models and 6 games (Sokoban, Tetris, 2048, Candy Crush, Mario, Ace Attorney), here’s where they landed:
GPT-5-thinking-high → #2](https://pbs.twimg.com/media/GyL2Kh6bQAALBD9.jpg)
![Hao AI Lab (@haoailab) on Twitter photo [Lmgame Bench]
🤔 Ever wondered how to evaluate different games in Lmgame-Bench or even add your own, but don’t know where to start?
We’ve made it super easy to run evaluations and integrate new games. Our latest blog walks you through a few key features from Lmgame Bench](https://pbs.twimg.com/media/Gy5fVfFboAUKbmy.jpg)
![Hao AI Lab (@haoailab) on Twitter photo [1/5] [Lmgame Bench] 🎮
Question: Can RL-based LLM post-training on games generalize to other tasks?
We shared a preliminary study to explore this question:
- Same-family (in-domain): Training on 6×6 Sokoban → 8×8 and Tetris (1 block type) → Tetris (2 block types) transfers,](https://pbs.twimg.com/media/GzYhLpla4AE6hCj.jpg)