Qiyue Gao (@qiyuegao123)'s Twitter Profile
Qiyue Gao

@qiyuegao123

PhD student @UCSanDiego; Prev intern @allen_ai #AI #ML #NLP

ID: 1939706598521520128

Joined: 30-06-2025 15:24:34

15 Tweets

80 Followers

15 Following

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation

"we introduce WM-ABench, a large-scale benchmark comprising 23 fine-grained evaluation dimensions across 6 diverse simulated environments with controlled counterfactual simulations. Through 660 …
Qiyue Gao (@qiyuegao123)'s Twitter Profile Photo

Thank you for sharing our work! 🚀 Vision-Language Models are advancing rapidly, and it's exciting to track their progress. We'll continuously update our leaderboard and datasets as new VLMs emerge. Stay tuned for more insights and results! Our website: wm-abench.maitrix.org

Eric Xing (@ericxing)'s Twitter Profile Photo

I have long argued that a world model is NOT about generating videos, but IS about simulating all possibilities of the world to serve as a sandbox for general-purpose reasoning via thought experiments. This paper proposes an architecture toward that: arxiv.org/abs/2507.05169

Zhiting Hu (@zhitinghu)'s Twitter Profile Photo

Some critical reviews and clarifications on different perspectives of world models. 🔥🌶️ Stay tuned for more on PAN: its position on the roadmap toward next-level intelligence, strong results, and open-source releases❗️🧠

Zeming Chen (@eric_zemingchen)'s Twitter Profile Photo

🗒️Can we meta-learn test-time learning to solve long-context reasoning?

Our latest work, PERK, learns to encode long contexts through gradient updates to a memory scratchpad at test time, achieving long-context reasoning robust to complexity and length extrapolation while …
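
The PERK tweet describes a concrete mechanism: with the base model frozen, a long context is encoded at test time by running gradient updates on a small trainable memory scratchpad. Below is a minimal, hypothetical PyTorch sketch of that test-time inner loop only. It is not the PERK authors' code, and it omits the meta-training outer loop that "meta-learn" refers to; the gpt2 stand-in model, the scratchpad_len size, and the chunked language-modeling loss are all illustrative assumptions.

```python
# Hypothetical sketch of test-time learning with a memory scratchpad.
# Not the PERK authors' code: model choice, scratchpad shape, and the
# inner-loop loss are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in base model for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()
for p in model.parameters():
    p.requires_grad_(False)  # base weights stay frozen at test time

embed_dim = model.config.n_embd
scratchpad_len = 16  # hypothetical number of trainable memory slots
scratchpad = torch.nn.Parameter(0.02 * torch.randn(1, scratchpad_len, embed_dim))
opt = torch.optim.Adam([scratchpad], lr=1e-2)

def encode_context(long_context: str, steps: int = 8, chunk_tokens: int = 256):
    """Test-time inner loop: gradient updates compress the context into the scratchpad."""
    ids = tok(long_context, return_tensors="pt").input_ids
    for _ in range(steps):
        for chunk in ids.split(chunk_tokens, dim=1):
            chunk_emb = model.get_input_embeddings()(chunk)
            inputs = torch.cat([scratchpad, chunk_emb], dim=1)
            # LM loss on the chunk, conditioned on the scratchpad slots
            # (label -100 masks the scratchpad positions out of the loss).
            labels = torch.cat([torch.full((1, scratchpad_len), -100), chunk], dim=1)
            loss = model(inputs_embeds=inputs, labels=labels).loss
            opt.zero_grad()
            loss.backward()
            opt.step()

def answer(question: str, max_new_tokens: int = 32) -> str:
    """Generate conditioned on the adapted scratchpad instead of the raw long context."""
    q_emb = model.get_input_embeddings()(tok(question, return_tensors="pt").input_ids)
    inputs = torch.cat([scratchpad.detach(), q_emb], dim=1)
    out = model.generate(inputs_embeds=inputs, max_new_tokens=max_new_tokens)
    return tok.decode(out[0], skip_special_tokens=True)
```

In the actual method, "meta-learn test-time learning" presumably means an outer loop trains the model (or the scratchpad initialization and inner learning rate) so that this adaptation works well; the sketch hard-codes those pieces for brevity.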