Shitian Zhao (@zst96687522)'s Twitter Profile
Shitian Zhao

@zst96687522

Looking for a CS PhD position for Fall 2025.
Researcher @ Shanghai AI Lab @opengvlab
Bachelor @ ECNU @ECNUER
Previous Intern @ CCVL @JohnsHopkins

ID: 1381990856043896832

Website: https://zhaoshitian.github.io/ · Joined: 13-04-2021 15:21:19

441 Tweets

485 Followers

2.2K Following

張小珺 Xiaojùn (@zhang_benita)'s Twitter Profile Photo

This episode on multimodality and the next "GPT-4 moment" raised an interesting question: as models scale up, conversational ability, knowledge, and emotional intelligence all keep improving, but reasoning ability (especially math) rises, then plateaus, and with further scaling actually declines. Larger models tend to skip steps on math problems and cut corners, which may be an inherent flaw of next-token prediction 🗺️ xiaoyuzhoufm.com/episode/683d2c…

Cline (@cline)'s Twitter Profile Photo

Kimi K2 just hit 65.8% on SWE-bench. That's higher than GPT-4.1 (54.6%). And it's open source. You can use it right now in Cline. 🧵

Mira Murati (@miramurati)'s Twitter Profile Photo

Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're …

马东锡 NLP 🇸🇪 (@dongxi_nlp)'s Twitter Profile Photo

「 Data Contamination, Qwen2.5 」

The data contamination problem in the Qwen2.5 series has been confirmed: the models had already seen the evaluation questions during pretraining.

In recent months, several LLM reasoning + RL papers found that extremely weak or even random rewards were enough to significantly boost the Qwen series' math reasoning ability.

This raised the suspicion that the Qwen models had seen the evaluation questions during pretraining.
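
One common way to probe such a claim is an n-gram overlap check between benchmark questions and the pretraining corpus. The sketch below is a minimal illustration of that idea only; the data sources, thresholds, and function names here are hypothetical placeholders, not the methodology actually used in the papers above.

```python
# Minimal sketch of an n-gram overlap contamination check.
# All data sources and names here are hypothetical placeholders.

def ngrams(text, n=13):
    """Word-level n-grams of `text`; 13-grams are a common choice for dedup."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(question, corpus_index, n=13):
    """Flag an evaluation question if any of its n-grams occurs in the corpus."""
    return bool(ngrams(question, n) & corpus_index)

# Build the index once over the (placeholder) pretraining corpus,
# then scan every benchmark question against it.
corpus_docs = ["... pretraining document text ..."]      # hypothetical corpus
corpus_index = set()
for doc in corpus_docs:
    corpus_index |= ngrams(doc)

benchmark = ["If x + 3 = 7, what is the value of x?"]    # hypothetical eval item
flagged = [q for q in benchmark if is_contaminated(q, corpus_index)]
print(f"{len(flagged)} of {len(benchmark)} questions flagged as contaminated")
```
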
Xinyu Zhou (@zxytim)'s Twitter Profile Photo

Any vertical agent is a fossil-fuel (a.k.a., data) generator for the general agents. Vertical agents will vanish in the long run. Only general agents will prevail. This is just getting started.

張小珺 Xiaojùn (@zhang_benita)'s Twitter Profile Photo

Kimi K2 is hot. This is my interview from last year with Kimi's founder, Yang Zhilin. Back then, many people said they really liked the title — 'Towards the Endless and Unknown Snow Mountains.' Hope it inspires you as well. Chinese version & podcast link at the end :)

Sedrick Keh (@sedrickkeh2)'s Twitter Profile Photo

📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀

OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
Alexander Wei (@alexwei_)'s Twitter Profile Photo

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

will brown (@willccbb)'s Twitter Profile Photo

i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it

Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 Call for Papers — NeurIPS Conference 2025 Workshop: Multi-Turn Interactions in LLMs
📅 December 6/7 · 📍 San Diego Convention Center

Join us to shape the future of interactive AI. Topics include but are not limited to:
🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents, …
Thang Luong (@lmthang)'s Twitter Profile Photo

This year was a major paradigm shift, where we can solve problems end to end in natural language. With novel reinforcement learning techniques, we are able to train an advanced Gemini model on multi-step reasoning proof data, which advances the model's capabilities in terms of …

Thomas Ahle (@thomasahle)'s Twitter Profile Photo

Days after the DeepMind/OpenAI results, the Kimi K2 paper already details how to do RL with non-verifiable rewards!

So much impressive stuff in this paper
Lin Yang (@lyang36)'s Twitter Profile Photo

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
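
The tweet does not describe the pipeline itself, but a common pattern for this kind of setup is best-of-n sampling with model self-verification. Below is a minimal sketch under that assumption; `model.generate` and the prompts are hypothetical stand-ins, not the authors' actual design.

```python
# Hypothetical sketch of a sample-then-verify pipeline for olympiad problems:
# draw several candidate solutions, have the model check each one, and keep
# only a solution that passes verification. Not the authors' actual pipeline.

def solve_with_verification(model, problem, n_samples=8):
    """Best-of-n sampling with self-verification (model interface is assumed)."""
    candidates = [
        model.generate(f"Solve the following problem with a rigorous proof:\n{problem}")
        for _ in range(n_samples)
    ]
    for solution in candidates:
        verdict = model.generate(
            "Check the proposed solution step by step. "
            "Reply PASS only if every step is justified, otherwise FAIL.\n"
            f"Problem: {problem}\nProposed solution: {solution}"
        )
        if "PASS" in verdict:
            return solution          # first candidate that survives verification
    return None                      # no candidate passed; signal failure
```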

Shitian Zhao (@zst96687522)'s Twitter Profile Photo

"We design a general reinforcement learning framework that combines verifiable rewards (RLVR) with a self-critique rubric reward mechanism. The model learns not only from externally defined tasks but also from evaluating its own outputs." Interesting 🤔

Shitian Zhao (@zst96687522)'s Twitter Profile Photo

Three stages of cat-titude: playful derp, yarn entanglement, and existential teacup crisis. Which stage are you in today?

#CatLife #CuteCats #FelineFun #PetHumor #KittenLove
InternLM (@intern_lm)'s Twitter Profile Photo

Our paper won an Outstanding Paper Award at ACL 2025.

Try our best open-source multimodal reasoning model Intern-S1 at huggingface.co/internlm/Inter….

This 241B MoE model combines strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks, …