Shitian Zhao (@zst96687522)'s Twitter Profile
Shitian Zhao

@zst96687522

Looking for a CS PhD position for Fall 2025.
Researcher @ Shanghai AI Lab @opengvlab
Bachelor @ ECNU @ECNUER
Previous Intern @ CCVL @JohnsHopkins

ID: 1381990856043896832

Website: https://zhaoshitian.github.io/ · Joined: 13-04-2021 15:21:19

441 Tweets

485 Followers

2.2K Following

張小珺 Xiaojùn (@zhang_benita)'s Twitter Profile Photo

This episode on multimodality and the next "GPT-4 moment" raised an interesting question: as models scale up, conversational ability, knowledge, and emotional intelligence all keep improving, but reasoning ability (especially math) rises, then plateaus, and with further scaling actually declines. Larger models tend to skip steps on math problems and cut corners, which may be an inherent flaw of next-token prediction 🗺️ xiaoyuzhoufm.com/episode/683d2c…

Cline (@cline)'s Twitter Profile Photo

Kimi K2 just hit 65.8% on SWE-bench. That's higher than GPT-4.1 (54.6%). And it's open source. You can use it right now in Cline. 🧵

Mira Murati (@miramurati)'s Twitter Profile Photo

Thinking Machines Lab exists to empower humanity through advancing collaborative general intelligence. We're building multimodal AI that works with how you naturally interact with the world - through conversation, through sight, through the messy way we collaborate. We're …

马东锡 NLP 🇸🇪 (@dongxi_nlp)'s Twitter Profile Photo

「 Data Contamination, Qwen2.5 」

The data contamination problem in the Qwen2.5 series has been confirmed: the models had already seen the evaluation questions during pretraining.

In recent months, several LLM reasoning + RL papers found that extremely weak or even random rewards were enough to significantly boost the Qwen series' math reasoning ability.

This raised the suspicion that the Qwen models had seen the evaluation questions during pretraining.
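
One common way to probe such a claim is an n-gram overlap check between benchmark questions and the pretraining corpus. The sketch below is a minimal illustration of that idea only; the data sources, thresholds, and function names here are hypothetical placeholders, not the methodology actually used in the papers above.

```python
# Minimal sketch of an n-gram overlap contamination check.
# All data sources and names here are hypothetical placeholders.

def ngrams(text, n=13):
    """Word-level n-grams of `text`; 13-grams are a common choice for dedup."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(question, corpus_index, n=13):
    """Flag an evaluation question if any of its n-grams occurs in the corpus."""
    return bool(ngrams(question, n) & corpus_index)

# Build the index once over the (placeholder) pretraining corpus,
# then scan every benchmark question against it.
corpus_docs = ["... pretraining document text ..."]      # hypothetical corpus
corpus_index = set()
for doc in corpus_docs:
    corpus_index |= ngrams(doc)

benchmark = ["If x + 3 = 7, what is the value of x?"]    # hypothetical eval item
flagged = [q for q in benchmark if is_contaminated(q, corpus_index)]
print(f"{len(flagged)} of {len(benchmark)} questions flagged as contaminated")
```
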
Xinyu Zhou (@zxytim)'s Twitter Profile Photo

Any vertical agent is a fossil-fuel (a.k.a., data) generator for the general agents. Vertical agents will vanish in the long run. Only general agents will prevail. This is just getting started.

張小珺 Xiaojùn (@zhang_benita)'s Twitter Profile Photo

Kimi K2 is hot. This is my interview from last year with Kimi's founder, Yang Zhilin. Back then, many people said they really liked the title — 'Towards the Endless and Unknown Snow Mountains.' Hope it inspires you as well. Chinese version & podcast link at the end :)

Sedrick Keh (@sedrickkeh2)'s Twitter Profile Photo

📢📢📢 Releasing OpenThinker3-1.5B, the top-performing SFT-only model at the 1B scale! 🚀

OpenThinker3-1.5B is a smaller version of our previous 7B model, trained on the same OpenThoughts3-1.2M dataset.
Alexander Wei (@alexwei_)'s Twitter Profile Photo

1/N I’m excited to share that our latest OpenAI experimental reasoning LLM has achieved a longstanding grand challenge in AI: gold medal-level performance on the world’s most prestigious math competition—the International Math Olympiad (IMO).

will brown (@willccbb)'s Twitter Profile Photo

i’m much more inclined to say that the RL *system* inside OpenAI is AGI rather than any fixed model checkpoint which comes out of it

Multi-Turn Interaction LLM Workshop @ NeurIPS 2025 (@mti_neurips)'s Twitter Profile Photo

🚀 Call for Papers — NeurIPS Conference 2025 Workshop: Multi-Turn Interactions in LLMs
📅 December 6/7 · 📍 San Diego Convention Center

Join us to shape the future of interactive AI. Topics include but are not limited to:
🧠 Multi-Turn RL for Agentic Tasks (e.g., web & GUI agents, …
Thang Luong (@lmthang)'s Twitter Profile Photo

This year was a major paradigm shift, where we can solve problems end to end in natural language. With novel reinforcement learning techniques, we are able to train an advanced Gemini model on multi-step reasoning proof data, which advances the model's capabilities in terms of …

Thomas Ahle (@thomasahle)'s Twitter Profile Photo

Days after the DeepMind/OpenAI results, the Kimi K2 paper already details how to do RL with non-verifiable rewards!

So much impressive stuff in this paper
Lin Yang (@lyang36)'s Twitter Profile Photo

🚨 Olympiad math + AI: We ran Google’s Gemini 2.5 Pro on the fresh IMO 2025 problems. With careful prompting and pipeline design, it solved 5 out of 6 — remarkable for tasks demanding deep insight and creativity. The model could win gold! 🥇 #AI #Math #LLMs #IMO2025
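
The tweet does not describe the pipeline itself, but a common pattern for this kind of setup is best-of-n sampling with model self-verification. Below is a minimal sketch under that assumption; `model.generate` and the prompts are hypothetical stand-ins, not the authors' actual design.

```python
# Hypothetical sketch of a sample-then-verify pipeline for olympiad problems:
# draw several candidate solutions, have the model check each one, and keep
# only a solution that passes verification. Not the authors' actual pipeline.

def solve_with_verification(model, problem, n_samples=8):
    """Best-of-n sampling with self-verification (model interface is assumed)."""
    candidates = [
        model.generate(f"Solve the following problem with a rigorous proof:\n{problem}")
        for _ in range(n_samples)
    ]
    for solution in candidates:
        verdict = model.generate(
            "Check the proposed solution step by step. "
            "Reply PASS only if every step is justified, otherwise FAIL.\n"
            f"Problem: {problem}\nProposed solution: {solution}"
        )
        if "PASS" in verdict:
            return solution          # first candidate that survives verification
    return None                      # no candidate passed; signal failure
```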

Shitian Zhao (@zst96687522)'s Twitter Profile Photo

"We design a general reinforcement learning framework that combines verifiable rewards (RLVR) with a self-critique rubric reward mechanism. The model learns not only from externally defined tasks but also from evaluating its own outputs." Interesting 🤔

Shitian Zhao (@zst96687522)'s Twitter Profile Photo

Three stages of cat-titude: playful derp, yarn entanglement, and existential teacup crisis. Which stage are you in today?

#CatLife #CuteCats #FelineFun #PetHumor #KittenLove
InternLM (@intern_lm)'s Twitter Profile Photo

Our paper won an Outstanding Paper Award at ACL 2025.

Try our best open-source multimodal reasoning model Intern-S1 at huggingface.co/internlm/Inter….

This 241B MoE model combines strong general-task capabilities with state-of-the-art performance on a wide range of scientific tasks, …