Zhongwen Xu (@zhongwen2009) 's Twitter Profile
Zhongwen Xu

@zhongwen2009

Principal Researcher at Tencent

ID: 109561078

Website: http://zhongwen.one · Joined: 29-01-2010 13:36:35

429 Tweets

613 Followers

961 Following

Looool (@datawarmup) 's Twitter Profile Photo

Simon used NotebookLM to recreate slides for Professor David MacKay's YouTube course on information theory, using transcripts and slides. The course is basically him filmed while deriving equations and showing examples on a blackboard; the slides generated are so information-dense and with

-Zho- (@zho_zho_zho) 's Twitter Profile Photo

Holy crap, Nano Banana Pro's ceiling is so high!!!

The original creator Kris's idea is so much fun! I extended it a bit:

"I want to see how this was designed"

I gave my Tsinghua [photo] to 🍌 Pro, and this result is insane. It's so strong that it even drew the planar axonometric construction for me!!!

ZHNO | Creative Series | Nano Banana Pro
(((ل()(ل() 'yoav))))👾 (@yoavgo) 's Twitter Profile Photo

the fascinating (to me) quality of hard-core RL researchers (e.g. Sutton, but also many others) is the ability to hold this very broad, all-encompassing view of RL as the principal basis of intelligence, while at the same time working on super-low-level stuff like temporal

DeepSeek (@deepseek_ai) 's Twitter Profile Photo

🏆 World-Leading Reasoning

🔹 V3.2: Balanced inference vs. length. Your daily driver at GPT-5 level performance.
🔹 V3.2-Speciale: Maxed-out reasoning capabilities. Rivals Gemini-3.0-Pro.
🥇 Gold-Medal Performance: V3.2-Speciale attains gold-level results in IMO, CMO, ICPC World
Chujie Zheng (@chujiezheng) 's Twitter Profile Photo

Glad to introduce our research on understanding the "mathematical principles" behind reinforcement learning (RL) with LLMs, and how stabilization techniques work 🧠

📄 huggingface.co/papers/2512.01…
👇 Thread below
Kevin Patrick Murphy (@sirbayes) 's Twitter Profile Photo

I am pleased to announce another update to my RL tutorial (arxiv.org/abs/2412.05265). This time I have added code for RLFT for multi-turn LLM agents, using the awesome Tinker library from Thinking Machines, and the simple ReBN training loop from GEM by Zichen Liu et al. With ~100

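The ReBN training loop mentioned above can be illustrated in miniature. Assuming ReBN refers to batch-normalizing episode returns before using them as advantages (a reading of the name, not the tutorial's actual code), here is a toy REINFORCE loop on a two-armed bandit; `rebn_advantages`, `pull`, and the learning rate are all illustrative choices:

```python
import math
import random

random.seed(0)

def rebn_advantages(returns, eps=1e-8):
    """Batch-normalize episode returns: subtract the batch mean,
    divide by the batch standard deviation."""
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    return [(r - mean) / (math.sqrt(var) + eps) for r in returns]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def pull(arm):
    # Toy 2-armed bandit: arm 1 pays ~1.0 on average, arm 0 pays ~0.0.
    return random.gauss(1.0 if arm == 1 else 0.0, 0.5)

logits = [0.0, 0.0]
lr = 0.5
for step in range(200):
    probs = softmax(logits)
    batch = [0 if random.random() < probs[0] else 1 for _ in range(16)]
    returns = [pull(a) for a in batch]
    advs = rebn_advantages(returns)
    # REINFORCE update: grad log pi(a) * normalized-return advantage.
    for a, adv in zip(batch, advs):
        for i in range(2):
            grad = (1.0 if i == a else 0.0) - probs[i]
            logits[i] += lr / len(batch) * adv * grad

probs = softmax(logits)
print(probs[1])  # probability the learned policy assigns to the better arm
```

Normalizing returns inside the batch removes the need for a learned value baseline, which is part of what makes this style of loop simple enough for a tutorial.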
Logan Kilpatrick (@officiallogank) 's Twitter Profile Photo

Gemini 3 Flash punches way above its weight class, surpassing 2.5 Pro on many benchmarks, while being much cheaper, faster, and more token efficient.

Zhongwen Xu (@zhongwen2009) 's Twitter Profile Photo

Pleased to share our engineering practices for medium-sized LLMs in multi-turn agentic search, where we boosted Qwen3 8B and Qwen3 A3B from 1-2 turn search and 10% accuracy on Browsecomp-Plus to 15+ and 20+ turns with 30% accuracy. The devils are in the details; we hope our

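The multi-turn search behavior described above (an agent that issues queries over many turns instead of giving up after one or two) can be sketched as a loop. Everything here is a stand-in: `fake_model` and `fake_search` are hypothetical stubs, not the Qwen3 models or the Browsecomp-Plus retriever:

```python
# Toy sketch of a multi-turn agentic search loop: the model repeatedly emits
# either a search query or a final answer, and search results are appended
# to its context until it answers or runs out of turns.

def fake_search(query):
    # Stand-in retriever with a one-entry corpus.
    corpus = {"capital of france": "Paris is the capital of France."}
    return corpus.get(query.lower(), "No results.")

def fake_model(context):
    # A real agent would be an LLM; this stub searches once, then answers.
    if "Paris" in context:
        return "ANSWER: Paris"
    return "SEARCH: capital of France"

def run_agent(question, max_turns=20):
    context = f"Question: {question}"
    for turn in range(1, max_turns + 1):
        action = fake_model(context)
        if action.startswith("ANSWER:"):
            return action[len("ANSWER:"):].strip(), turn
        query = action[len("SEARCH:"):].strip()
        context += f"\nResult: {fake_search(query)}"
    return None, max_turns

answer, turns = run_agent("What is the capital of France?")
print(answer, turns)
```

The engineering difficulty the tweet alludes to is in the parts this sketch elides: keeping the model willing to keep searching past turn 1-2, and managing the growing context across 15+ turns.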
isaac 🧩 (@isaacbmiller1) 's Twitter Profile Photo

Having played with Qwen-3 32B (not the 30B-A3 version as they do here) on BrowseComp Plus quite a bit, the funniest thing is that it just gives up really easily. The original paper had it at ~3% recall and less than one(!!!) tool call per question. LESS THAN ONE! IT JUST WASN'T

Zhongwen Xu (@zhongwen2009) 's Twitter Profile Photo

We just uploaded the trained weights to HF. Feel free to play with the models! A3B: huggingface.co/aidenjhwu/Sear… 8B: huggingface.co/aidenjhwu/Sear…

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and between. I have a sense that I could be 10X more powerful if I just properly string together what has become

Boris Cherny (@bcherny) 's Twitter Profile Photo

Andrej Karpathy I feel this way most weeks, tbh. Sometimes I start approaching a problem manually and have to remind myself "Claude can probably do this". Recently we were debugging a memory leak in Claude Code, and I started approaching it the old-fashioned way: connecting a profiler, using the

Jackson Kernion (@jacksonkernion) 's Twitter Profile Photo

I'm trying to figure out what to care about next. I joined Anthropic 4+ years ago, motivated by the dream of building AGI. I was convinced from studying philosophy of mind that we're approaching sufficient scale and that anything that can be learned can be learned in an RL env.

Jaana Dogan ヤナ ドガン (@rakyll) 's Twitter Profile Photo

I'm not joking and this isn't funny. We have been trying to build distributed agent orchestrators at Google since last year. There are various options, not everyone is aligned... I gave Claude Code a description of the problem, it generated what we built last year in an hour.