Tianbao Xie (@tianbaox)'s Twitter Profile
Tianbao Xie

@tianbaox

Ph.D. candidate @XLangNLP lab and @HKUniversity NLP group 2022. Advised by @taoyds and @ikekong. 🤝 @Alibaba_Qwen @SFResearch

ID: 1471828389468147712

Website: http://tianbaoxie.com

Joined: 17-12-2021 13:03:36

606 Tweets

2.2K Followers

1.1K Following

Zhoujun (Jorge) Cheng (@chengzhoujun)

We've been wondering about these too and studied multi-domain RLVR!

One finding suggests that the conclusion "RL only elicits pretrained knowledge" is nuanced and varies by domain:
🔥 Heavily pretrained domains (Math, Code, Science) are indeed more readily "elicited." They

merve (@mervenoyann)

Qwen2.5-VL is such a great and versatile model that every frontier lab is building on it these days: new agentic models, GUI models and more are all based on it. Qwen, you're the best 💗

Chen Wu (@chenhenrywu)

Language models are good at predicting the next word, but can they truly be creative? Creativity isn't just about being accurate. We want the model to tell us something (1) novel 🛸 – that can't be found anywhere on the internet, and (2) diverse 🍱 – so we are surprised each

Yu Su @#ICLR2025 (@ysu_nlp)

I believe computer use, in principle, is much harder than math/coding for current AI. The digital world encompasses a much larger part of the complexity in this world. The goals are often vastly underspecified and require accessing and understanding broad context (in users’ head

Lei Li (@_tobiaslee)

MiMo-VL technical report, models, and evaluation suite are out!

🤗 Models: huggingface.co/XiaomiMiMo/MiM… (or RL)
Report: arxiv.org/abs/2506.03569
Evaluation Suite: github.com/XiaomiMiMo/lmm…

Looking back, it's incredible that we delivered such compact yet powerful vision-language

Xing Han Lu (@xhluca)

"Build the web for agents, not agents for the web"

This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).

XLANG NLP Lab (@xlangnlp)

🔥New Computer Agent Arena Leaderboard Updates (2k+ user votes)!
🤔Which VLMs act better as computer use agents (CUAs)?

1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator

More insights in the thread 👇
arena.xlang.ai

Chenxin An (@anchancy46881)

# 🚨 4B open-recipe model beats Claude-4-Opus 
🔓 100% open data, recipe, model weights and code.

Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 

🥳 Check out how we boost open-recipe reasoning models to incredible performance levels

Tianbao Xie (@tianbaox)

In-depth analysis of RL reasoning across a massive range of domains! We need to think about how to scale this path beyond math and code.

Sinclair Wang (@sinclairwang1)

What Makes a Base Language Model Suitable for RL?

Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”:

(1) Is the magic only happening on Qwen + Math?
(2) Does the "aha moment" only spark during math reasoning?
(3) Is evaluation hiding some tricky traps?

Qwen (@alibaba_qwen)

Meet Qwen-VLo, your AI creative engine:
• Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals
• On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands
• Global-Ready: Generate images in multiple languages
• Progressive

Shuai Bai (@shuai_bai_)

From QwenVL to Qwen2.5VL, we’ve kept enhancing our model’s ability to see and understand the world. Now, meet QwenVLo — our newest artist that can paint it. 🎨

Tianbao Xie (@tianbaox)

With the right computer-use agent data & strong foundation models, we get refined uranium tech. CAPTCHA data, human services, real accounts (gray markets), & a few GPUs? Unauthorized nuclear scientists & research shops. At the right moment, someone will leverage the internet's

Li Junnan (@lijunnan0409)

🚀Introducing GTA1 – our new GUI Agent that leads the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI's CUA!

GTA1 improves two core components of GUI agents: Planning and Grounding.

🧠 Planning: A generic test-time scaling strategy that concurrently samples

Kimi.ai (@kimi_moonshot)

🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models
🔹 Strong in coding and agentic tasks
🐤 Multimodal & thought-mode not supported for now

With Kimi K2, advanced agentic intelligence

Qwen (@alibaba_qwen)

🎉 Introducing Qwen.ai Explore three powerful tools in one place: 🔹 Qwen Chat — AI to brainstorm, create, and collaborate 🔹 Research — Stay updated with Qwen’s latest work 🔹 Qwen API — Perfect for building your own AI-powered apps 🌐 Dive in:

Qwen (@alibaba_qwen)

🚀 Qwen Chat for Desktop is here! 💻 All the power of Qwen Chat — now with MCP support for smarter, faster agents. ⚡️ Run MCP Server, boost productivity, and stay in control. 📥 Grab it now: qwen.ai/download