Tianbao Xie (@tianbaox) Twitter Tweets • TwiCopy

Zhoujun (Jorge) Cheng

6 months ago

We've been wondering about these too and studied multi-domain RLVR! One finding suggests that the conclusion "RL only elicits pretrained knowledge" is nuanced and varies by domain: 🔥 Heavily pretrained domains (Math, Code, Science) are indeed more readily "elicited." They

Likes: 97 · Replies: 2 · Reposts: 28

merve

@mervenoyann

6 months ago

Qwen2.5-VL is such a great and versatile model that every frontier lab is building on it these days, new agentic models, GUI models and more always base on it Qwen you're the best 💗

Likes: 365 · Replies: 14 · Reposts: 28

Chen Wu

@chenhenrywu

6 months ago

Language models are good at predicting the next word, but can they truly be creative? Creativity isn't just about being accurate. We want the model to tell us something (1) novel 🛸 – that can't be found anywhere on the internet, and (2) diverse 🍱 – so we are surprised each

Likes: 22 · Replies: 0 · Reposts: 4

Yu Su @#ICLR2025

@ysu_nlp

6 months ago

I believe computer use, in principle, is much harder than math/coding for current AI. the digital world encompasses a much larger part of the complexity in this world. The goals are often vastly underspecified and require accessing and understanding broad context (in users’ head

Likes: 56 · Replies: 7 · Reposts: 7

Lei Li

@_tobiaslee

6 months ago

MiMo-VL technical report, models, and evaluation suite are out! 🤗 Models: huggingface.co/XiaomiMiMo/MiM… (or RL) Report: arxiv.org/abs/2506.03569 Evaluation Suite: github.com/XiaomiMiMo/lmm… Looking back, it's incredible that we delivered such compact yet powerful vision-language

Likes: 42 · Replies: 2 · Reposts: 13

Binyuan Hui

@huybery

6 months ago

Please check out and try our first embedding models: Qwen3-Embedding and Qwen3-ReRanker!

Likes: 203 · Replies: 11 · Reposts: 23

Xing Han Lu

@xhluca

6 months ago

"Build the web for agents, not agents for the web" This position paper argues that rather than forcing web agents to adapt to UIs designed for humans, we should develop a new interface optimized for web agents, which we call Agentic Web Interface (AWI).

Likes: 184 · Replies: 7 · Reposts: 52

XLANG NLP Lab

@xlangnlp

6 months ago

🔥 New Computer Agent Arena Leaderboard Updates (2k+ user votes)! 🤔 Which VLMs act better as computer use agents (CUAs)?
1. Claude Sonnet 4 🥇
2. Claude 3.7 Sonnet 🥈
3. UI-TARS-1.5 🥉
4. Operator
More insights in the thread 👇 arena.xlang.ai

Likes: 38 · Replies: 1 · Reposts: 18

Tianbao Xie

@tianbaox

6 months ago

CUA folks, please check out the latest rank and our newest analysis on current models!

Likes: 18 · Replies: 0 · Reposts: 3

Tianbao Xie

@tianbaox

6 months ago

Wow, evolving UI has kind of come true, although with code as the container this time.

Likes: 8 · Replies: 0 · Reposts: 2

Chenxin An

@anchancy46881

6 months ago

# 🚨 4B open-recipe model beats Claude-4-Opus 🔓 100% open data, recipe, model weights and code. Introducing Polaris✨--a post-training recipe for scaling RL on advanced reasoning models. 🥳 Check out how we boost open-recipe reasoning models to incredible performance levels

Likes: 441 · Replies: 23 · Reposts: 80

Tianbao Xie

@tianbaox

6 months ago

In-depth analysis of RL reasoning across many domains! We need to think about how to scale this path beyond math and code.

Likes: 13 · Replies: 0 · Reposts: 5

Sinclair Wang

@sinclairwang1

6 months ago

What Makes a Base Language Model Suitable for RL? Rumors in the community say RL (i.e., RLVR) on LLMs is full of “mysteries”: (1) Is the magic only happening on Qwen + Math? (2) Does the "aha moment" only spark during math reasoning? (3) Is evaluation hiding some tricky traps?

Likes: 505 · Replies: 10 · Reposts: 89

Qwen

@alibaba_qwen

5 months ago

Meet Qwen-VLo, your AI creative engine: • Concept-to-Polish: Turn rough sketches or text prompts into high-res visuals • On-the-Fly Edits: Refine product shots, adjust layouts or styles with simple commands • Global-Ready: Generate image in multiple languages • Progressive

Likes: 1.1K · Replies: 56 · Reposts: 265

Shuai Bai

@shuai_bai_

5 months ago

From QwenVL to Qwen2.5VL, we’ve kept enhancing our model’s ability to see and understand the world. Now, meet QwenVLo — our newest artist that can paint it. 🎨

Likes: 29 · Replies: 2 · Reposts: 4

Tianbao Xie

@tianbaox

5 months ago

With the right computer-use agent data & strong foundation models, we get refined uranium tech. CAPTCHA data, human services, real accounts (gray markets), & a few GPUs? Unauthorized nuclear scientists & research shops. At the right moment, someone will leverage the internet's

Likes: 11 · Replies: 0 · Reposts: 0

Li Junnan

@lijunnan0409

5 months ago

🚀Introducing GTA1 – our new GUI Agent that leads the OSWorld leaderboard with a 45.2% success rate, outperforming OpenAI's CUA! GTA1 improves two core components of GUI agents: Planning and Grounding. 🧠 Planning: A generic test-time scaling strategy that concurrently samples

Likes: 65 · Replies: 2 · Reposts: 16

Kimi.ai

@kimi_moonshot

5 months ago

🚀 Hello, Kimi K2! Open-Source Agentic Model! 🔹 1T total / 32B active MoE model 🔹 SOTA on SWE Bench Verified, Tau2 & AceBench among open models 🔹Strong in coding and agentic tasks 🐤 Multimodal & thought-mode not supported for now With Kimi K2, advanced agentic intelligence

Likes: 3.3K · Replies: 158 · Reposts: 614

Qwen

@alibaba_qwen

5 months ago

🎉 Introducing Qwen.ai Explore three powerful tools in one place: 🔹 Qwen Chat — AI to brainstorm, create, and collaborate 🔹 Research — Stay updated with Qwen’s latest work 🔹 Qwen API — Perfect for building your own AI-powered apps 🌐 Dive in:

Likes: 817 · Replies: 27 · Reposts: 95

Qwen

@alibaba_qwen

5 months ago

🚀 Qwen Chat for Desktop is here! 💻 All the power of Qwen Chat — now with MCP support for smarter, faster agents. ⚡️ Run MCP Server, boost productivity, and stay in control. 📥 Grab it now: qwen.ai/download

Likes: 871 · Replies: 53 · Reposts: 158