Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile
Xiao Liu (Shaw)

@shawliu12

PhD @Tsinghua @THUKEG. Developing P-Tuning, ChatGLM, AgentBench, and AutoGLM. 📖 Sharing paper digests on LLMs.

ID: 1318409063324004354

Link: https://github.com/xiao9905 · Joined: 20-10-2020 04:30:29

109 Tweets

524 Followers

168 Following

Rohan Paul (@rohanpaul_ai)'s Twitter Profile Photo


Self-play with tree-search helps LLMs learn instruction-following.

SPAR introduces a self-play framework that enhances LLMs' instruction-following by minimizing irrelevant variations during training through tree-search refinement.

🤖 Original Problem:

→ Current
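The tweet describes SPAR's tree-search refinement only at a high level. A minimal, purely illustrative sketch of the general idea, with hypothetical `generate_refinements` and `judge_score` stand-ins for the model playing both roles (in true self-play, the same base model would generate and judge):

```python
def generate_refinements(response, width):
    # Hypothetical stand-in: propose `width` candidate refinements of a
    # response. A real system would sample these from the LLM itself.
    return [f"{response}+r{i}" for i in range(width)]

def judge_score(instruction, response):
    # Hypothetical stand-in: a real judge would be an LLM grading whether
    # `response` satisfies the constraints in `instruction`.
    return (sum(map(ord, instruction + response)) * 2654435761) % 1000

def tree_search_refine(instruction, response, width=3, depth=2):
    """Breadth-limited tree search: repeatedly refine a response, keeping
    only the best-scoring candidate at each depth. Because the accepted
    refinement shares most of its wording with the rejected original, the
    resulting preference pair differs mainly in instruction-following,
    not in irrelevant surface variation."""
    best = response
    for _ in range(depth):
        candidates = generate_refinements(best, width)
        best = max(candidates, key=lambda c: judge_score(instruction, c))
    return best

rejected = "draft answer"
refined = tree_search_refine("summarize in 3 bullets", rejected)
print((rejected, refined))  # (rejected, refined) pair for preference training
```

This is a sketch of the self-play loop's shape, not SPAR's actual pipeline; the paper's judge, search budget, and training objective are not shown in the tweet.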
Cunxiang Wang (@cunxiangwang)'s Twitter Profile Photo

Honored to have been involved throughout in the creation of the new Zhipu GLM, which ranks Top 9 on lmarena. But it would have been better to see the results a few days earlier (before the new Qwen-max) 🤣

Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo

Diving into the world of LLM agents! 🚀 Starting today, I'll share insights from the newest and sharpest papers I read. The agentic AI wave is rising—2025-2026 will be game-changing. Let’s explore, learn, and shape the future together! 🔥 #LLM #AgenticAI

Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo


#Apple uses RL to boost a 3.2B LLM phone-use agent to outperform #OpenAI o1 by 9%

It focuses on the problem of IDAs' poor performance in executing complex tasks, especially in digital environments that require multi-step interactions and state management. It addresses the
Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo


🔥 Top Chinese smartphone maker #Xiaomi unveils ReachAgent, a mobile AI agent framework

🚀 Boosts step-level IoU & accuracy by rethinking how agents handle GUI tasks. Breaking tasks into subtasks + a 2-stage process = smarter, faster results! 🧠📱#AI #LLM #AgenticAI #AGI
Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo


#Meta researchers have unveiled MLGym-Bench, the most comprehensive framework yet for evaluating the intelligence of LLMs in AI research

First-ever ML gym environment spanning CV, NLP, RL & game theory with 13 diverse tasks. Even GPT-4o & Claude-3.5 struggle with true
Casper Hansen (@casper_hansen_)'s Twitter Profile Photo


o3 competitor: GLM 4.5 by Zhipu AI
- hybrid reasoning model (on by default)
- trained on 15T tokens
- 128k context, 96k output tokens
- $0.11 / 1M tokens
- MoE: 355B A32B and 106B A12B

Benchmark details:
- tool calling: 90.6% success rate vs Sonnet’s 89.5% vs Kimi K2 86.2%
-
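The "MoE: 355B A32B" line uses the total/active-parameter notation: 355B parameters in total, with ~32B activated per token. A back-of-the-envelope sketch of why active parameters, not total, drive per-token inference compute (assuming the common rule of thumb of ~2 forward-pass FLOPs per active parameter per token):

```python
def flops_per_token(active_params_b):
    # Rough rule of thumb: ~2 FLOPs per *active* parameter per token
    # for a forward pass. Inputs are in billions of parameters.
    return 2 * active_params_b * 1e9

glm_45 = flops_per_token(32)      # GLM 4.5: 355B total, 32B active
glm_45_air = flops_per_token(12)  # the 106B A12B variant: 12B active
dense_355 = flops_per_token(355)  # hypothetical dense model of equal size

print(f"GLM 4.5 vs dense-355B per-token compute: {glm_45 / dense_355:.1%}")
# → GLM 4.5 vs dense-355B per-token compute: 9.0%
```

In other words, the model carries 355B parameters of capacity while costing roughly as much per token to run as a 32B dense model; memory footprint, of course, still scales with total parameters.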
Sam Paech (@sam_paech)'s Twitter Profile Photo


z.ai's GLM-4.5 gets a very strong result on EQ-Bench & Longform Writing.

In creative writing it's a little further down the pack near Gemma 3 27b & qwen3-235b-a22b.

Its lexical profile clusters nearest to R1-0528.
lmarena.ai (formerly lmsys.org) (@lmarena_ai)'s Twitter Profile Photo


🔥BREAKING: Z.ai’s GLM-4.5 enters the top-5 in Arena!

With 4K+ community votes, it now ranks #5 Overall in the Text Arena - matching DeepSeek-R1 and Kimi-K2 as the top open models.

Huge congrats to the Zai team on this incredible milestone and contribution to the open
Jiayi Weng (@trinkle23897)'s Twitter Profile Photo

Finally... OpenAI has internally talked about releasing an open-source model since 2022, and we got close a few times. Now it's here.

Jiayi Weng (@trinkle23897)'s Twitter Profile Photo

Harmony format is finally open-sourced. I still remember 3 years ago (before the ChatGPT release), Shengjia Zhao, Daniel, and I were brainstorming about the right abstraction for RL training, and that was the starting point of the entire harmony library. github.com/openai/harmony

Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo


ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

"To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale
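The quoted abstract describes orchestrating thousands of parallel desktop environments for online RL. A toy sketch of that rollout-collection pattern, with a dummy environment standing in for a virtual desktop (the real infrastructure would dispatch to VMs over RPC, not threads in one process):

```python
from concurrent.futures import ThreadPoolExecutor
import random

class DummyDesktopEnv:
    """Stand-in for one virtual desktop environment."""
    def __init__(self, seed):
        self.rng = random.Random(seed)

    def rollout(self, policy, max_steps=8):
        # Collect one trajectory of (observation, action, reward) tuples.
        traj, obs = [], 0
        for _ in range(max_steps):
            action = policy(obs)
            reward = 1.0 if action == obs % 3 else 0.0  # toy reward rule
            traj.append((obs, action, reward))
            obs = self.rng.randrange(10)  # toy next observation
        return traj

def collect_parallel(policy, n_envs=16, workers=8):
    # Fan rollouts out across many environments at once; an RL learner
    # would consume the returned trajectories for a policy update (not shown).
    envs = [DummyDesktopEnv(seed=i) for i in range(n_envs)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda e: e.rollout(policy), envs))

trajs = collect_parallel(policy=lambda obs: obs % 3)
print(len(trajs), len(trajs[0]))  # → 16 8
```

This only illustrates the fan-out/fan-in shape of online rollout collection; ComputerRL's actual environment API, scale, and learner are described in the paper, not here.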
Alexander Doria (@dorialexander)'s Twitter Profile Photo

Rare research in the open on training for computer use from simulated experiences/desktop. General catch: you need not only RL environments but mid-training environments ("Trajectory Collection with Multiple General LLMs")

Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo

🚨Thrilled to share our latest progress on computer-use agents: ComputerRL, an end-to-end RL method that achieves a 48.1% success rate on the OSWorld benchmark with only a 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models for state-of-the-art performance.

DAIR.AI (@dair_ai)'s Twitter Profile Photo

Top AI Papers of The Week (August 18-24):
- ComputerRL
- Beyond GPT-5
- Chain-of-Agents
- Parallel Text Generation
- Retrieval-Augmented Reasoning
- Has GPT-5 Achieved Spatial Intelligence?
- Open Foundations for Computer-Use Agents

Read on for more: