Adam Yanxiao Zhao (@sdpkjc_adam)'s Twitter Profile
Adam Yanxiao Zhao

@sdpkjc_adam

🧑‍🎓 CS PhD Student @UCAS | 🤖 Deep RL | 🏄‍♂️ Research Intern @ Z.ai | 🦶 Ex-Intern @ LiAuto @SenseTime @ ZeronTruck.com

ID: 1009849824370307072

Link: http://sdpkjc.com · Joined: 21-06-2018 17:25:35

121 Tweets

39 Followers

284 Following

Jarek Liesen (@jarekliesen)'s Twitter Profile Photo

🥳 I'm releasing Rejax, a lightweight library of fully vectorizable RL algorithms!
⚡ Enjoy lightning-fast speed using jax.jit on the training function
🧬 Use vmap and pmap on hyperparameters
🔙 Log using flexible callbacks
🌐 Available @ github.com/kerajli/rejax
📸 Take a tour!
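The jit-then-vmap pattern the tweet describes can be sketched with plain JAX. This is not Rejax's actual API: a toy gradient-descent "training function" stands in for an RL algorithm, purely to show how a whole training run can be jit-compiled and then vmapped over a hyperparameter axis (here, learning rates):

```python
import jax
import jax.numpy as jnp

def train(learning_rate, steps=100):
    # Stand-in "training run": minimize f(w) = w^2 by gradient descent.
    # In Rejax the analogous function would run a full RL training loop.
    def step(w, _):
        grad = 2.0 * w
        return w - learning_rate * grad, None
    w_final, _ = jax.lax.scan(step, jnp.float32(1.0), None, length=steps)
    return w_final

# Compile the entire training loop once, then map it over a batch of
# hyperparameters: one vectorized call runs all training runs at once.
train_batched = jax.jit(jax.vmap(train))
learning_rates = jnp.array([0.01, 0.05, 0.1], dtype=jnp.float32)
finals = train_batched(learning_rates)  # one final weight per learning rate
```

Swapping `jax.vmap` for `jax.pmap` would spread the same hyperparameter sweep across devices instead of vectorizing it on one.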

TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to DeepSeek V3-0324 with a novel construction method.

In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

The Chimera is a child LLM, using V3s
Z.ai (@zai_org)'s Twitter Profile Photo

Introducing GLM-4.5 and GLM-4.5 Air: new flagship models designed to unify frontier reasoning, coding, and agentic capabilities.

GLM-4.5: 355B total / 32B active parameters
GLM-4.5-Air: 106B total / 12B active parameters

API Pricing (per 1M tokens):
GLM-4.5: $0.6 Input / $2.2
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

"To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale
Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo

🚨Thrilled to share our latest progress on Computer Use Agents: ComputerRL, an end-to-end RL method that achieves a 48.1% success rate on the OSWorld benchmark with only a 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models for state-of-the-art performance.
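The fan-out/gather pattern behind orchestrating many environments in parallel can be sketched in a few lines. This is not ComputerRL's actual infrastructure (which runs thousands of distributed virtual desktops); `FakeDesktopEnv` and `rollout` are hypothetical stand-ins illustrating only the pattern of stepping environments concurrently and collecting results in a central driver:

```python
from concurrent.futures import ThreadPoolExecutor

class FakeDesktopEnv:
    """Hypothetical stand-in for a virtual desktop environment."""
    def __init__(self, env_id):
        self.env_id = env_id
        self.t = 0

    def reset(self):
        self.t = 0
        return {"env_id": self.env_id, "screen": "blank"}

    def step(self, action):
        # Fixed 3-step episodes with reward 1.0 per step, for illustration.
        self.t += 1
        done = self.t >= 3
        return {"env_id": self.env_id, "t": self.t}, 1.0, done

def rollout(env):
    """Run one full episode and return (env_id, total_reward)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step("click")
        total += reward
    return env.env_id, total

# Fan out rollouts across workers, gather rewards in the driver.
envs = [FakeDesktopEnv(i) for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(rollout, envs))
```

At scale the same shape holds, with workers replaced by remote desktop VMs and the driver replaced by a distributed trainer consuming the gathered transitions.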

Xuandong Zhao (@xuandongzhao)'s Twitter Profile Photo

Someone on rednote said they found bugs in the Open Review system on 11/12/2025 and even sent three emails to the security team, but still haven’t received a reply.

Moments like this just reinforce the feeling that the whole world is held together with duct tape.

#ICLR2026