Adam Yanxiao Zhao (@sdpkjc_adam)'s Twitter Profile
Adam Yanxiao Zhao

@sdpkjc_adam

🧑‍🎓 CS PhD Student @UCAS | 🤖 Deep RL | 🏄‍♂️ Research Intern @ Z.ai | 🦶 Ex-Intern @ LiAuto @SenseTime @ ZeronTruck.com

ID: 1009849824370307072

Link: http://sdpkjc.com · Joined: 21-06-2018 17:25:35

121 Tweets

39 Followers

284 Following

Jarek Liesen (@jarekliesen)'s Twitter Profile Photo

🥳 I'm releasing Rejax, a lightweight library of fully vectorizable RL algorithms!
⚡ Enjoy lightning-fast speed using jax.jit on the training function
🧬 Use vmap and pmap on hyperparameters
🔙 Log using flexible callbacks
🌐 Available @ github.com/kerajli/rejax
📸 Take a tour!
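The jit-then-vmap pattern the tweet describes can be sketched with plain JAX. This is not Rejax's actual API: a toy gradient-descent "training function" stands in for an RL algorithm, purely to show how a whole training run can be jit-compiled and then vmapped over a hyperparameter axis (here, learning rates):

```python
import jax
import jax.numpy as jnp

def train(learning_rate, steps=100):
    # Stand-in "training run": minimize f(w) = w^2 by gradient descent.
    # In Rejax the analogous function would run a full RL training loop.
    def step(w, _):
        grad = 2.0 * w
        return w - learning_rate * grad, None
    w_final, _ = jax.lax.scan(step, jnp.float32(1.0), None, length=steps)
    return w_final

# Compile the entire training loop once, then map it over a batch of
# hyperparameters: one vectorized call runs all training runs at once.
train_batched = jax.jit(jax.vmap(train))
learning_rates = jnp.array([0.01, 0.05, 0.1], dtype=jnp.float32)
finals = train_batched(learning_rates)  # one final weight per learning rate
```

Swapping `jax.vmap` for `jax.pmap` would spread the same hyperparameter sweep across devices instead of vectorizing it on one.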

TNG Technology Consulting GmbH (@tngtech)'s Twitter Profile Photo

Today we release DeepSeek-R1T-Chimera, an open weights model adding R1 reasoning to DeepSeek V3-0324 with a novel construction method.

In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

The Chimera is a child LLM, using V3s
Z.ai (@zai_org)'s Twitter Profile Photo

Introducing GLM-4.5 and GLM-4.5 Air: new flagship models designed to unify frontier reasoning, coding, and agentic capabilities.

GLM-4.5: 355B total / 32B active parameters
GLM-4.5-Air: 106B total / 12B active parameters

API Pricing (per 1M tokens):
GLM-4.5: $0.6 Input / $2.2
Tanishq Mathew Abraham, Ph.D. (@iscienceluvr)'s Twitter Profile Photo

ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

"To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale
Xiao Liu (Shaw) (@shawliu12)'s Twitter Profile Photo

🚨Thrilled to share our latest progress on Computer Use Agents: ComputerRL, an end-to-end RL method that achieves a 48.1% success rate on the OSWorld benchmark with only a 9B open model, beating OpenAI Operator, Claude Sonnet 4.0, and other previous models for state-of-the-art performance.
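The fan-out/gather pattern behind orchestrating many environments in parallel can be sketched in a few lines. This is not ComputerRL's actual infrastructure (which runs thousands of distributed virtual desktops); `FakeDesktopEnv` and `rollout` are hypothetical stand-ins illustrating only the pattern of stepping environments concurrently and collecting results in a central driver:

```python
from concurrent.futures import ThreadPoolExecutor

class FakeDesktopEnv:
    """Hypothetical stand-in for a virtual desktop environment."""
    def __init__(self, env_id):
        self.env_id = env_id
        self.t = 0

    def reset(self):
        self.t = 0
        return {"env_id": self.env_id, "screen": "blank"}

    def step(self, action):
        # Fixed 3-step episodes with reward 1.0 per step, for illustration.
        self.t += 1
        done = self.t >= 3
        return {"env_id": self.env_id, "t": self.t}, 1.0, done

def rollout(env):
    """Run one full episode and return (env_id, total_reward)."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step("click")
        total += reward
    return env.env_id, total

# Fan out rollouts across workers, gather rewards in the driver.
envs = [FakeDesktopEnv(i) for i in range(8)]
with ThreadPoolExecutor(max_workers=8) as pool:
    results = dict(pool.map(rollout, envs))
```

At scale the same shape holds, with workers replaced by remote desktop VMs and the driver replaced by a distributed trainer consuming the gathered transitions.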

Xuandong Zhao (@xuandongzhao)'s Twitter Profile Photo

Someone on rednote said they found bugs in the Open Review system on 11/12/2025 and even sent three emails to the security team, but still haven’t received a reply.

Moments like this just reinforce the feeling that the whole world is held together with duct tape.

#ICLR2026