ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents
"To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale