rLLM (@rllm_project)'s Twitter Profile
rLLM

@rllm_project

Training AI agents to "learn from experience" with RL @BerkeleySky

ID: 1975088327431999488

Link: https://github.com/rllm-org/rllm · Joined: 06-10-2025 06:39:31

15 Tweets

334 Followers

8 Following

rLLM (@rllm_project):

🚀 Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes.

Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple …
Sijun Tan (@sijun_tan):

I am incredibly excited to introduce rLLM v0.2. Zooming back to a year ago: OpenAI's o1-preview just dropped, and RL + test-time scaling suddenly became the hype. But no one knew how they did it. Kyle Montgomery and I had this idea - what if we built a solver-critique loop for …

Kyle Montgomery (@kylepmont):

🚨 New preprint: Budget-aware Test-time Scaling via Discriminative Verification 👉 arxiv.org/pdf/2510.14913

We show that discriminative verification is the best option for test-time scaling under 25.5 minutes, outperforming state-of-the-art generative verification in both …
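Discriminative verification, in contrast to generative verification, scores each sampled candidate with a classifier-style verifier and keeps the highest-scoring one, rather than asking a model to reason about each candidate in text. A toy best-of-N sketch (the scorer below is a stand-in heuristic, not the paper's verifier model):

```python
def sample_candidates(problem: str, n: int) -> list[str]:
    # Stub sampler; a real pipeline would draw N solutions from an LLM.
    return [f"{problem}=candidate{i}" for i in range(n)]

def verifier_score(problem: str, candidate: str) -> float:
    # Toy discriminative scorer: a real one would be a trained classifier
    # producing a scalar score for each (problem, candidate) pair.
    return float(candidate.rsplit("candidate", 1)[-1])

def best_of_n(problem: str, n: int = 4) -> str:
    # Score all candidates, return the argmax -- one cheap forward pass
    # per candidate, which is why it fits tight compute budgets.
    cands = sample_candidates(problem, n)
    return max(cands, key=lambda c: verifier_score(problem, c))
```

The budget argument in the tweet follows from this shape: scoring is a single pass per candidate, while generative verification spends extra tokens producing a verification chain for each one.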
rLLM (@rllm_project):

🚀 We just released rLLM v0.2.1 — packed with several exciting new features! What’s new:
- rLLM SDK (preview): Turn agents written in any framework (e.g. LangGraph, Strands) into continuous learners.
- Tinker backend: Run serverless RL training with Tinker as the backend.

Sijun Tan (@sijun_tan):

We are releasing rLLM v0.2.1 with many exciting new features -- including a preview of our SDK, integration with Tinker, and support for VLM and LoRA training. Come check it out!

Sida (Star) Li (@starli27496427):

Been working on rLLM for the past few months 😀! This new version (and more to come) is definitely one step closer to democratizing agentic RL training -- any agent you can write down, rLLM will help you train it.

rLLM (@rllm_project):

Train complex RAG agents with RL—directly on your existing code.

Most RL frameworks force you into a rigid “agent-environment loop.” Real-world agents? They’re complex, stateful workflows. Refactoring them for training is a nightmare.

That’s exactly what rLLM SDK is for: train …
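The "train on your existing code" pitch can be illustrated with a decorator-style sketch. All names here are invented for illustration and are not the actual SDK API: the idea is that model-calling steps get recorded for the RL trainer while the agent's own control flow stays untouched.

```python
TRACE = []  # stands in for an SDK rollout recorder

def trainable(fn):
    # Hypothetical wrapper: log each step's inputs and output so an RL
    # trainer can later assign credit, without restructuring the agent.
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        TRACE.append((fn.__name__, args, out))
        return out
    return wrapper

@trainable
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]  # stub retriever call

@trainable
def generate(query: str, docs: list[str]) -> str:
    return f"answer using {docs[0]}"  # stub LLM call

def rag_agent(query: str) -> str:
    # The existing stateful workflow, unchanged except for the decorators.
    return generate(query, retrieve(query))
```

Under this assumption, the refactoring cost the tweet complains about disappears: the loop structure, state, and branching of the original agent are never rewritten into an environment interface.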
Tianhao Wu (@wththao):

Previously, RL training required a massive overhaul. You had to restructure your entire codebase just to fit a standard environment interface. As an AI agent developer, I’ve felt this pain deeply. One day I asked myself: Why not make training an agent as simple as making an API …

Zhongwen Xu (@zhongwen2009):

Pleased to share our engineering practices for medium-sized LLMs in multi-turn agentic search, where we boosted Qwen3 8B and Qwen3 A3B from 1-2 turn search and 10% accuracy on Browsecomp-Plus to 15+ and 20+ turns with 30% accuracy. The devil is in the details; we hope our …
Alex Ratner (@ajratner):

Another example of a tiny model (4B) with the right data/environments + RL beating a much larger one (235B) - and an awesome collab with the rLLM UC Berkeley Sky team!! + some good lessons about focusing on core, generalizable tool use skills as well as other insights: