rLLM (@rllm_project)'s Twitter Profile
rLLM

@rllm_project

Training AI agents to "learn from experience" with RL @BerkeleySky

ID: 1975088327431999488

Link: https://github.com/rllm-org/rllm · Joined: 06-10-2025 06:39:31

15 Tweets

334 Followers

8 Following

rLLM (@rllm_project):

🚀 Introducing rLLM v0.2 - train arbitrary agentic programs with RL, with minimal code changes.

Most RL training systems adopt the agent-environment abstraction. But what about complex workflows? Think solver-critique pairs collaborating, or planner agents orchestrating multiple …
Sijun Tan (@sijun_tan):

I am incredibly excited to introduce rLLM v0.2. Zooming back to a year ago: OpenAI's o1-preview just dropped, and RL + test-time scaling suddenly became the hype. But no one knew how they did it. Kyle Montgomery and I had this idea - what if we built a solver-critique loop for …

Kyle Montgomery (@kylepmont):

🚨 New preprint: Budget-aware Test-time Scaling via Discriminative Verification 👉 arxiv.org/pdf/2510.14913

We show that discriminative verification is the best option for test-time scaling under 25.5 minutes, outperforming state-of-the-art generative verification in both …
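Discriminative verification, in contrast to generative verification, scores each sampled candidate with a classifier-style verifier and keeps the highest-scoring one, rather than asking a model to reason about each candidate in text. A toy best-of-N sketch (the scorer below is a stand-in heuristic, not the paper's verifier model):

```python
def sample_candidates(problem: str, n: int) -> list[str]:
    # Stub sampler; a real pipeline would draw N solutions from an LLM.
    return [f"{problem}=candidate{i}" for i in range(n)]

def verifier_score(problem: str, candidate: str) -> float:
    # Toy discriminative scorer: a real one would be a trained classifier
    # producing a scalar score for each (problem, candidate) pair.
    return float(candidate.rsplit("candidate", 1)[-1])

def best_of_n(problem: str, n: int = 4) -> str:
    # Score all candidates, return the argmax -- one cheap forward pass
    # per candidate, which is why it fits tight compute budgets.
    cands = sample_candidates(problem, n)
    return max(cands, key=lambda c: verifier_score(problem, c))
```

The budget argument in the tweet follows from this shape: scoring is a single pass per candidate, while generative verification spends extra tokens producing a verification chain for each one.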
rLLM (@rllm_project):

🚀 We just released rLLM v0.2.1 — packed with several exciting new features! What’s new:
- rLLM SDK (preview): Turn agents written in any framework (e.g. LangGraph, Strands) into continuous learners.
- Tinker backend: Run serverless RL training with Tinker as the backend.

Sijun Tan (@sijun_tan):

We are releasing rLLM v0.2.1 with many exciting new features -- including a preview of our SDK, integration with Tinker, and support for VLM and LoRA training. Come check it out!

Sida (Star) Li (@starli27496427):

Been working on rLLM for the past few months 😀! This new version (and more to come) is definitely one step closer to democratizing agentic RL training -- any agent you can write down, rLLM will help you train it.

rLLM (@rllm_project):

Train complex RAG agents with RL—directly on your existing code.

Most RL frameworks force you into a rigid “agent-environment loop.” Real-world agents? They’re complex, stateful workflows. Refactoring them for training is a nightmare.

That’s exactly what rLLM SDK is for: train …
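The "train on your existing code" pitch can be illustrated with a decorator-style sketch. All names here are invented for illustration and are not the actual SDK API: the idea is that model-calling steps get recorded for the RL trainer while the agent's own control flow stays untouched.

```python
TRACE = []  # stands in for an SDK rollout recorder

def trainable(fn):
    # Hypothetical wrapper: log each step's inputs and output so an RL
    # trainer can later assign credit, without restructuring the agent.
    def wrapper(*args, **kwargs):
        out = fn(*args, **kwargs)
        TRACE.append((fn.__name__, args, out))
        return out
    return wrapper

@trainable
def retrieve(query: str) -> list[str]:
    return [f"doc about {query}"]  # stub retriever call

@trainable
def generate(query: str, docs: list[str]) -> str:
    return f"answer using {docs[0]}"  # stub LLM call

def rag_agent(query: str) -> str:
    # The existing stateful workflow, unchanged except for the decorators.
    return generate(query, retrieve(query))
```

Under this assumption, the refactoring cost the tweet complains about disappears: the loop structure, state, and branching of the original agent are never rewritten into an environment interface.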
Tianhao Wu (@wththao):

Previously, RL training required a massive overhaul. You had to restructure your entire codebase just to fit a standard environment interface. As an AI agent developer, I’ve felt this pain deeply. One day I asked myself: Why not make training an agent as simple as making an API …

Zhongwen Xu (@zhongwen2009):

Pleased to share our engineering practices for medium-sized LLMs in multi-turn agentic search, where we boosted Qwen3 8B and Qwen3 A3B from 1-2 turn search and 10% accuracy on Browsecomp-Plus to 15+ and 20+ turns with 30% accuracy. The devil is in the details; we hope our …
Alex Ratner (@ajratner):

Another example of a tiny model (4B) with the right data/environments + RL beating a much larger one (235B) - and an awesome collab with the rLLM UC Berkeley Sky team!! + some good lessons about focusing on core, generalizable tool use skills as well as other insights: