DCSC91 (@dcsc_91) 's Twitter Profile
DCSC91

@dcsc_91

reward function author

ID: 117537381

Joined: 25-02-2010 21:33:42

3.3K Tweets

1.1K Followers

1.1K Following

Noam Brown (@polynoamial) 's Twitter Profile Photo

Less than a year ago, people were pointing to Connections as an example of AI progress hitting a wall. Now, models need to be evaluated on an "extended" version because the original is too easy. And o1-pro is already close to saturating this new version as well.

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across Google. 🧵

Nous Research (@nousresearch) 's Twitter Profile Photo

Announcing the launch of Psyche
nousresearch.com/nous-psyche/

Nous Research is democratizing the development of Artificial Intelligence. Today, we’re embarking on our greatest effort to date to make that mission a reality: The Psyche Network

Psyche is a decentralized training

Asankhaya Sharma (@asankhaya) 's Twitter Profile Photo

🧵 9/9 Want to dive deeper? Check out our implementation at github.com/codelion/opene… - especially database.py (selection) and controller.py (mutation) to see how we've reimagined genetic algorithms for the LLM era.
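The thread above describes reimagining genetic algorithms for the LLM era, with selection and mutation split across modules. As a rough illustration of that shape (the names and structure here are hypothetical and do not mirror `database.py` or `controller.py` in the linked repository), the core loop might look like:

```python
import random

# Hypothetical sketch of an evolutionary loop where an LLM plays the role
# of the mutation operator. Illustrative only; not the linked codebase.

def select_parent(population):
    """Fitness-proportional selection over (program, score) pairs."""
    total = sum(score for _, score in population)
    pick = random.uniform(0, total)
    running = 0.0
    for program, score in population:
        running += score
        if running >= pick:
            return program
    return population[-1][0]

def evolve(population, mutate, evaluate, generations=10, capacity=20):
    """Select a parent, ask the mutation operator (an LLM in the real
    system) for a child, score it, and keep the fittest programs."""
    for _ in range(generations):
        parent = select_parent(population)
        child = mutate(parent)          # e.g. an LLM rewriting the program
        population.append((child, evaluate(child)))
        population.sort(key=lambda pair: pair[1], reverse=True)
        del population[capacity:]       # cap the program database
    return population[0]
```

The interesting swap versus classical GAs is that `mutate` is a model call rather than a random bit-flip, so children tend to be semantically meaningful edits.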

Prime Intellect (@primeintellect) 's Twitter Profile Photo

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down.

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

Unsloth AI (@unslothai) 's Twitter Profile Photo

You can now train Vision LLMs with Reinforcement Learning in our free notebook!

Unsloth VLM RL via GRPO: 1.5× faster, 90% less VRAM, 15× longer context & no accuracy loss.

Guide: docs.unsloth.ai/new/vision-rei…
GitHub: github.com/unslothai/unsl…
Qwen2.5-VL Colab: colab.research.google.com/github/unsloth…
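The core trick in GRPO is the group-relative advantage: sample several completions per prompt, score each with a reward function, and normalize rewards within the group so no learned value model is needed. A minimal sketch of that normalization step (not Unsloth's implementation):

```python
# Group-relative advantages as used in GRPO: rewards for one prompt's
# group of completions are shifted to zero mean and scaled to unit
# variance, so each completion is judged relative to its siblings.

def grpo_advantages(rewards, eps=1e-6):
    """Map a group of scalar rewards to normalized advantages."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps keeps the division stable when all rewards in the group tie.
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above the group mean get positive advantages and are reinforced; those below are pushed down, even if every absolute reward was high.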

Unsloth AI (@unslothai) 's Twitter Profile Photo

You can now train OpenAI gpt-oss with Reinforcement Learning in our free notebook!

This notebook automatically creates faster kernels via RL.

Unsloth RL achieves the fastest inference & lowest VRAM vs. any setup - 0 accuracy loss

gpt-oss-20b GRPO Colab: colab.research.google.com/github/unsloth…

WIRED (@wired) 's Twitter Profile Photo

With the US falling behind on open source models, one startup has a bold idea for democratizing AI: let anyone run reinforcement learning. wired.com/story/prime-in…

PyTorch (@pytorch) 's Twitter Profile Photo

Today Meta announced torchforge, a brand-new PyTorch-native library that makes it easy to use reinforcement learning (RL) to train AI agents. Forge provides high-performance building blocks and ready-to-use examples, so you can focus on what’s novel about your use case rather

Ben Burtenshaw (@ben_burtenshaw) 's Twitter Profile Photo

New guide on RL for agentic environments. This guide integrates OpenEnv, textarena, and TRL for training language models on reasoning games like wordle. Instead of relying only on static reward functions, you can now hook up your model to interactive environments (browsers,

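To make concrete what a reasoning-game environment supplies beyond a static reward function, here is a toy wordle-style feedback and reward pair. The scoring rule is the standard wordle one, but the function names are illustrative, not the OpenEnv/textarena/TRL API:

```python
# Toy wordle-style environment logic: per-letter feedback plus a dense
# reward an RL loop could optimize. Illustrative names only.

def wordle_feedback(guess, answer):
    """Return per-letter feedback: 'g' exact, 'y' present elsewhere, '-' absent."""
    feedback = ["-"] * len(guess)
    remaining = list(answer)
    # First pass: mark exact matches and consume those letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "g"
            remaining.remove(g)
    # Second pass: mark letters still available elsewhere in the answer.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and g in remaining:
            feedback[i] = "y"
            remaining.remove(g)
    return "".join(feedback)

def reward(guess, answer):
    """Dense reward: fraction of letters guessed in the right position."""
    return wordle_feedback(guess, answer).count("g") / len(answer)
```

The point of the interactive setting is that the model's next guess can condition on this feedback, so training optimizes a multi-turn policy rather than a single scored completion.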
Alex Shaw (@alexgshaw) 's Twitter Profile Photo

Today, we’re announcing the next chapter of Terminal-Bench with two releases:

1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

will brown (@willccbb) 's Twitter Profile Photo

verifiers v0.1.7 is released 🚀 this one's all about making RL training and experimentation waaaay easier:
- single-command installation for prime-rl
- single-command training w/ unified configs
- overhauled vf.RLTrainer for hacking on new algorithms
quick demo + links below :)

Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

Will McGugan (@willmcgugan) 's Twitter Profile Photo

A little polish on Toad's shell. How long before the other CLIs catch on? This is how agent CLIs should work. One-upped by some cranky Scottish dude. Big tech? Big feck more like. Shell and AI interleaved like best bros. Sit down, Warp. Nobody was talking to you.

François Chollet (@fchollet) 's Twitter Profile Photo

Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached. The result
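The loop described above can be rendered as a toy program: the "spec" is a test predicate, and a stand-in for the coding agent proposes candidates until the goal is reached. The enumerator here is a trivial placeholder for a real coding agent; all names are illustrative:

```python
# Agentic coding framed as optimization: a spec (the goal plus its
# tests) is a predicate, and an iterative process searches candidate
# implementations until one satisfies it.

def spec(candidate):
    """The optimization goal: candidate must sort lists correctly."""
    return candidate([3, 1, 2]) == [1, 2, 3] and candidate([]) == []

def propose_candidates():
    """Stand-in search space; a real agent would generate code edits."""
    yield lambda xs: xs                  # identity: fails the spec
    yield lambda xs: list(reversed(xs))  # reversal: also fails
    yield lambda xs: sorted(xs)          # satisfies the spec

def optimize(goal, candidates):
    """Iterate until the goal is reached, as in the framing above."""
    for candidate in candidates:
        if goal(candidate):
            return candidate
    return None
```

The quality of the result then hinges on how well the spec and its tests constrain the search space, which is exactly where the engineering effort shifts.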