DCSC91 (@dcsc_91) 's Twitter Profile
DCSC91

@dcsc_91

reward function author

ID: 117537381

Joined: 25-02-2010 21:33:42

3.3K Tweets

1.1K Followers

1.1K Following

Noam Brown (@polynoamial) 's Twitter Profile Photo

Less than a year ago, people were pointing to Connections as an example of AI progress hitting a wall. Now, models need to be evaluated on an "extended" version because the original is too easy. And o1-pro is already close to saturating this new version as well.

Google DeepMind (@googledeepmind) 's Twitter Profile Photo

Introducing AlphaEvolve: a Gemini-powered coding agent for algorithm discovery. It’s able to: 🔘 Design faster matrix multiplication algorithms 🔘 Find new solutions to open math problems 🔘 Make data centers, chip design and AI training more efficient across Google. 🧵

Nous Research (@nousresearch) 's Twitter Profile Photo

Announcing the launch of Psyche
nousresearch.com/nous-psyche/

Nous Research is democratizing the development of Artificial Intelligence. Today, we’re embarking on our greatest effort to date to make that mission a reality: The Psyche Network

Psyche is a decentralized training

Asankhaya Sharma (@asankhaya) 's Twitter Profile Photo

🧵 9/9 Want to dive deeper? Check out our implementation at github.com/codelion/opene… - especially database.py (selection) and controller.py (mutation) to see how we've reimagined genetic algorithms for the LLM era.
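The thread above describes reimagining genetic algorithms for the LLM era, with selection and mutation split across modules. As a rough illustration of that shape (the names and structure here are hypothetical and do not mirror `database.py` or `controller.py` in the linked repository), the core loop might look like:

```python
import random

# Hypothetical sketch of an evolutionary loop where an LLM plays the role
# of the mutation operator. Illustrative only; not the linked codebase.

def select_parent(population):
    """Fitness-proportional selection over (program, score) pairs."""
    total = sum(score for _, score in population)
    pick = random.uniform(0, total)
    running = 0.0
    for program, score in population:
        running += score
        if running >= pick:
            return program
    return population[-1][0]

def evolve(population, mutate, evaluate, generations=10, capacity=20):
    """Select a parent, ask the mutation operator (an LLM in the real
    system) for a child, score it, and keep the fittest programs."""
    for _ in range(generations):
        parent = select_parent(population)
        child = mutate(parent)          # e.g. an LLM rewriting the program
        population.append((child, evaluate(child)))
        population.sort(key=lambda pair: pair[1], reverse=True)
        del population[capacity:]       # cap the program database
    return population[0]
```

The interesting swap versus classical GAs is that `mutate` is a model call rather than a random bit-flip, so children tend to be semantically meaningful edits.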

Prime Intellect (@primeintellect) 's Twitter Profile Photo

Introducing the Environments Hub

RL environments are the key bottleneck to the next wave of AI progress, but big labs are locking them down.

We built a community platform for crowdsourcing open environments, so anyone can contribute to open-source AGI

Unsloth AI (@unslothai) 's Twitter Profile Photo

You can now train Vision LLMs with Reinforcement Learning in our free notebook!

Unsloth VLM RL via GRPO: 1.5× faster, 90% less VRAM, 15× longer context & no accuracy loss.

Guide: docs.unsloth.ai/new/vision-rei…
GitHub: github.com/unslothai/unsl…
Qwen2.5-VL Colab: colab.research.google.com/github/unsloth…
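The core trick in GRPO is the group-relative advantage: sample several completions per prompt, score each with a reward function, and normalize rewards within the group so no learned value model is needed. A minimal sketch of that normalization step (not Unsloth's implementation):

```python
# Group-relative advantages as used in GRPO: rewards for one prompt's
# group of completions are shifted to zero mean and scaled to unit
# variance, so each completion is judged relative to its siblings.

def grpo_advantages(rewards, eps=1e-6):
    """Map a group of scalar rewards to normalized advantages."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = var ** 0.5
    # eps keeps the division stable when all rewards in the group tie.
    return [(r - mean) / (std + eps) for r in rewards]
```

Completions scoring above the group mean get positive advantages and are reinforced; those below are pushed down, even if every absolute reward was high.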

Unsloth AI (@unslothai) 's Twitter Profile Photo

You can now train OpenAI gpt-oss with Reinforcement Learning in our free notebook!

This notebook automatically creates faster kernels via RL.

Unsloth RL achieves the fastest inference & lowest VRAM vs. any setup - 0 accuracy loss

gpt-oss-20b GRPO Colab: colab.research.google.com/github/unsloth…

WIRED (@wired) 's Twitter Profile Photo

With the US falling behind on open source models, one startup has a bold idea for democratizing AI: let anyone run reinforcement learning. wired.com/story/prime-in…

PyTorch (@pytorch) 's Twitter Profile Photo

Today Meta announced torchforge, a brand-new PyTorch-native library that makes it easy to use reinforcement learning (RL) to train AI agents. Forge provides high-performance building blocks and ready-to-use examples, so you can focus on what’s novel about your use case rather

Ben Burtenshaw (@ben_burtenshaw) 's Twitter Profile Photo

New guide on RL for agentic environments. This guide integrates OpenEnv, textarena, and TRL for training language models on reasoning games like wordle. Instead of relying only on static reward functions, you can now hook up your model to interactive environments (browsers,

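To make concrete what a reasoning-game environment supplies beyond a static reward function, here is a toy wordle-style feedback and reward pair. The scoring rule is the standard wordle one, but the function names are illustrative, not the OpenEnv/textarena/TRL API:

```python
# Toy wordle-style environment logic: per-letter feedback plus a dense
# reward an RL loop could optimize. Illustrative names only.

def wordle_feedback(guess, answer):
    """Return per-letter feedback: 'g' exact, 'y' present elsewhere, '-' absent."""
    feedback = ["-"] * len(guess)
    remaining = list(answer)
    # First pass: mark exact matches and consume those letters.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            feedback[i] = "g"
            remaining.remove(g)
    # Second pass: mark letters still available elsewhere in the answer.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and g in remaining:
            feedback[i] = "y"
            remaining.remove(g)
    return "".join(feedback)

def reward(guess, answer):
    """Dense reward: fraction of letters guessed in the right position."""
    return wordle_feedback(guess, answer).count("g") / len(answer)
```

The point of the interactive setting is that the model's next guess can condition on this feedback, so training optimizes a multi-turn policy rather than a single scored completion.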
Alex Shaw (@alexgshaw) 's Twitter Profile Photo

Today, we’re announcing the next chapter of Terminal-Bench with two releases:

1. Harbor, a new package for running sandboxed agent rollouts at scale
2. Terminal-Bench 2.0, a harder version of Terminal-Bench with increased verification

will brown (@willccbb) 's Twitter Profile Photo

verifiers v0.1.7 is released 🚀 this one's all about making RL training and experimentation waaaay easier:
- single-command installation for prime-rl
- single-command training w/ unified configs
- overhauled vf.RLTrainer for hacking on new algorithms
quick demo + links below :)

Ai2 (@allen_ai) 's Twitter Profile Photo

Announcing Olmo 3, a leading fully open LM suite built for reasoning, chat, & tool use, and an open model flow—not just the final weights, but the entire training journey. Best fully open 32B reasoning model & best 32B base model. 🧵

Will McGugan (@willmcgugan) 's Twitter Profile Photo

A little polish on Toad's shell. How long before the other CLIs catch on? This is how agent CLIs should work. One-upped by some cranky Scottish dude. Big tech? Big feck more like. Shell and AI interleaved like best bros. Sit down, Warp. Nobody was talking to you.

François Chollet (@fchollet) 's Twitter Profile Photo

Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached. The result
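The loop described above can be rendered as a toy program: the "spec" is a test predicate, and a stand-in for the coding agent proposes candidates until the goal is reached. The enumerator here is a trivial placeholder for a real coding agent; all names are illustrative:

```python
# Agentic coding framed as optimization: a spec (the goal plus its
# tests) is a predicate, and an iterative process searches candidate
# implementations until one satisfies it.

def spec(candidate):
    """The optimization goal: candidate must sort lists correctly."""
    return candidate([3, 1, 2]) == [1, 2, 3] and candidate([]) == []

def propose_candidates():
    """Stand-in search space; a real agent would generate code edits."""
    yield lambda xs: xs                  # identity: fails the spec
    yield lambda xs: list(reversed(xs))  # reversal: also fails
    yield lambda xs: sorted(xs)          # satisfies the spec

def optimize(goal, candidates):
    """Iterate until the goal is reached, as in the framing above."""
    for candidate in candidates:
        if goal(candidate):
            return candidate
    return None
```

The quality of the result then hinges on how well the spec and its tests constrain the search space, which is exactly where the engineering effort shifts.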