dmayhem93 (@dmayhem93) Twitter Tweets • TwiCopy

dmayhem93

@dmayhem93

6 months ago

thumb_up_off_alt16

chat_bubble_outline0

repeat1

shareShare

We have just released v0.2.0 of Atropos - Nous' RL environments project! New environments, updated handling for api endpoints that don't support n iterations from the openai spec, added better TRL support, and an official Trainer partner, Axolotl! Read the changelog here:

thumb_up_off_alt162

chat_bubble_outline9

repeat12

shareShare

Teknium (e/λ)

@teknium1

6 months ago

So many updates and the hackathon hasnt even begun! You can now train using Atropos using Axolotl too

So many updates and the hackathon hasnt even begun!

You can now train using Atropos using <a href="/axolotl_ai/">Axolotl</a> too

thumb_up_off_alt22

chat_bubble_outline3

repeat1

shareShare

Teknium (e/λ)

@teknium1

6 months ago

thumb_up_off_alt100

chat_bubble_outline8

repeat6

shareShare

Axolotl

@axolotl_ai

6 months ago

🚀 Async-RL is now available with Nous Research Atropos and Axolotl! We've created a plugin for tighter RL iterations that leverages Atropos' optimized asynchronous rollouts. Check out the plugin and get started: github.com/axolotl-ai-clo…

thumb_up_off_alt65

chat_bubble_outline0

repeat15

shareShare

xAI

@xai

6 months ago

Come hack with us!

thumb_up_off_alt1,1K

chat_bubble_outline158

repeat170

shareShare

Nous Research

@nousresearch

6 months ago

Announcing the launch of Psyche nousresearch.com/nous-psyche/ Nous Research is democratizing the development of Artificial Intelligence. Today, we’re embarking on our greatest effort to date to make that mission a reality: The Psyche Network Psyche is a decentralized training

thumb_up_off_alt2,2K

chat_bubble_outline150

repeat392

shareShare

mephisto

@karan4d

6 months ago

Largest distributed pretrain ever

thumb_up_off_alt203

chat_bubble_outline18

repeat19

shareShare

Teknium (e/λ)

@teknium1

6 months ago

Today marks a really big achievement for Nous, but also potentially the AI Landscape. We have begun a decentralized pretraining run of what is basically a dense Deepseek - 40B parameters, over 20T tokens, with MLA for long context efficiency. All checkpoints, unannealed,

thumb_up_off_alt728

chat_bubble_outline40

repeat82

shareShare

emozilla

@theemozilla

6 months ago

Fun story about getting this to work -- since we're doing a dense model it was important to do tensor parallelism inside of MLA but we kept getting weird divergences DeepSeek skipped this since they had smaller experts that could fully fit within a single GPU (they detail this

thumb_up_off_alt155

chat_bubble_outline4

repeat11

shareShare

nathan lile

@nathanthinks

6 months ago

excellent work by Jason Weston & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment scalable (synthetic) evaluation continues to be AI's key bottleneck!

excellent work by <a href="/jaseweston/">Jason Weston</a> & team—extending our "Generative Reward Models" work with RL (GRPO) to optimize LLM reasoning during judgment

scalable (synthetic) evaluation continues to be AI's key bottleneck!

thumb_up_off_alt95

chat_bubble_outline1

repeat12

shareShare

Teknium (e/λ)

@teknium1

6 months ago

There's like 5 or 10 slots left for the most promising potential competitors - Sign up now!

thumb_up_off_alt46

chat_bubble_outline1

repeat3

shareShare

Cerebral Valley

@cerebral_valley

6 months ago

fate awaits at Nous Research hackathon opening 🔥

thumb_up_off_alt52

chat_bubble_outline4

repeat6

shareShare

Nous Research

@nousresearch

6 months ago

Nous Research's RL Environments Hackathon recap thread! Starting with the stars of the show, the winners! Top 3 for the subjective track were: 1st - Pokemon Trainer - by Ajay Uppili Arasanipalai & Alex Reibman 🖇️ 2nd - VR-CLImax by Jake Boggs 3rd - DynastAI by David van Vliet and

thumb_up_off_alt386

chat_bubble_outline22

repeat47

shareShare

Kyle Fish

@fish_kyle3

6 months ago

🧵For Claude Opus 4, we ran our first pre-launch model welfare assessment. To be clear, we don’t know if Claude has welfare. Or what welfare even is, exactly? 🫠 But, we think this could be important, so we gave it a go. And things got pretty wild…

thumb_up_off_alt623

chat_bubble_outline35

repeat70

shareShare

Cerebral Valley

@cerebral_valley

6 months ago

Six teams just won $50,000 at Nous' first ever RL hackathon 🤩 Check out the winning demos👇 Nous Research xAI NVIDIA Nebius Akash Network Lambda TensorStax RunPod

thumb_up_off_alt79

chat_bubble_outline6

repeat13

shareShare

Shashwat Goel

@shashwatgoel7

6 months ago

Confused about recent LLM RL results where models improve without any ground-truth signal? We were too. Until we looked at the reported numbers of the Pre-RL models and realized they were serverely underreported across papers. We compiled discrepancies in a blog below🧵👇

thumb_up_off_alt836

chat_bubble_outline33

repeat120

shareShare

Allan

@niemerg

6 months ago

Huge fan of Claude Code—so I built a python version using smolagents! Introducing SmolCC 🤖📟🛠️ An open source coding agent with Claude Code style tools (bash, grep, edit…✨) that can be easily customized.

thumb_up_off_alt101

chat_bubble_outline3

repeat17

shareShare

Nous Research

@nousresearch

6 months ago

Sequential Monte Carlo (SMC) is a powerful approximation method where multiple branches, or “particles”, are sampled, weighted, and resampled against a scoring function to produce likelier completions that fit the constraints.

thumb_up_off_alt44

chat_bubble_outline1

repeat5

shareShare

nightwing

@yaboilyrical

6 months ago

absolutely ecstatic to announce my latest research with Nous Research in controlled text generation! this has been a very challenging and rewarding experience over the past ~6 months, I’m so happy to finally put it out into the world!

thumb_up_off_alt47

chat_bubble_outline4

repeat8

shareShare