Scott Condron (@_scottcondron) 's Twitter Profile
Scott Condron

@_scottcondron

Helping build AI/ML dev tools at @weights_biases. I post about machine learning, data visualisation, and software tools.

ID: 982132042845401088

Link: https://www.scottcondron.com/ · Joined: 06-04-2018 05:45:00

2.2K Tweets

5.5K Followers

1.1K Following

W&B Weave (@weave_wb) 's Twitter Profile Photo

Your RL run just spiked at step 89! But do you know why? We’re fixing that. Today we’re launching W&B Weave Traces to give you a step-by-step look into your agent’s decisions. This is the first drop from our fresh new integration with OpenPipe. More RL magic is incoming.

Boris Dayma 🖍️ (@borisdayma) 's Twitter Profile Photo

Interesting Muon experiment 🤓 Learning rate of Adam parameters (embeddings/gains) does not matter so much (here from 1e-4 to 1e-2). It’s more about Muon LR

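As context for the split this experiment is probing: Muon is typically applied only to the 2D hidden weight matrices, while embeddings, gains/norm scales, and biases are handed to Adam(W) with their own learning rate. Below is a hedged sketch of that parameter-group split, assuming a Muon optimizer class with a PyTorch-style constructor; the import, exact signature, and hyperparameter values are illustrative assumptions, not Boris's setup.

import torch
from muon import Muon  # assumed import; depends on which Muon implementation you use

def build_optimizers(model: torch.nn.Module, muon_lr=0.02, adam_lr=3e-4):
    # Muon usually covers only the 2D hidden weight matrices; everything else
    # (embeddings, gains/norm scales, biases) is routed to AdamW.
    muon_params, adam_params = [], []
    for name, p in model.named_parameters():
        if not p.requires_grad:
            continue
        if p.ndim == 2 and "embed" not in name:
            muon_params.append(p)
        else:
            adam_params.append(p)
    return [
        Muon(muon_params, lr=muon_lr, momentum=0.95),  # assumed signature
        torch.optim.AdamW(adam_params, lr=adam_lr),    # the LR the tweet says matters less
    ]
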
Fred Jonsson (@enginoid) 's Twitter Profile Photo

Shawn Lewis, W&B Weave, Weights & Biases: have to say kudos on this. I added a bunch of weave.op decorators around the code to debug a perf issue in my GRPO rollouts, and I got flamegraphs! It's great for inspecting all the env interactions.
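For anyone wanting to reproduce this pattern, here is a minimal sketch using the weave Python SDK's weave.init and @weave.op decorator. The project name, environment step, and policy callable are hypothetical placeholders; the nested ops are what give you the per-call timing breakdown (the flamegraph-style trace view) across env interactions.

import weave

weave.init("grpo-rollout-debugging")  # hypothetical project name

@weave.op()  # each call is traced with inputs, outputs, and timing
def env_step(state: dict, action: int) -> dict:
    # ... real environment logic would go here ...
    return {"state": state, "reward": 0.0, "done": False}

@weave.op()
def rollout(policy, max_steps: int = 64) -> list[dict]:
    # Nested ops appear as a call tree in Weave, which is what surfaces
    # timing hot spots across the rollout's env interactions.
    state, transitions = {"t": 0}, []
    for _ in range(max_steps):
        action = policy(state)            # hypothetical policy callable
        result = env_step(state, action)  # traced child call
        transitions.append(result)
        if result["done"]:
            break
        state = result["state"]
    return transitions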

Scott Condron (@_scottcondron) 's Twitter Profile Photo

Great to see people trying out Weave traces in W&B model training runs to inspect agent rollouts. Would love to chat with more people trying this out and to hear about any other rollout visualizations you’d like to see on top of this.

Kyle Corbitt (@corbtt) 's Twitter Profile Photo

🚀 Big launch from OpenPipe: We just launched Serverless RL — train agents faster and cheaper with zero infra headaches.

Compared to running your own GPUs, Serverless RL is:
- 40% cheaper
- 28% faster wall‑clock
- instantly deployed to prod via Weights & Biases Inference

🚀 Big launch from <a href="/OpenPipeAI/">OpenPipe</a>: We just launched Serverless RL — train agents faster and cheaper with zero infra headaches.

Compared to running your own GPUs, Serverless RL is:
 - 40% cheaper
 - 28% faster wall‑clock
 - instantly deployed to prod via <a href="/weights_biases/">Weights & Biases</a> Inference
Weights & Biases (@weights_biases) 's Twitter Profile Photo

RL X-mas came early. 🎄 For too long, building powerful AI agents with Reinforcement Learning has been blocked by GPU scarcity and complex infrastructure. That ends today. Introducing Serverless RL from wandb, powered by CoreWeave! We're making RL accessible to all.

Scott Condron (@_scottcondron) 's Twitter Profile Photo

Evals versus no evals is a pretty silly debate; the answer is always "just enough evals." "Enough" means you’re getting a strong enough signal to make the next iteration worthwhile. If you can get enough signal by putting it in front of users and gathering implicit/explicit

Scott Condron (@_scottcondron) 's Twitter Profile Photo

I appreciate the "Research use cases" section of OpenAI's Apps SDK.

They've clearly learned that AI projects fail when teams don't:
- start with clear user goals
- prototype against real prompts
- align scope before building tools

It’s applicable to almost any AI app and a

Shreya Shankar (@sh_reya) 's Twitter Profile Photo

100% agree with Andrew on the unreasonable effectiveness of error analysis, and that it’s a bit different for GenAI and agents. Turns out there is a structured and well-established framework to help with error analysis in GenAI—grounded theory! Hamel and I go into detail and do

W&B Weave (@weave_wb) 's Twitter Profile Photo

Stop juggling tabs to test your prompts! 🥵 The W&B Weave Playground is your new home for iterating on and comparing LLMs. And did you know... you can now generate images right in the Playground? Just search "image" in the model dropdown!

W&B Weave (@weave_wb) 's Twitter Profile Photo

The Princeton University lab built their agent evaluation harness (HAL) using W&B Weave. They're leveraging Weave to automatically track, monitor, and unify telemetry across different LLM providers and frameworks for consistent, in-depth evaluation.

The <a href="/Princeton/">Princeton University</a> University lab built their agent evaluation harness (HAL) using W&amp;B Weave.

They're leveraging Weave to automatically track, monitor, and unify telemetry across different LLM providers and frameworks for consistent, in-depth evaluation.
W&B Weave (@weave_wb) 's Twitter Profile Photo

New QoL update for wandb Weave. We've added Quick Filters for your evals. Now, you can use the filter dropdown to search by name or dataset. This makes it easier to identify comparable evaluations and analyze performance trends across datasets.
