Shawn Lewis (@shawnup)'s Twitter Profile
Shawn Lewis

@shawnup

Founder & CTO @weights_biases. Building tools for AI.

ID: 263881777

Joined: 10-03-2011 22:47:27

474 Tweets

2.2K Followers

736 Following

Shawn Lewis (@shawnup)

This is the improvement Claude Code needed to be great. Just keep going! I don’t care how you do it or what the context looks like. Looking forward to trying it.

CoreWeave (@coreweave)

CoreWeave is the first cloud provider to submit MLPerf Inference v5.0 results for @NVIDIA GB200 GPUs, achieving a 4X per-chip performance improvement over H200 GPUs. We are committed to delivering the fastest and most efficient AI infrastructure. hubs.la/Q03fBklc0

Bespoke Labs (@bespokelabsai)

OpenAI’s o4 just showed that multi-turn tool use is a huge deal for AI agents.
Today, we show how to do the same with your own agents, using RL and open-source models.

We used GRPO on only 100 high quality questions from the BFCL benchmark, and post-trained a 7B Qwen model to

Scott Condron (@_scottcondron)

New Evals API

I’m excited to share a new API for logging evals with W&B Weave.

EvaluationLogger
- log_prediction
- log_score
- log_summary

Our design goal for this API was to get out of your way and build the most flexible eval API out there, inspired by wandb.log, which our

Kyle Corbitt (@corbtt)

🚀 Meet ART·E—our open-source RL-trained email research agent that searches your inbox and answers questions more accurately, faster, and cheaper than o3. Let's go deeper on how we built it. 🧵

Yiping Wang (@ypwang61)

We only need ONE example for RLVR on LLMs to achieve significant improvement on math tasks!

📍RLVR with one training example can boost:
- Qwen2.5-Math-1.5B: 36.0% → 73.6%
- Qwen2.5-Math-7B: 51.0% → 79.2%
on MATH500.

📄 Paper: arxiv.org/abs/2504.20571

Chen Goldberg (@goldbergchen)

We’ve officially completed our acquisition of Weights & Biases, and I couldn’t be more excited. Combining CoreWeave high-performance AI cloud with W&B’s incredible developer tools unlocks new levels of innovation for our customers. Together, we’re building the next-gen AI cloud

Fastino (@fastinoai)

BIG NEWS: Fastino raises $17.5M Seed to launch TLMs – Task-Specific Language Models that beat GPT on accuracy and latency. Led by jon chu at Khosla Ventures + joined by George K. Mathew at Insight Partners, agracias at @valorep, Scott Johnston (ex-Docker CEO), and Lukas Biewald (CEO of

Shawn Lewis (@shawnup)

Optimize for self-confidence, no external rewards needed. Beautiful work. I have a feeling this leads to models claiming to be self-aware.