UC Berkeley Sky (@berkeleysky) 's Twitter Profile
UC Berkeley Sky

@berkeleysky

Sky Computing - looking for Berkeley SkyDeck? They’re on the other side of campus from us @SkyDeck_Cal.

ID: 1465408242930970625

Link: https://sky.cs.berkeley.edu/ · Joined: 29-11-2021 19:52:27

21 Tweets

805 Followers

13 Following

NovaSky (@novaskyai) 's Twitter Profile Photo

1/8 🚀
Introducing S*: Test-Time Scaling for Code Generation, the start of our releases in the coding domain @NovaSkyAI.

S* enables (1) non-reasoning models to surpass reasoning models: GPT-4o-mini + S* > o1-preview; and (2) open models to compete with SOTA: R1-Distilled-32B + S* ~= o1 (high).
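The tweet does not spell out the algorithm, but the general family it belongs to is test-time scaling via best-of-N sampling: draw several candidate programs and select one by executing tests. The sketch below is a generic, hypothetical version of that idea (the `sampler` callable stands in for an LLM call; it is not the actual S* method, which the paper describes in full):

```python
def generate_candidates(prompt, n, sampler):
    """Sample n candidate programs; `sampler` is any callable
    mapping a prompt string to Python source (e.g. an LLM call)."""
    return [sampler(prompt) for _ in range(n)]

def passes_tests(source, tests):
    """Execute a candidate and run unit tests against its namespace.
    Returns True only if the code runs and every test passes."""
    namespace = {}
    try:
        exec(source, namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def best_of_n(prompt, sampler, tests, n=8):
    """Best-of-N selection: return the first candidate that passes
    every test, falling back to the first sample if none do."""
    candidates = generate_candidates(prompt, n, sampler)
    for cand in candidates:
        if passes_tests(cand, tests):
            return cand
    return candidates[0]
```

The key property this illustrates: with executable tests as a verifier, extra samples at inference time convert directly into higher pass rates, which is how a weaker model can outscore a stronger one-shot model.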
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs!

We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
Ember (@pyemberproject) 's Twitter Profile Photo

Hello World! 👋 Project Ember is a compositional framework for compound AI systems. It gives ML researchers a toolset to achieve comparable performance to frontier LLMs at 1/1000th the cost (or less) with inference-time scaling. github.com/PyEmber/ember

Agentica Project (@agentica_) 's Twitter Profile Photo

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math.

The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥

Links below:
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

We're excited to invite everyone to a new Beta version of LMArena! 🎉 For months, we’ve been poring through community feedback to improve the site—fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this

darya (@daryakaviani) 's Twitter Profile Photo

Thrilled to share our IEEE S&P '25 work "Myco 🌳🍄: Unlocking Polylogarithmic Accesses in Metadata-Private Messaging" with Deevashwer, Bhargav Annem, and Raluca Ada Popa. We break a decade-old asymptotic barrier in cryptographic metadata-private messaging. eprint.iacr.org/2025/687 👇

Melissa Pan (@melissapan) 's Twitter Profile Photo

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️
🔥 Introducing MAST: the first multi-agent failure taxonomy, consisting of 14 failure modes in 3 categories and generalizing across diverse multi-agent systems and tasks!

Paper: arxiv.org/pdf/2503.13657
Code: github.com/multi-agent-sy…

🧵1/n
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

Around this time 2 years ago, the community helped us launch the very first Arena leaderboard!

Today we’re publishing a blog to celebrate everything we’ve built together on LMArena! 🥳👏

Highlights:
☑️ 3M+ community votes
🤖 400+ models ranked across text, vision,
SkyPilot (@skypilot_org) 's Twitter Profile Photo

What a night! Huge thanks to everyone who came out to our first SkyPilot meetup — a packed house of builders and insightful convos.💥

Thanks to all speakers (sisil mehta of Abridge, Woosuk Kwon of vLLM, Ion Stoica, et al.) for sharing SkyPilot use cases, and Anyscale
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

Real world AI pipelines are often compound, multi-module, and multi-step programs—unlike most RL/GRPO implementations today which optimize a single agent. 🚨 Super excited to release dspy.GRPO, which lets you GRPO tune any arbitrary multi-module, multi-step DSPy program, with

NovaSky (@novaskyai) 's Twitter Profile Photo

1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

📢We’re excited to share that we’ve raised $100M in seed funding to support LMArena and continue our research on reliable AI. Led by a16z and UC Investments (University of California), we're proud to have the support of those that believe in both the science and the mission. We’re

Mir Miroyan (@mirmiroyan) 's Twitter Profile Photo

We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs.

We also share a comprehensive report on user preferences and model performance in the search-enabled setting.

Paper, dataset, and code in 🧵
vLLM (@vllm_project) 's Twitter Profile Photo

👀 Look what just arrived at UC Berkeley Sky! 🌟 A shiny MI355X system. Huge thanks to AMD for supporting open source and we are looking forward to getting it set up in the next few days!

uccl_project (@uccl_proj) 's Twitter Profile Photo

1/N 📢 Introducing UCCL (Ultra & Unified CCL), an efficient collective communication library for ML training and inference, outperforming NCCL by up to 2.5x 🚀

Code: github.com/uccl-project/u…
Blog: uccl-project.github.io/posts/about-uc…
Results: AllReduce on 6 HGX across 2 racks over RoCE RDMA
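For context on what an AllReduce benchmark measures: the classic bandwidth-optimal algorithm (used by NCCL-style libraries) is the ring all-reduce, a reduce-scatter phase followed by an all-gather phase. The simulation below illustrates that textbook algorithm only; it is not UCCL's implementation:

```python
def ring_allreduce(per_rank):
    """Simulate ring all-reduce: each of p ranks holds a vector split
    into p chunks. After 2(p-1) steps every rank holds the full sum."""
    p = len(per_rank)                       # number of ranks = number of chunks
    data = [list(v) for v in per_rank]      # data[rank][chunk]

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % p
    # to rank (r + 1) % p, which adds it into its own copy. Snapshot the
    # outgoing values first so all sends in a step happen "simultaneously".
    for s in range(p - 1):
        outgoing = [(r, (r - s) % p, data[r][(r - s) % p]) for r in range(p)]
        for r, c, val in outgoing:
            data[(r + 1) % p][c] += val

    # Phase 2, all-gather: rank r now owns the fully reduced chunk
    # (r + 1) % p; forward the reduced chunks around the ring.
    for s in range(p - 1):
        outgoing = [(r, (r + 1 - s) % p, data[r][(r + 1 - s) % p]) for r in range(p)]
        for r, c, val in outgoing:
            data[(r + 1) % p][c] = val

    return data
```

Each rank sends and receives only 2(p-1)/p of the data per link, which is why ring all-reduce saturates link bandwidth regardless of rank count; real libraries differ in how they schedule these transfers over NVLink, RoCE, or InfiniBand.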
NovaSky (@novaskyai) 's Twitter Profile Photo

✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead.

🧵👇
Blog: novasky-ai.notion.site/skyrl-v01
Code: github.com/NovaSky-AI/Sky…
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
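The loop the tweet alludes to can be sketched generically: score the current prompt, ask a reflection step (an LLM in practice) to propose a revision, and keep it only if it scores higher. This is a hypothetical hill-climbing sketch of reflective prompt optimization, not GEPA's actual algorithm; `evaluate` and `reflect` are stand-in callables:

```python
def reflective_optimize(prompt, evaluate, reflect, rounds=5):
    """Greedy reflective prompt optimization (a sketch, not GEPA):
    `evaluate` scores a prompt on a few rollouts, `reflect` proposes
    a revised prompt from the current best. Revisions are kept only
    when they improve the score, so few rollouts are spent per round."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for _ in range(rounds):
        candidate = reflect(best_prompt)
        score = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

The contrast with GRPO in the tweet comes from the feedback channel: a policy gradient sees only scalar rewards over thousands of rollouts, while a reflection step can read the failing traces directly and edit the prompt in natural language after a handful of trials.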