UC Berkeley Sky (@berkeleysky) 's Twitter Profile
UC Berkeley Sky

@berkeleysky

Sky Computing - looking for Berkeley SkyDeck? They’re on the other side of campus from us @SkyDeck_Cal.

ID: 1465408242930970625

Link: https://sky.cs.berkeley.edu/ · Joined: 29-11-2021 19:52:27

21 Tweets

805 Followers

13 Following

NovaSky (@novaskyai) 's Twitter Profile Photo

1/8 🚀
Introducing S*: Test-Time Scaling for Code Generation, the start of our releases in the coding domain @NovaSkyAI.

S* enables (1) non-reasoning models to surpass reasoning models: GPT-4o-mini + S* > o1-preview; and (2) open models to compete with SOTA: R1-Distilled-32B + S* ~= o1 (high).
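The tweet does not spell out the algorithm, but the general family it belongs to is test-time scaling via best-of-N sampling: draw several candidate programs and select one by executing tests. The sketch below is a generic, hypothetical version of that idea (the `sampler` callable stands in for an LLM call; it is not the actual S* method, which the paper describes in full):

```python
def generate_candidates(prompt, n, sampler):
    """Sample n candidate programs; `sampler` is any callable
    mapping a prompt string to Python source (e.g. an LLM call)."""
    return [sampler(prompt) for _ in range(n)]

def passes_tests(source, tests):
    """Execute a candidate and run unit tests against its namespace.
    Returns True only if the code runs and every test passes."""
    namespace = {}
    try:
        exec(source, namespace)
        return all(test(namespace) for test in tests)
    except Exception:
        return False

def best_of_n(prompt, sampler, tests, n=8):
    """Best-of-N selection: return the first candidate that passes
    every test, falling back to the first sample if none do."""
    candidates = generate_candidates(prompt, n, sampler)
    for cand in candidates:
        if passes_tests(cand, tests):
            return cand
    return candidates[0]
```

The key property this illustrates: with executable tests as a verifier, extra samples at inference time convert directly into higher pass rates, which is how a weaker model can outscore a stronger one-shot model.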
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

🧵Introducing LangProBe: the first benchmark testing where and how composing LLMs into language programs affects cost-quality tradeoffs!

We find that, on avg across diverse tasks, smaller models within optimized programs beat calls to larger models at a fraction of the cost.
Ember (@pyemberproject) 's Twitter Profile Photo

Hello World! 👋 Project Ember is a compositional framework for compound AI systems. It gives ML researchers a toolset to achieve comparable performance to frontier LLMs at 1/1000th the cost (or less) with inference-time scaling. github.com/PyEmber/ember

Agentica Project (@agentica_) 's Twitter Profile Photo

Introducing DeepCoder-14B-Preview - our fully open-sourced reasoning model reaching o1 and o3-mini level on coding and math.

The best part is, we’re releasing everything: not just the model, but the dataset, code, and training recipe—so you can train it yourself!🔥

Links below:
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

We're excited to invite everyone to a new Beta version of LMArena! 🎉 For months, we’ve been poring through community feedback to improve the site—fixing errors/bugs, improving our UI layout, and more. To keep supporting the development and continual improvement of this

darya (@daryakaviani) 's Twitter Profile Photo

Thrilled to share our IEEE S&P '25 work "Myco 🌳🍄: Unlocking Polylogarithmic Accesses in Metadata-Private Messaging" with Deevashwer, Bhargav Annem, and Raluca Ada Popa. We break a decade-old asymptotic barrier in cryptographic metadata-private messaging. eprint.iacr.org/2025/687 👇

Melissa Pan (@melissapan) 's Twitter Profile Photo

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️
🔥 Introducing MAST: the first multi-agent failure taxonomy, consisting of 14 failure modes in 3 categories and generalizing across diverse multi-agent systems and tasks!

Paper: arxiv.org/pdf/2503.13657
Code: github.com/multi-agent-sy…

🧵1/n
lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

Around this time 2 years ago, the community helped us launch the very first Arena leaderboard!

Today we’re publishing a blog to celebrate everything we’ve built together on LMArena! 🥳👏

Highlights:
☑️ 3M+ community votes
🤖 400+ models ranked across text, vision,
SkyPilot (@skypilot_org) 's Twitter Profile Photo

What a night! Huge thanks to everyone who came out to our first SkyPilot meetup — a packed house of builders and insightful convos.💥

Thanks to all speakers (sisil mehta of Abridge, Woosuk Kwon of vLLM, Ion Stoica, et al.) for sharing SkyPilot use cases, and Anyscale
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

Real world AI pipelines are often compound, multi-module, and multi-step programs—unlike most RL/GRPO implementations today which optimize a single agent. 🚨 Super excited to release dspy.GRPO, which lets you GRPO tune any arbitrary multi-module, multi-step DSPy program, with

NovaSky (@novaskyai) 's Twitter Profile Photo

1/N Introducing SkyRL-v0, our RL training pipeline enabling efficient RL training for long-horizon, real-environment tasks like SWE-Bench. We also open-source a series of our early trained models to showcase the potential of end-to-end online RL training on long-horizon (20-50

lmarena.ai (formerly lmsys.org) (@lmarena_ai) 's Twitter Profile Photo

📢We’re excited to share that we’ve raised $100M in seed funding to support LMArena and continue our research on reliable AI. Led by a16z and UC Investments (University of California), we're proud to have the support of those that believe in both the science and the mission. We’re

Mir Miroyan (@mirmiroyan) 's Twitter Profile Photo

We release Search Arena 🌐 — the first large-scale (24k+) dataset of in-the-wild user interactions with search-augmented LLMs.

We also share a comprehensive report on user preferences and model performance in the search-enabled setting.

Paper, dataset, and code in 🧵
vLLM (@vllm_project) 's Twitter Profile Photo

👀 Look what just arrived at UC Berkeley Sky! 🌟 A shiny MI355X system. Huge thanks to AMD for supporting open source and we are looking forward to getting it set up in the next few days!

uccl_project (@uccl_proj) 's Twitter Profile Photo

1/N 📢 Introducing UCCL (Ultra & Unified CCL), an efficient collective communication library for ML training and inference, outperforming NCCL by up to 2.5x 🚀

Code: github.com/uccl-project/u…
Blog: uccl-project.github.io/posts/about-uc…
Results: AllReduce on 6 HGX across 2 racks over RoCE RDMA
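For context on what an AllReduce benchmark measures: the classic bandwidth-optimal algorithm (used by NCCL-style libraries) is the ring all-reduce, a reduce-scatter phase followed by an all-gather phase. The simulation below illustrates that textbook algorithm only; it is not UCCL's implementation:

```python
def ring_allreduce(per_rank):
    """Simulate ring all-reduce: each of p ranks holds a vector split
    into p chunks. After 2(p-1) steps every rank holds the full sum."""
    p = len(per_rank)                       # number of ranks = number of chunks
    data = [list(v) for v in per_rank]      # data[rank][chunk]

    # Phase 1, reduce-scatter: at step s, rank r sends chunk (r - s) % p
    # to rank (r + 1) % p, which adds it into its own copy. Snapshot the
    # outgoing values first so all sends in a step happen "simultaneously".
    for s in range(p - 1):
        outgoing = [(r, (r - s) % p, data[r][(r - s) % p]) for r in range(p)]
        for r, c, val in outgoing:
            data[(r + 1) % p][c] += val

    # Phase 2, all-gather: rank r now owns the fully reduced chunk
    # (r + 1) % p; forward the reduced chunks around the ring.
    for s in range(p - 1):
        outgoing = [(r, (r + 1 - s) % p, data[r][(r + 1 - s) % p]) for r in range(p)]
        for r, c, val in outgoing:
            data[(r + 1) % p][c] = val

    return data
```

Each rank sends and receives only 2(p-1)/p of the data per link, which is why ring all-reduce saturates link bandwidth regardless of rank count; real libraries differ in how they schedule these transfers over NVLink, RoCE, or InfiniBand.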
NovaSky (@novaskyai) 's Twitter Profile Photo

✨Release: We upgraded SkyRL into a highly-modular, performant RL framework for training LLMs. We prioritized modularity—easily prototype new algorithms, environments, and training logic with minimal overhead.

🧵👇
Blog: novasky-ai.notion.site/skyrl-v01
Code: github.com/NovaSky-AI/Sky…
Lakshya A Agrawal (@lakshyaaagrawal) 's Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
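The loop the tweet alludes to can be sketched generically: score the current prompt, ask a reflection step (an LLM in practice) to propose a revision, and keep it only if it scores higher. This is a hypothetical hill-climbing sketch of reflective prompt optimization, not GEPA's actual algorithm; `evaluate` and `reflect` are stand-in callables:

```python
def reflective_optimize(prompt, evaluate, reflect, rounds=5):
    """Greedy reflective prompt optimization (a sketch, not GEPA):
    `evaluate` scores a prompt on a few rollouts, `reflect` proposes
    a revised prompt from the current best. Revisions are kept only
    when they improve the score, so few rollouts are spent per round."""
    best_prompt, best_score = prompt, evaluate(prompt)
    for _ in range(rounds):
        candidate = reflect(best_prompt)
        score = evaluate(candidate)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

The contrast with GRPO in the tweet comes from the feedback channel: a policy gradient sees only scalar rewards over thousands of rollouts, while a reflection step can read the failing traces directly and edit the prompt in natural language after a handful of trials.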