dstack (@dstackai)'s Twitter Profile
dstack

@dstackai

An open-source alternative to Kubernetes and Slurm for AI container orchestration, supporting NVIDIA, AMD, TPU, and Intel across cloud and on-prem.

ID: 1220419250743119872

Link: https://github.com/dstackai/dstack

Joined: 23-01-2020 18:53:26

582 Tweets

958 Followers

4 Following

dstack (@dstackai)

🚀 New in dstack 0.19.9

* CPU, memory and GPU metrics are now retained and visible after runs complete.
* The CLI now displays container exit codes for failed jobs, making it easier to understand why runs failed.

Release notes: github.com/dstackai/dstac…

dstack (@dstackai)

Running @NVIDIA or AMD clusters for distributed training? We’ve put together a brief guide on setting up fast interconnects with dstack – whether you’re scaling AI workloads in the cloud or on-prem. dstack.ai/docs/guides/cl…

Andrey Cheptsov (@andrey_cheptsov)

ROCm experience Part 2. Tested verl project on AMD:

* AMD example uses Ubuntu 20, but most clouds run Ubuntu 22: glibc errors with Broadcom RoCE
* Broadcom NICs untested, only works with Mellanox. Building custom Ubuntu 22 + ROCm 6.2 + vLLM 0.6.3 didn’t fix RDMA
* AMD example

Andrey Cheptsov (@andrey_cheptsov)

Anush Elangovan, we plan to support the default AMD Docker image with dstack so users can simply specify the ROCm version: github.com/dstackai/dstac… The problem is that most AI frameworks today do not simply work with it; they require and use their own custom images, very custom ones, incl. verl project,

dstack (@dstackai)

🚀 New in dstack 0.19.10: smarter task scheduling. Run configs now support a priority field to control scheduling order. If you're still using Slurm for ML development, time to try dstack. github.com/dstackai/dstac…

dstack (@dstackai)

Announcing a new example on how to use RAGEN on dstack to train reasoning agents. Under the hood the example uses verl project and ray for running Reinforcement Learning across multiple nodes. Huge thanks to Lambda for providing access to a 16xH100 1CC cluster.

dstack (@dstackai)

At NVIDIA GTC, Electronic Arts showed how their teams use dstack to spin up GPUs and orchestrate training and development at scale. 👉 Read the full case study + slides: dstack.ai/blog/ea-gtc25/

dstack (@dstackai)

The verl project docs now include a guide on using dstack for distributed RL training across multiple nodes, without needing Kubernetes or Slurm. More on docs: verl.readthedocs.io/en/latest/inde…

dstack (@dstackai)

RAGEN is a new RL framework for training reasoning agents — and now dstack is featured in their repo as a way to run multi-node training (as an alternative to K8S/Slurm). Check it out: github.com/RAGEN-AI/RAGEN

dstack (@dstackai)

🚀 v0.19.11 is out! If you don’t specify a Docker image, dstack now lets you use `uv` (in addition to `pip`). And it’s time to say goodbye to `conda`! `uv` is not only more convenient — it’s also a lot faster. Read more: github.com/dstackai/dstac…

Andrey Cheptsov (@andrey_cheptsov)

I've been a long-time fan of conda, but I think it's time to go! Would love to see more ML projects switch to uv! Kudos to Astral and Charlie Marsh for an amazing project!

dstack (@dstackai)

We’ve just published a new example of using dstack with Hugging Face TRL to train across multiple nodes! It lets you quickly run training on any cloud GPU or on-prem cluster—no K8s or Slurm needed. Check it out: dstack.ai/examples/distr…
