Coiled (@coiledhq)'s Twitter Profile
Coiled

@coiledhq

Scale Python with Dask

ID: 1226345064533651456

Link: https://coiled.io | Joined: 9 February 2020, 03:21:21

1.1K Tweets

3.3K Followers

54 Following

Uwe L. Korn (@xhochy)

On the 14th of May, QuantCo Karlsruhe will host the next PyData Südwest: 🤖 AI in 🐍 Python Meetup. Florian Jetter will join us to talk about Dask's impressive speed, and Pavel, Adrian, and Bela will show how to manage hundreds of Python … Sign up at meetup.com/pydata-suedwes…

Anthony Wu (@anthonywu)

Recommendation of the day: `coiled notebook start`, which runs a remote JupyterLab on big cloud machines, with file sync that makes it feel "local". Demo from Coiled: youtu.be/mibhDHYun0M #python #jupyter

Matthew Rocklin (@mrocklin)

TPC-H Cloud Benchmarks: Spark, Dask, DuckDB, Polars

Across scales: 10 GiB, 100 GiB, 1 TiB, 10 TiB
Hardware: MacBook Pro (MBP) and AWS

It was a fun experiment. No project wins uniformly. DuckDB and Dask do pretty well.

docs.coiled.io/blog/tpch.html

Dask (@dask_dev)

Dask DataFrame is now 20x faster. Some of the most prominent changes include:
- Apache Arrow support in pandas
- Better shuffling algorithm for faster joins
- Automatic query optimization

Learn more: docs.coiled.io/blog/dask-data…

Coiled (@coiledhq)

Run a Python script on a cloud GPU with one line of code.

Training a PyTorch model takes ~10 minutes and costs ~$0.12 on an NVIDIA T4 GPU on AWS. Coiled handles provisioning hardware, setting up drivers, and installing CUDA-compiled PyTorch.

docs.coiled.io/user_guide/gpu…

Earthmover (@earthmoverhq)

Arraylake and Coiled work great together! You can use Coiled to manage your cloud computing infrastructure with Dask, and store your data as Zarr in Arraylake. We just added a new documentation page about our integration with Coiled. docs.earthmover.io/integrations/c…

Matthew Rocklin (@mrocklin)

We're looking to build a 100 TB-scale geospatial benchmark suite docs.coiled.io/blog/geospatia… We've seen an uptick in geospatial users, and in the challenges the Xarray/Dask stack faces when scaling beyond ~500 GiB. This post presents a call for benchmark workloads.

Arpit Bansal (@arpit__bansal)

Integrated Coiled into our product to offload data syncing from BigQuery to Neo4j 🤯 Works like butter 🧈 Now I don’t have to worry about scaling VMs dynamically to handle variable loads.

Quentin Lhoest 🤗 (@lhoestq)

New blog post: Scale AI-based Data Processing EASY

The FineWeb-Edu dataset comes from processing 45TB (🤯) of FineWeb

And it uses a language model to classify the educational level of the text 😭😭

Still, we reproduced it in a few lines of code!
The key? HF + Dask 😎

Matthew Rocklin (@mrocklin)

New Post: SLURM-Style Job Arrays on the Cloud docs.coiled.io/blog/slurm-job… HPC job scripts were the first form of parallelism I ever used as a graduate student. They're dead simple and accessible to almost anyone. We replicated the API with Coiled. It feels pretty slick to me 🙂
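A job array runs the same script many times, and each copy uses its array index to pick its own share of the work. Below is a minimal Python sketch of that pattern, not Coiled's implementation. It assumes the index arrives in an environment variable: `SLURM_ARRAY_TASK_ID` is SLURM's name for it, while `COILED_BATCH_TASK_ID` is an assumed name for Coiled, so check the linked post. The input file names, the `inputs_for_task` helper, and passing the task count as a command-line argument are all hypothetical.

```python
import os
import sys

# Hypothetical work items: one entry per unit of work (e.g. files to process).
INPUTS = [f"part-{i:03d}.parquet" for i in range(100)]


def inputs_for_task(task_id: int, n_tasks: int, inputs: list) -> list:
    """Return the items that task `task_id` (0-based) should process.

    A strided split: task k takes items k, k + n_tasks, k + 2 * n_tasks, ...
    so every input goes to exactly one task.
    """
    if not 0 <= task_id < n_tasks:
        raise ValueError(f"task_id {task_id} out of range for {n_tasks} tasks")
    return inputs[task_id::n_tasks]


if __name__ == "__main__":
    # Assumption: the scheduler exports a 0-based task index as an environment
    # variable. The Coiled variable name is assumed; SLURM's is real, but SLURM
    # array indices start wherever --array says, so 0-based is also assumed.
    task_id = int(
        os.environ.get("COILED_BATCH_TASK_ID", os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    )
    # Hypothetical convention: total task count as the first CLI argument.
    n_tasks = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    for path in inputs_for_task(task_id, n_tasks, INPUTS):
        print(f"task {task_id}: processing {path}")
```

The strided split keeps every task's share within one item of the others without any coordination between tasks, which is what makes the pattern so simple.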

Xarray (@xarray_dev)

Read about the latest improvement to GroupBy.map with Dask: xarray.dev/blog/dask-detr… Thanks to Patrick Hoefler of Coiled for the great work here!

Coiled (@coiledhq)

We're now on Bluesky! It should be pretty easy to find us, since Bluesky lets us use our coiled.io domain as our handle ☀️

Matthew Rocklin (@mrocklin)

New Post: Cloud Computing is Broken matthewrocklin.com/cloud-is-broke… Investor asks: "What's next for Data/Cloud Infrastructure?" My answer: "Boring stuff. People struggle with basics." Cloud feels like MP3 players before the iPod. In theory, everything is good. In practice, adoption is low.

Coiled (@coiledhq)

We're big fans of rich for a nice terminal experience, but we've found that folks sometimes log things even rich can't handle.

In the latest coiled=1.67.0 release, `coiled logs` automatically falls back to non-rich printing in these situations.

Release notes: docs.coiled.io/user_guide/cha…

Coiled (@coiledhq)

Calculating quantiles, a common operation in #geospatial workloads, used to be slow due to GIL contention in NumPy. The new implementation in Dask + Xarray is up to a hundred times faster and scales independently of the number of threads 🥳. docs.coiled.io/blog/array-qua…

Matthew Rocklin (@mrocklin)

Coiled 2024 in Review docs.coiled.io/blog/2024-eoy.… It’s the time when companies issue year-end summaries, acclaiming success (or not), and forecasting incredible growth for the next year (or not). I thought I’d do something similar for Coiled. It’s been quite a year for us ...

Coiled (@coiledhq)

🔨 Job setup option for Coiled Batch

Use `--host-setup-script` to configure your VM before your batch job starts. Easily:
✅ Install dependencies
✅ Mount cloud storage
✅ Handle authentication
or any other setup your jobs need.

docs.coiled.io/user_guide/bat…

Coiled (@coiledhq)

Easily configure shared memory size for CLI jobs with `--docker-shm-size`.

Training PyTorch models on a GPU and need more memory? Ever run into "Error: No space left on device"?

Customize Docker shared memory size with `--docker-shm-size`.

docs.coiled.io/user_guide/cli…