Oh, you're writing CUDA kernels? Everyone's on Triton now. Just kidding, we're all on Mojo. We're using cuTile. We're using ROCm. We have an in-house DSL compiler targeting the NVGPU MLIR dialect but wait, Tile IR just dropped so we're going to target that instead. Our PM is on
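(For anyone who hasn't touched any of these: here's roughly what one of those kernels looks like in Triton, the first DSL named above. A minimal vector-add sketch under my own assumptions; the function names and block size are illustrative, not from anyone's actual codebase.)

```python
# Minimal Triton vector-add sketch; names and BLOCK_SIZE are illustrative.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```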
only jobs left in 2030
- associate claude operator
- junior claude operator
- senior claude operator
- principal claude operator
- partner claude operator
There was a flippening in the last few months: you can run your own LLM inference with rates and performance that match or beat LLM inference APIs.
We wrote up the techniques to do so in a new guide, along with code samples.
modal.com/docs/guide/hig…
Notice we're not yet recommending Blackwells for most LLM inference workloads. In our experience, open source just isn't there yet, though we've internally managed it in a few cases.
We're making contributions to SGLang and Tri Dao's FlashAttention 4 to change this.
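For a sense of what "run your own" means in practice: servers like SGLang (and vLLM) expose an OpenAI-compatible endpoint, so once a model is up, querying it takes a few lines. A minimal sketch only; the port, model name, and prompt below are placeholders, not taken from the guide.

```python
# Sketch of querying a self-hosted inference server, assuming it exposes an
# OpenAI-compatible endpoint (as SGLang and vLLM servers do). The base_url,
# model name, and prompt are placeholders, not from the guide.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:30000/v1",  # wherever your server is listening
    api_key="EMPTY",                       # self-hosted servers typically ignore this
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-70B-Instruct",  # placeholder open-weight model
    messages=[{"role": "user", "content": "Summarize the guide in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```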
Completely right. Growing up in Cork in the early 2000s meant not knowing many people my age interested in history, economics, politics, or the genres of music I was into.
If it weren't for online discussion forums (the contents of which have mostly shifted onto sites like
We were discussing how much disk space we needed and one of the engineers said his estimate was “conservative”. I don’t want any of that partisan crap, so I fired him on the spot.
Being super visibly pregnant in Manhattan is so funny bc it reminds you that there actually are a few things you can do in nyc that are still edgy enough to get you stares