NYRE (@sleenyre)'s Twitter Profile
NYRE

@sleenyre

ML @krea_ai, prev fire fighter oleve.co

ID: 1656820518119407619

Link: https://re-n-y.github.io/devlog/rambling/

Joined: 12-05-2023 00:36:20

169 Tweets

266 Followers

178 Following

NYRE (@sleenyre)

I've also been wanting to write a blog post on pretraining + post-training: open challenges and solutions for the new round of generative models. Hopefully I will get to it this month.

KREA AI (@krea_ai)

today we're open-sourcing Krea Realtime. this 14B autoregressive model is 10x larger than any open-source equivalent, and it can generate long-form videos at 11 fps on a single B200. weights and technical report below 👇

Vik Paruchuri (@vikparuchuri)

I'm excited to announce that Chandra OCR is open source!

- Full layout information
- Extracts and captions images and diagrams
- Strong handwriting, form, table support
- Works with transformers and vLLM

NYRE (@sleenyre)

Torchcomm + Monarch: Christmas came early. I know what I'm doing this weekend :) pytorch.org/blog/torchcomm… pytorch.org/blog/introduci…

Sayak Paul (@risingsayak)

With simple changes, I was able to cut down KREA AI's new real-time video gen's timing from 25.54s to 18.14s 🔥🚀

1. FA3 through `kernels`
2. Regional compilation
3. Selective (FP8) quantization

Notes are in 🧵 below

NYRE (@sleenyre)

Here's a fun ML engineering question. In TorchTitan / Lingua, qkv projections are unfused (i.e. three separate linear layers), which is known to be inefficient. Is this on purpose? If so, why?
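
For readers unfamiliar with the terms, here is a minimal pure-Python sketch of what "fused" and "unfused" qkv projections mean. It is not taken from TorchTitan or Lingua; the helper names (`matvec`, `unfused_qkv`, `fused_qkv`, `rand_matrix`) and the tiny dimensions are hypothetical and chosen only for illustration. It shows that stacking the three weight matrices and multiplying once gives the same q, k, v as three separate multiplications.

```python
import random

def matvec(weight, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

def unfused_qkv(wq, wk, wv, x):
    """Unfused: three separate projections, one product per weight matrix."""
    return matvec(wq, x), matvec(wk, x), matvec(wv, x)

def fused_qkv(wq, wk, wv, x):
    """Fused: stack the three weights row-wise, multiply once, then split."""
    w_qkv = wq + wk + wv  # list concatenation stacks the rows
    out = matvec(w_qkv, x)
    d_q, d_k = len(wq), len(wk)
    return out[:d_q], out[d_q:d_q + d_k], out[d_q + d_k:]

def rand_matrix(rows, cols, rng):
    """Hypothetical helper: random weights in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
dim = 4  # illustrative size only
wq, wk, wv = (rand_matrix(dim, dim, rng) for _ in range(3))
x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# Both paths compute the same row-by-row dot products, so the results match.
assert fused_qkv(wq, wk, wv, x) == unfused_qkv(wq, wk, wv, x)
```

The two paths are mathematically equivalent; the usual efficiency argument is that the fused form issues one larger matrix multiply instead of three smaller ones. The sketch does not measure that, and it does not answer why a given codebase keeps the layers separate.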

Physical Intelligence (@physical_int)

Our model can now learn from its own experience with RL! Our new π*0.6 model can more than double throughput over a base model trained without RL, and can perform real-world tasks: making espresso drinks, folding diverse laundry, and assembling boxes. More in the thread below.