Alexander S (@devdef)'s Twitter Profile

warpfusion, ArcaneGAN, face2comics.
All tweets are sarcastic unless stated otherwise.

ID: 435945593

Link: http://sxe.la | Joined: 13-12-2011 16:26:01

2.2K Tweets

7.7K Followers

1.1K Following

Alexander S (@devdef)

Hey @FAL guys, can you please fix the Topaz Video Upscale space on fal.ai? It can't upscale vertical video even at x2, because it hits the horizontal 4K limit :D

Saining Xie (@sainingxie)

three years ago, DiT replaced the legacy unet with a transformer-based denoising backbone. we knew the bulky VAEs would be the next to go -- we just waited until we could do it right. today, we introduce Representation Autoencoders (RAE). >> Retire VAEs. Use RAEs. 👇(1/n)

Thomas Wolf (@thom_wolf)

Everyone wants to get into robotics. No one knows where to start. LeRobot's Francesco just dropped a 70-page crash course that takes you from zero to cutting-edge: - RL sim/real - ACT, Diffusion policies - VLAs, SmolVLA, Pi-0 Absolute gold if you want to catch up fast.

Robert Youssef (@rryssf_)

Holy shit... Tencent researchers just killed fine-tuning AND reinforcement learning in one shot 😳 They call it Training-Free GRPO (Group Relative Policy Optimization). Instead of updating weights, the model literally learns from 'its own experiences' like an evolving memory

Mago (@mago_studio_ai)

Introducing our new seamless workflow update in Mago 💚 🌀 New Flow – No more empty tracks! Generate multiple renders in just a few clicks. 🧠 New Img2Img Model: Qwen Edit+ for accurate image editing. ⭐ Star Ratings – Keep track of your top creations.

Ankit Goyal (@imankitgoyal)

What's the right architecture for a VLA? VLM + custom action heads (π₀)? VLM with special discrete action tokens (OpenVLA)? Custom design on top of the VLM (OpenVLA-OFT)? Or... VLM with ZERO modifications? Just predict action as text. The results will surprise you. VLA-0:

Robert Youssef (@rryssf_)

Holy shit… Baidu just dropped the most efficient multimodal model ever. It’s called PaddleOCR-VL, a 0.9B-parameter beast that outperforms GPT-4o, Gemini 2.5, and every doc-AI model on the planet. This thing reads 109 languages, parses text, tables, formulas, charts, and still
