Walter Hugo Lopez Pinaya 🍍 (@warvito) 's Twitter Profile
Walter Hugo Lopez Pinaya 🍍

@warvito

Senior Research Engineer @synthesiaIO | Ex-Research Fellow @KingsCollegeLon
Text-to-Video | Generative Models | Medical Imaging

ID: 81100667

Joined: 09-10-2009 12:49:37

2.2K Tweets

987 Followers

544 Following

Saining Xie (@sainingxie) 's Twitter Profile Photo

I used to think diffusion models struggled to denoise efficiently in high-dimensional spaces -- but I was wrong again.

since RAE latent spaces are inherently high-dimensional, diffusion transformers require adaptation, but with just three simple tweaks, they perform *remarkably*
Abdullah Hamdi (@eng_hemdi) 's Twitter Profile Photo

If you are attending #ICCV2025 this week please check our 3 main conference papers and 1 oral paper at the workshops covering topics on spatial intelligence and medical imaging 

1- UKBOB: the biggest 3D MRI segmentation dataset of over 1 billion labeled masks + SOTA foundation
Cai Zhou (@zhuci19) 's Twitter Profile Photo

(1/6) Check out our new paper: Coevolutionary Continuous Discrete Diffusion: Make Your Diffusion Language Model A Latent Reasoner! arxiv: arxiv.org/abs/2510.03206 Do diffusion language models (DLMs) need to be discrete? No! We show that continuous diffusion models are more

Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Nice, short post illustrating how simple text (discrete) diffusion can be. Diffusion (i.e. parallel, iterated denoising, top) is the pervasive generative paradigm in image/video, but autoregression (i.e. go left to right bottom) is the dominant paradigm in text. For audio I've
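The contrast in the tweet above — parallel, iterated denoising versus left-to-right generation — can be sketched with a toy discrete text diffusion loop. This is a hand-rolled illustration, not the post's code: the "denoiser" here is a random stand-in for a learned model, and the commit schedule is made up for the demo.

```python
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "[MASK]"

def toy_denoiser(tokens):
    # Stand-in for a learned model: proposes a token for every masked slot.
    return [random.choice(VOCAB) if t == MASK else t for t in tokens]

def diffusion_decode(length=8, steps=4):
    # Start from all-mask "noise" and commit a fraction of positions per step,
    # in parallel across the sequence -- unlike left-to-right autoregression.
    tokens = [MASK] * length
    for step in range(steps):
        proposal = toy_denoiser(tokens)
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        k = max(1, len(masked) // (steps - step))  # toy commit schedule
        for i in random.sample(masked, k):
            tokens[i] = proposal[i]
    # Commit anything still masked on the way out.
    return [t if t != MASK else p for t, p in zip(tokens, toy_denoiser(tokens))]

print(diffusion_decode())
```

An autoregressive decoder would instead fill position 0, then 1, then 2; here every step touches positions anywhere in the sequence.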

Xintao Wang (@xinntao) 's Twitter Profile Photo

🥳🥳DiT w/o VAE, but with Semantic Encoder, such as DINO!
We introduce SVG (Self-supervised representation for Visual Generation).
Paper: huggingface.co/papers/2510.15…
Code: github.com/shiml20/SVG
Kwang Moo Yi (@kwangmoo_yi) 's Twitter Profile Photo

Choudhury and Kim et al., "Accelerating Vision Transformers With Adaptive Patch Sizes" Transformer patches don't need to be of uniform size -- choose sizes based on entropy --> faster training/inference. Are scale-spaces gonna make a comeback?
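The "choose sizes based on entropy" idea can be sketched as follows — flat, low-entropy regions keep a large patch, busy regions get subdivided. This is a minimal illustration of the principle, not the paper's actual criterion or tokenizer; the histogram entropy and threshold are assumptions for the demo.

```python
import numpy as np

def patch_entropy(patch, bins=16):
    # Shannon entropy (bits) of the patch's intensity histogram.
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def assign_patch_sizes(img, coarse=16, threshold=2.0):
    # Tile the image into coarse patches; low-entropy (flat) regions keep the
    # big patch, high-entropy (textured) regions are split into smaller ones,
    # so fewer tokens are spent where there is little information.
    H, W = img.shape
    sizes = {}
    for y in range(0, H, coarse):
        for x in range(0, W, coarse):
            e = patch_entropy(img[y:y + coarse, x:x + coarse])
            sizes[(y, x)] = coarse if e < threshold else coarse // 2
    return sizes

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[16:, 16:] = rng.random((16, 16))   # one textured quadrant
sizes = assign_patch_sizes(img)
```

The three flat quadrants stay at 16x16 while the textured quadrant drops to 8x8 — fewer tokens overall, hence the speedup.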

Vaibhav (VB) Srivastav (@reach_vb) 's Twitter Profile Photo

Chinese doordash dropping MIT license foundation video models??? “We introduce LongCat-Video, a foundational video generation model with 13.6B parameters, delivering strong performance across Text-to-Video, Image-to-Video, and Video-Continuation generation tasks.”

Meituan LongCat (@meituan_longcat) 's Twitter Profile Photo

🚀 LongCat-Video Now Open-Source: Text/Image-to-Video + Video Continuation in One Model 🏆 Text/Image-to-Video Performance Hits Open-Source SOTA 🎬 Minutes-Long High-Quality Videos: No Color Drift/Quality Loss (Industry-Standout) ⚙ 13.6B Params | Strong Open-Source DiT-Based

fly51fly (@fly51fly) 's Twitter Profile Photo

[CV] Accelerating Vision Transformers with Adaptive Patch Sizes
R Choudhury, J Kim, J Park, E Yang... [CMU & KAIST] (2025)
arxiv.org/abs/2510.18091
DailyPapers (@huggingpapers) 's Twitter Profile Photo

A new Latent Diffusion Model without VAE from Kuaishou Technology is here!

Introducing SVG: it ditches the VAE for self-supervised representations, enabling 62x faster training & 35x faster inference, all while boosting generative quality.
Chieh-Hsin (Jesse) Lai (@jcjesselai) 's Twitter Profile Photo

Tired of going back to the original papers again and again? Our monograph: a systematic, fundamental recipe you can rely on!

📘 We’re excited to release 《The Principles of Diffusion Models》— with Yang Song, Dongjun Kim, Yuki Mitsufuji, and Stefano Ermon.

It traces the core
ℏεsam (@hesamation) 's Twitter Profile Photo

holy shit... Hugging Face cooked again! 🔥

they just dropped a free blog (BOOK) that covers the no-bs reality of building SOTA models. i haven't seen any lab/researcher go into the real decisions behind LLM research and its nuances like this. this is literally a gem.

Syllabus:
→
ModelScope (@maasai42) 's Twitter Profile Photo

🚀 Training 64K+ context LLMs on consumer GPUs? Now possible with Ulysses + Ring Attention!

We’ve fused two sequence parallelism techniques in ModelScope SWIFT:

✅ Ulysses: Low-comm, head-split (but limited by # of attention heads)
✅ Ring Attention: Scales beyond head count
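The trade-off in the two bullets above comes down to which axis gets sharded. A shape-bookkeeping sketch (no actual communication — just the per-device tensor shapes, with toy dimensions assumed for a 64K-context model):

```python
SEQ, HEADS, DIM = 65536, 32, 128  # toy 64K-context attention shapes

def ulysses_shard(devices):
    # Ulysses: an all-to-all swaps sequence sharding for head sharding, so each
    # device attends over the FULL sequence for HEADS // devices heads.
    # Parallelism degree is therefore capped by the head count.
    assert HEADS % devices == 0, "Ulysses degree limited by # attention heads"
    return (SEQ, HEADS // devices, DIM)

def ring_shard(devices):
    # Ring Attention: shard the sequence itself and circulate KV blocks around
    # a ring, so the degree can exceed the number of heads (at higher comm cost).
    return (SEQ // devices, HEADS, DIM)

print(ulysses_shard(32))   # per-device shape at the head-count ceiling
print(ring_shard(128))     # sequence sharding scales past 32 heads
```

Fusing the two lets Ulysses cover the low-communication regime up to the head count, with Ring Attention extending parallelism beyond it.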
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

TabTune makes tabular AI models easy to try, compare, and trust. 

It hides messy prep and gives 1 simple fit, predict, evaluate flow.

Work on tables is messy because every model wants different preprocessing, training modes, and metrics.

This paper's technique supports 7
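The "1 simple fit, predict, evaluate flow" can be sketched as a thin wrapper that hides per-model preprocessing behind one interface. Note: class and method names below are illustrative guesses at the shape of such an API, not TabTune's actual interface; the toy majority-class model exists only to make the sketch runnable.

```python
class TabularRunner:
    # Hypothetical unified wrapper: one fit/predict/evaluate flow regardless of
    # which model or preprocessing pipeline sits underneath.
    def __init__(self, model, preprocess):
        self.model, self.preprocess = model, preprocess

    def fit(self, X, y):
        self.model.fit(self.preprocess(X), y)
        return self

    def predict(self, X):
        return self.model.predict(self.preprocess(X))

    def evaluate(self, X, y):
        preds = self.predict(X)
        return sum(p == t for p, t in zip(preds, y)) / len(y)  # accuracy

class MajorityClass:
    # Trivial baseline model so the sketch runs end to end.
    def fit(self, X, y):
        self.label = max(set(y), key=list(y).count)
    def predict(self, X):
        return [self.label] * len(X)

runner = TabularRunner(MajorityClass(), preprocess=lambda X: X)
runner.fit([[0], [1], [2]], [1, 1, 0])
acc = runner.evaluate([[3], [4]], [1, 0])
```

Swapping in a different model or preprocessing function leaves the calling code unchanged — which is the point of the unified flow.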
Leon Klein (@leonklein26) 's Twitter Profile Photo

(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS2025 paper, we train energy-based diffusion models that can do both: - Generate independent samples - Learn the underlying potential 𝑼 🧵👇 arxiv.org/abs/2506.17139
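The link between "learn the potential 𝑼" and "simulate dynamics" is that the score is -∇U, so a learned energy can drive Langevin dynamics. A toy sketch with a known quadratic U standing in for the learned network (this is plain unadjusted Langevin sampling, not the paper's method):

```python
import numpy as np

def grad_U(x):
    # Gradient of the toy potential U(x) = x^2 / 2; an energy-based diffusion
    # model would learn U with a network and differentiate it for the score.
    return x

def langevin_sample(n=5000, steps=500, eps=0.05, seed=0):
    # Unadjusted Langevin dynamics: x <- x - eps * grad U + sqrt(2 eps) * noise.
    # Its stationary distribution is approximately p(x) ∝ exp(-U(x)).
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n) * 3.0       # start far from equilibrium
    for _ in range(steps):
        x = x - eps * grad_U(x) + np.sqrt(2 * eps) * rng.standard_normal(n)
    return x

samples = langevin_sample()
print(samples.mean(), samples.std())  # ≈ 0 and ≈ 1 for this standard Gaussian
```

Having U itself (not just the score) is what allows both uses in the tweet: independent sampling and physically meaningful dynamics under the same potential.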

Sean McLeish (@seanmcleish) 's Twitter Profile Photo

Looped latent reasoning models like TRM, HRM, Ouro and Huginn are great for reasoning, but they’re inefficient to train at larger scales.

We fix this by post training regular language models into looped models, achieving higher accuracy on a per training FLOP basis.
📜1/7
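The core mechanic of a looped model — reusing one block many times instead of stacking distinct layers — can be sketched in a few lines. The residual block below is a numpy stand-in for a trained transformer block, assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8)) * 0.1  # stand-in for trained block weights

def block(h):
    # One shared-weight "block": a residual update, as in weight-tied models.
    return h + np.tanh(h @ W)

def looped_forward(x, loops):
    # A looped model applies the SAME block `loops` times: more compute (and,
    # per the tweet, better reasoning) at a fixed parameter count.
    h = x
    for _ in range(loops):
        h = block(h)
    return h

x = rng.standard_normal((1, 8))
shallow, deep = looped_forward(x, 1), looped_forward(x, 4)
```

Post-training a regular model into this form means initializing `block` from existing pretrained layers rather than training the loop from scratch.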
Jacob Bamberger (@jacobbamberger) 's Twitter Profile Photo

Flow Matching models often struggle to balance memorization and generalization. 😱
We set out to fix this — by using the geometry of the data manifold. 

Introducing Carré du Champ Flow Matching (CDCFM)🧑‍🎨🥖 — improving generalization without sacrificing sample quality.
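For context, the plain flow-matching baseline that CDCFM builds on can be sketched as below: with a linear interpolant between noise x0 and data x1, the regression target for the velocity field at x_t is simply x1 - x0. The geometric (data-manifold) correction that is CDCFM's contribution is not shown here; the closed-form predictor is an illustrative stand-in for a neural net.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x0, x1, t):
    # Linear (rectified-flow style) interpolant and its velocity target.
    x_t = (1 - t) * x0 + t * x1
    return x_t, x1 - x0

x0 = rng.standard_normal((4, 2))   # noise samples
x1 = np.ones((4, 2))               # toy "data" samples
t = rng.random((4, 1))             # one time per sample
x_t, v_target = flow_matching_pair(x0, x1, t)

def model_v(x_t, t):
    # Closed-form stand-in that happens to be exact for this toy data;
    # a real model is a trained network regressed onto v_target.
    return (np.ones_like(x_t) - x_t) / np.maximum(1 - t, 1e-6)

loss = np.mean((model_v(x_t, t) - v_target) ** 2)  # flow-matching MSE
```

Training minimizes exactly this MSE over random (x0, x1, t) triples; sampling then integrates the learned velocity field from noise to data.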
Niels Rogge (@nielsrogge) 's Twitter Profile Photo

This is a phenomenal video by Jia-Bin Huang explaining seminal papers in computer vision, including CLIP, SimCLR, DINO v1/v2/v3 in 15 minutes

DINO is actually a brilliant idea; I found the decision to use 65k neurons in the output head pretty interesting