Soran Ghaderi (@soranghadri)'s Twitter Profile
Soran Ghaderi

@soranghadri

Looking for PhD position | Reasoning 🧠💻 (Diffusion/EBMs/Flow-based/RL) | AI MSc @uni_of_essex

GitHub: github.com/soran-ghaderi | 🍒 cuRBLAS/🍓 TorchEBM libs


Website: https://soran-ghaderi.github.io/ · Joined: 30-07-2017 11:22:55

943 Tweets

186 Followers

735 Following

Soran Ghaderi (@soranghadri):

Working on cuRBLAS 🍒 - a CUDA library for randomized numerical linear algebra. 

It aims to enhance cuBLAS for very large-scale matrix operations using probabilistic techniques rather than pure GPU-kernel optimization (though kernel optimization is part… 🧵 1/9
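
For intuition, here is a minimal NumPy sketch of the kind of probabilistic technique such a library builds on: the classic norm-based column/row sampling estimator for matrix products (Drineas et al.). Nothing here reflects cuRBLAS's actual API, which isn't shown in the thread; all names are illustrative.

```python
import numpy as np

def randomized_matmul(A, B, c, rng=None):
    """Approximate A @ B by sampling c column/row pairs.

    Columns of A (and matching rows of B) are sampled with
    probability proportional to the product of their norms,
    then rescaled so the estimate is unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # Importance weights: ||A[:, k]|| * ||B[k, :]||
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=c, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])   # rescaling for unbiasedness
    C = A[:, idx] * scale                # m x c sketch of A
    R = B[idx, :] * scale[:, None]       # c x n sketch of B
    return C @ R                         # approximates A @ B

# Quick check: relative error shrinks as the sample count c grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 2048))
B = rng.standard_normal((2048, 256))
exact = A @ B
for c in (64, 256, 1024):
    approx = randomized_matmul(A, B, c, rng)
    err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
    print(f"c={c:5d}  relative error = {err:.3f}")
```

The trade the tweet describes is visible here: you skip most of the exact inner products in exchange for a controlled, probabilistic approximation error.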
Soran Ghaderi (@soranghadri):

Interesting approach. But is it still considered "latent diffusion" in the same sense that we use compressed representations?

This eases end-to-end training of image/video generation models.

Probably allows for a bunch of new loss functions as well.
Jessy Lin (@realjessylin):

🧠 How can we equip LLMs with memory that allows them to continually learn new things?

In our new paper with AI at Meta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge.

While full…
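
As a rough illustration (not Meta's actual architecture), here is a toy PyTorch memory layer where each token reads from its top-k nearest memory slots, so a finetuning step only produces gradients for the handful of slots that were actually selected. All names and sizes are made up for the sketch.

```python
import torch
import torch.nn as nn

class SimpleMemoryLayer(nn.Module):
    """Hypothetical key-value memory layer for illustration only.

    Each input attends to its top-k nearest memory keys; only the
    selected value slots (and the matching keys) receive gradients,
    so an update touches a small, targeted subset of parameters.
    """
    def __init__(self, dim, n_slots=4096, k=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.k = k

    def forward(self, x):                       # x: (batch, dim)
        scores = x @ self.keys.T                # (batch, n_slots)
        top, idx = scores.topk(self.k, dim=-1)  # sparse slot selection
        w = torch.softmax(top, dim=-1)          # (batch, k)
        v = self.values[idx]                    # (batch, k, dim)
        return (w.unsqueeze(-1) * v).sum(dim=1)

layer = SimpleMemoryLayer(dim=64)
out = layer(torch.randn(8, 64))
out.sum().backward()
# values.grad is nonzero only on the slots this batch actually used,
# which is why such updates interfere little with existing knowledge.
used = (layer.values.grad.abs().sum(dim=1) > 0).sum().item()
print(used, "of 4096 slots received gradient")
```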
机器之心 JIQIZHIXIN (@synced_global):

You can now generate 4-minute-long videos!

UCLA, ByteDance, and UCF have just released a new paper on this.

It tackles a core challenge: long-horizon video quality collapse caused by error accumulation when models generate beyond their training length.

Their simple but…
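
A toy numeric picture of that error accumulation (not from the paper): an autoregressive rollout feeds its own slightly wrong outputs back in, so uncorrected per-step noise grows like a random walk the further you generate past the training horizon.

```python
import numpy as np

# Each step conditions on the model's own (slightly wrong) previous
# output, so per-step errors are never corrected and accumulate.
rng = np.random.default_rng(0)
state = np.zeros(8)
drift = []
for t in range(240):                        # steps far beyond training length
    state = state + rng.normal(0, 0.05, 8)  # small, uncorrected per-step error
    drift.append(np.linalg.norm(state))
print(f"drift after  10 steps: {drift[9]:.2f}")
print(f"drift after 240 steps: {drift[-1]:.2f}")  # roughly 5x larger
```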
Soran Ghaderi (@soranghadri):

TorchEBM: just pushed the new Strawberry 🍓 release and updated the website. Tutorials, API references, examples, and developer guides. Check it out here: soran-ghaderi.github.io/torchebm/lates…

Rohan Paul (@rohanpaul_ai):

🇨🇳 Meituan (the Chinese DoorDash) launched LongCat-Video on Hugging Face under the MIT License.

A small 13.6B model that unifies Text-to-Video, Image-to-Video, and Video-Continuation, targeting minutes-long coherent clips and fast 720p 30fps output.

It frames every task as continuing…
Yilun Du (@du_yilun):

Sharing our NeurIPS work on reasoning with EBMs! We learn an EBM over simple subproblems and combine the EBMs at test time to solve complex reasoning problems (3-SAT, graph coloring, crosswords). Generalizes well to complex 3-SAT / graph coloring / N-queens problems.
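
A toy sketch of the test-time composition idea: energies for separate subproblems are simply summed, and a joint solution is found by minimizing the combined energy. The two hand-written energy functions below are illustrative stand-ins, not the paper's learned EBMs.

```python
import torch

# Two "subproblem" energies over a shared variable x; each is low
# exactly when its constraint is satisfied (toy stand-ins for EBMs
# learned over e.g. individual 3-SAT clauses).
def energy_a(x):          # prefers ||x|| == 1
    return (x.norm() - 1.0) ** 2

def energy_b(x):          # prefers the entries of x to sum to 0
    return x.sum() ** 2

# Test-time composition: minimize the *sum* of energies, so
# constraints learned separately combine without retraining.
x = torch.randn(4, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = energy_a(x) + energy_b(x)   # composed energy landscape
    loss.backward()
    opt.step()

print("norm (want ~1):", x.norm().item())
print("sum  (want ~0):", x.sum().item())
```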

Yang Song (@dryangsong):

Applications change, but the principles are enduring. After a year's hard work led by Chieh-Hsin (Jesse) Lai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.

Soran Ghaderi (@soranghadri):

Somehow, hardware-level randomized linear algebra! And it's applicable to energy-based models and other related variants. EBMs: x.com/soranghadri/st… Randomized NLA: x.com/soranghadri/st…

Sander Dieleman (@sedielem):

Generative modelling used to be about capturing the training data distribution. Interestingly, this stopped being the case when we started actually using these models 🤔 We tweak temperatures, use classifier-free guidance, and post-train to get a distribution better than the training data.
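
One concrete example of such a tweak is classifier-free guidance, which deliberately extrapolates past the conditional score. A minimal sketch:

```python
import numpy as np

def cfg_score(score_uncond, score_cond, w):
    """Classifier-free guidance: extrapolate past the conditional score.

    w = 1 recovers the plain conditional model; w > 1 sharpens the
    sampling distribution away from the one the model was trained to
    capture, which is exactly the point the tweet is making.
    """
    return score_uncond + w * (score_cond - score_uncond)

s_u = np.array([0.1, -0.3])   # unconditional score at some (x, t)
s_c = np.array([0.5,  0.2])   # conditional score at the same point
for w in (1.0, 3.0, 7.5):
    print(w, cfg_score(s_u, s_c, w))
```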

Xun Huang (@xunhuang1995):

We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.

Dileep George (@dileeplearning):

AI consciousness, qualia, and personhood... my current thoughts. Can AI systems have consciousness? Yes, I think it is possible to build AI systems to have consciousness. While we haven’t pinned down exactly what it means, we will. Consciousness is related to information…

Leon Klein (@leonklein26):

(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS 2025 paper, we train energy-based diffusion models that can do both:
- Generate independent samples
- Learn the underlying potential 𝑼
🧵👇 arxiv.org/abs/2506.17139
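
A minimal sketch of the general recipe (not the paper's exact model): if the score is defined as the negative gradient of a scalar energy network, the same parameters give you both a sampler and a learned potential U.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Toy energy-parameterized diffusion model for illustration."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                               nn.Linear(128, 1))

    def energy(self, x, t):
        t = t.expand(x.shape[0], 1)
        return self.f(torch.cat([x, t], dim=-1)).squeeze(-1)

    def score(self, x, t):
        # s_theta(x, t) = -grad_x U_theta(x, t), obtained by autograd,
        # so sampling (score) and the potential (energy) share weights.
        x = x.requires_grad_(True)
        U = self.energy(x, t).sum()
        return -torch.autograd.grad(U, x, create_graph=True)[0]

net = EnergyNet(dim=2)
x = torch.randn(16, 2)
t = torch.tensor([[0.5]])
print(net.score(x, t).shape)   # (16, 2): drives diffusion sampling
print(net.energy(x, t).shape)  # (16,): usable as a learned potential U
```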

Randall Balestriero (@randall_balestr):

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)
- 60+ architectures, up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf) = 95%
Paper: arxiv.org/pdf/2511.08544 Code: github.com/rbalestr-lab/l…
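
For a rough sense of the heuristic-free recipe, here is a generic JEPA-style loss with a simple anti-collapse isotropy term. The paper's actual regularizer is different and more principled, so treat this purely as an illustrative sketch under my own assumptions.

```python
import torch

def jepa_style_loss(z_pred, z_target, reg_weight=1.0):
    """Hedged sketch: JEPA prediction loss + an isotropy regularizer.

    No stop-gradient and no EMA teacher; collapse is discouraged only
    by a term pushing the batch of embeddings toward zero mean and
    identity covariance.
    """
    pred = (z_pred - z_target).pow(2).mean()           # predict the target view
    z = torch.cat([z_pred, z_target], dim=0)
    z = z - z.mean(dim=0)                              # center the batch
    cov = z.T @ z / (z.shape[0] - 1)
    iso = (cov - torch.eye(z.shape[1])).pow(2).mean()  # anti-collapse term
    return pred + reg_weight * iso

z1, z2 = torch.randn(256, 32), torch.randn(256, 32)
print(jepa_style_loss(z1, z2))
```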

Jeffrey Emanuel (@doodlestein):

Just read through the new LeJEPA paper by Yann LeCun and Randall Balestriero. I’ve been curious to know what Yann’s been working on lately, especially considering all his criticisms of LLMs (which I disagree with, as I think LLMs will keep improving and will take us to ASI fairly…
François Chollet (@fchollet):

The ladder of intelligence is the ladder of abstraction.
L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong…