Soran Ghaderi (@soranghadri)'s Twitter Profile
Soran Ghaderi

@soranghadri

Looking for PhD position | Reasoning 🧠💻 (Diffusion/EBMs/Flow-based/RL) | AI MSc @uni_of_essex

GitHub: github.com/soran-ghaderi | 🍒 cuRBLAS/🍓 TorchEBM libs


Website: https://soran-ghaderi.github.io/ · Joined: 30-07-2017 11:22:55

943 Tweets

186 Followers

735 Following

Soran Ghaderi (@soranghadri):

Working on cuRBLAS 🍒 - a CUDA library for randomized numerical linear algebra. 

It aims to enhance cuBLAS for very large-scale matrix operations using probabilistic techniques rather than pure GPU-kernel optimization (though kernel optimization is part… 🧵 1/9
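
For intuition, here is a minimal NumPy sketch of the kind of probabilistic technique such a library builds on: the classic norm-based column/row sampling estimator for matrix products (Drineas et al.). Nothing here reflects cuRBLAS's actual API, which isn't shown in the thread; all names are illustrative.

```python
import numpy as np

def randomized_matmul(A, B, c, rng=None):
    """Approximate A @ B by sampling c column/row pairs.

    Columns of A (and matching rows of B) are sampled with
    probability proportional to the product of their norms,
    then rescaled so the estimate is unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = A.shape[1]
    # Importance weights: ||A[:, k]|| * ||B[k, :]||
    p = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
    p = p / p.sum()
    idx = rng.choice(n, size=c, p=p)
    scale = 1.0 / np.sqrt(c * p[idx])   # rescaling for unbiasedness
    C = A[:, idx] * scale                # m x c sketch of A
    R = B[idx, :] * scale[:, None]       # c x n sketch of B
    return C @ R                         # approximates A @ B

# Quick check: relative error shrinks as the sample count c grows.
rng = np.random.default_rng(0)
A = rng.standard_normal((512, 2048))
B = rng.standard_normal((2048, 256))
exact = A @ B
for c in (64, 256, 1024):
    approx = randomized_matmul(A, B, c, rng)
    err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
    print(f"c={c:5d}  relative error = {err:.3f}")
```

The trade the tweet describes is visible here: you skip most of the exact inner products in exchange for a controlled, probabilistic approximation error.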
Soran Ghaderi (@soranghadri):

Interesting approach. But is it still considered "latent diffusion" in the same sense that we use compressed representations?

This eases end-to-end training of image/video generation models.

Probably allows for a bunch of new loss functions as well.
Jessy Lin (@realjessylin):

🧠 How can we equip LLMs with memory that allows them to continually learn new things?

In our new paper with AI at Meta, we show how sparsely finetuning memory layers enables targeted updates for continual learning, w/ minimal interference with existing knowledge.

While full…
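
As a rough illustration (not Meta's actual architecture), here is a toy PyTorch memory layer where each token reads from its top-k nearest memory slots, so a finetuning step only produces gradients for the handful of slots that were actually selected. All names and sizes are made up for the sketch.

```python
import torch
import torch.nn as nn

class SimpleMemoryLayer(nn.Module):
    """Hypothetical key-value memory layer for illustration only.

    Each input attends to its top-k nearest memory keys; only the
    selected value slots (and the matching keys) receive gradients,
    so an update touches a small, targeted subset of parameters.
    """
    def __init__(self, dim, n_slots=4096, k=4):
        super().__init__()
        self.keys = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.values = nn.Parameter(torch.randn(n_slots, dim) * 0.02)
        self.k = k

    def forward(self, x):                       # x: (batch, dim)
        scores = x @ self.keys.T                # (batch, n_slots)
        top, idx = scores.topk(self.k, dim=-1)  # sparse slot selection
        w = torch.softmax(top, dim=-1)          # (batch, k)
        v = self.values[idx]                    # (batch, k, dim)
        return (w.unsqueeze(-1) * v).sum(dim=1)

layer = SimpleMemoryLayer(dim=64)
out = layer(torch.randn(8, 64))
out.sum().backward()
# values.grad is nonzero only on the slots this batch actually used,
# which is why such updates interfere little with existing knowledge.
used = (layer.values.grad.abs().sum(dim=1) > 0).sum().item()
print(used, "of 4096 slots received gradient")
```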
机器之心 JIQIZHIXIN (@synced_global):

You can now generate 4-minute-long videos!

UCLA, ByteDance, and UCF have just released a new paper on this.

It tackles a core challenge: long-horizon video quality collapse caused by error accumulation when models generate beyond their training length.

Their simple but…
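
A toy numeric picture of that error accumulation (not from the paper): an autoregressive rollout feeds its own slightly wrong outputs back in, so uncorrected per-step noise grows like a random walk the further you generate past the training horizon.

```python
import numpy as np

# Each step conditions on the model's own (slightly wrong) previous
# output, so per-step errors are never corrected and accumulate.
rng = np.random.default_rng(0)
state = np.zeros(8)
drift = []
for t in range(240):                        # steps far beyond training length
    state = state + rng.normal(0, 0.05, 8)  # small, uncorrected per-step error
    drift.append(np.linalg.norm(state))
print(f"drift after  10 steps: {drift[9]:.2f}")
print(f"drift after 240 steps: {drift[-1]:.2f}")  # roughly 5x larger
```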
Soran Ghaderi (@soranghadri):

TorchEBM: just pushed the new Strawberry 🍓 release and updated the website. Tutorials, API references, examples, and developer guides. Check it out here: soran-ghaderi.github.io/torchebm/lates…

Rohan Paul (@rohanpaul_ai):

🇨🇳 Meituan (the Chinese DoorDash) launched LongCat-Video on Hugging Face under the MIT License.

A small 13.6B model that unifies Text-to-Video, Image-to-Video, and Video-Continuation, targeting minutes-long coherent clips and fast 720p 30fps output.

It frames every task as continuing…
Yilun Du (@du_yilun):

Sharing our NeurIPS work on reasoning with EBMs! We learn an EBM over simple subproblems and combine the EBMs at test time to solve complex reasoning problems (3-SAT, graph coloring, crosswords). Generalizes well to complex 3-SAT / graph coloring / N-queens problems.
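
A toy sketch of the test-time composition idea: energies for separate subproblems are simply summed, and a joint solution is found by minimizing the combined energy. The two hand-written energy functions below are illustrative stand-ins, not the paper's learned EBMs.

```python
import torch

# Two "subproblem" energies over a shared variable x; each is low
# exactly when its constraint is satisfied (toy stand-ins for EBMs
# learned over e.g. individual 3-SAT clauses).
def energy_a(x):          # prefers ||x|| == 1
    return (x.norm() - 1.0) ** 2

def energy_b(x):          # prefers the entries of x to sum to 0
    return x.sum() ** 2

# Test-time composition: minimize the *sum* of energies, so
# constraints learned separately combine without retraining.
x = torch.randn(4, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.05)
for step in range(300):
    opt.zero_grad()
    loss = energy_a(x) + energy_b(x)   # composed energy landscape
    loss.backward()
    opt.step()

print("norm (want ~1):", x.norm().item())
print("sum  (want ~0):", x.sum().item())
```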

Yang Song (@dryangsong):

Applications change, but the principles are enduring. After a year's hard work led by Chieh-Hsin (Jesse) Lai, we are really excited to share this deep, systematic dive into the mathematical principles of diffusion models. This is a monograph we always wished we had.

Soran Ghaderi (@soranghadri):

Somehow, hardware-level randomized linear algebra! And it's applicable to energy-based models and other related variants. EBMs: x.com/soranghadri/st… Randomized NLA: x.com/soranghadri/st…

Sander Dieleman (@sedielem):

Generative modelling used to be about capturing the training data distribution. Interestingly, this stopped being the case when we started actually using these models 🤔 We tweak temperatures, use classifier-free guidance, and post-train to get a distribution better than the training data.
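
One concrete example of such a tweak is classifier-free guidance, which deliberately extrapolates past the conditional score. A minimal sketch:

```python
import numpy as np

def cfg_score(score_uncond, score_cond, w):
    """Classifier-free guidance: extrapolate past the conditional score.

    w = 1 recovers the plain conditional model; w > 1 sharpens the
    sampling distribution away from the one the model was trained to
    capture, which is exactly the point the tweet is making.
    """
    return score_uncond + w * (score_cond - score_uncond)

s_u = np.array([0.1, -0.3])   # unconditional score at some (x, t)
s_c = np.array([0.5,  0.2])   # conditional score at the same point
for w in (1.0, 3.0, 7.5):
    print(w, cfg_score(s_u, s_c, w))
```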

Xun Huang (@xunhuang1995):

We present MotionStream — real-time, long-duration video generation that you can interactively control just by dragging your mouse. All videos here are raw, real-time screen captures without any post-processing. Model runs on a single H100 at 29 FPS and 0.4s latency.

Dileep George (@dileeplearning):

AI consciousness, qualia, and personhood... my current thoughts. Can AI systems have consciousness? Yes, I think it is possible to build AI systems to have consciousness. While we haven’t pinned down exactly what it means, we will. Consciousness is related to information…

Leon Klein (@leonklein26):

(1/n) Can diffusion models simulate molecular dynamics instead of generating independent samples? In our NeurIPS 2025 paper, we train energy-based diffusion models that can do both:
- Generate independent samples
- Learn the underlying potential 𝑼
🧵👇 arxiv.org/abs/2506.17139
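
A minimal sketch of the general recipe (not the paper's exact model): if the score is defined as the negative gradient of a scalar energy network, the same parameters give you both a sampler and a learned potential U.

```python
import torch
import torch.nn as nn

class EnergyNet(nn.Module):
    """Toy energy-parameterized diffusion model for illustration."""
    def __init__(self, dim):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(),
                               nn.Linear(128, 1))

    def energy(self, x, t):
        t = t.expand(x.shape[0], 1)
        return self.f(torch.cat([x, t], dim=-1)).squeeze(-1)

    def score(self, x, t):
        # s_theta(x, t) = -grad_x U_theta(x, t), obtained by autograd,
        # so sampling (score) and the potential (energy) share weights.
        x = x.requires_grad_(True)
        U = self.energy(x, t).sum()
        return -torch.autograd.grad(U, x, create_graph=True)[0]

net = EnergyNet(dim=2)
x = torch.randn(16, 2)
t = torch.tensor([[0.5]])
print(net.score(x, t).shape)   # (16, 2): drives diffusion sampling
print(net.energy(x, t).shape)  # (16,): usable as a learned potential U
```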

Randall Balestriero (@randall_balestr):

LeJEPA: a novel pretraining paradigm free of the (many) heuristics we relied on (stop-grad, teacher, ...)
- 60+ architectures, up to 2B params
- 10+ datasets
- in-domain training (>DINOv3)
- corr(train loss, test perf) = 95%
Paper: arxiv.org/pdf/2511.08544 Code: github.com/rbalestr-lab/l…
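
For a rough sense of the heuristic-free recipe, here is a generic JEPA-style loss with a simple anti-collapse isotropy term. The paper's actual regularizer is different and more principled, so treat this purely as an illustrative sketch under my own assumptions.

```python
import torch

def jepa_style_loss(z_pred, z_target, reg_weight=1.0):
    """Hedged sketch: JEPA prediction loss + an isotropy regularizer.

    No stop-gradient and no EMA teacher; collapse is discouraged only
    by a term pushing the batch of embeddings toward zero mean and
    identity covariance.
    """
    pred = (z_pred - z_target).pow(2).mean()           # predict the target view
    z = torch.cat([z_pred, z_target], dim=0)
    z = z - z.mean(dim=0)                              # center the batch
    cov = z.T @ z / (z.shape[0] - 1)
    iso = (cov - torch.eye(z.shape[1])).pow(2).mean()  # anti-collapse term
    return pred + reg_weight * iso

z1, z2 = torch.randn(256, 32), torch.randn(256, 32)
print(jepa_style_loss(z1, z2))
```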

Jeffrey Emanuel (@doodlestein):

Just read through the new LeJEPA paper by Yann LeCun and Randall Balestriero. I’ve been curious to know what Yann’s been working on lately, especially considering all his criticisms of LLMs (which I disagree with, as I think LLMs will keep improving and will take us to ASI fairly…
François Chollet (@fchollet):

The ladder of intelligence is the ladder of abstraction.
L1: Memorizing answers (no generalization)
L2: Interpolative retrieval of answers, pattern matching, memorizing answer-generating rules (local generalization)
L3: Synthesizing causal rules on the fly (strong…