Ujjwal Upadhyay (@theujjwal9) 's Twitter Profile
Ujjwal Upadhyay

@theujjwal9

Vision Language Models | Medical Imaging | Neuroscience

ID: 833094373894156289

Link: https://ujjwal9.com | Joined: 18-02-2017 23:22:31

500 Tweets

84 Followers

472 Following

Anne Ouyang (@anneouyang) 's Twitter Profile Photo

✨ New blog post 👀: We have some very fast AI-generated kernels generated with a simple test-time only search. They are performing close to or in some cases even beating the standard expert-optimized production kernels shipped in PyTorch. (1/6)

[🔗 link in final post]
Toby Ford-Monroe (@tobyfordmonroe) 's Twitter Profile Photo

Very interesting paper introducing SpookyBench, which is one of the only benchmarks where the VLM-human gap remains near 100 percentage points

Due to architectural limitations, no VLM can perceive meaning dispersed across individually meaningless frames ("Temporal Encoding"). In
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Time Blindness: Why Video-Language Models Can’t See What Humans Can?

LLMs struggle to capture purely temporal patterns when spatial information is obscured.

This paper introduces SpookyBench to evaluate this limitation, showing a significant gap compared to human perception.
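The "temporal encoding" idea can be illustrated with a toy reconstruction (not the paper's actual benchmark code): every frame is pure noise, but pixels inside a hidden region refresh at half the rate of the background, so the shape is only recoverable from temporal statistics, never from any single frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "temporal encoding" in the spirit of SpookyBench (illustrative
# reconstruction, not the paper's code): each frame is i.i.d. noise, but
# pixels in a hidden square refresh every 2 frames instead of every frame.
H, W, T = 8, 8, 400
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True  # hidden square

frames = np.empty((T, H, W))
for t in range(T):
    if t % 2 == 0:
        held = rng.random((H, W))  # hidden-region pixels refresh half as often
    frame = rng.random((H, W))     # background refreshes every frame
    frame[mask] = held[mask]
    frames[t] = frame

# Any single frame is uniform noise everywhere -- spatially uninformative.
# The square appears only in the temporal domain, via smaller
# frame-to-frame change (~1/6 mean abs diff vs ~1/3 for background).
lag_diff = np.abs(np.diff(frames, axis=0)).mean(axis=0)
recovered = lag_diff < 0.25
```

A human watching the flicker perceives the square; a model that processes frames independently (or pools them weakly over time) sees only noise, which is the gap the benchmark measures.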
TuringPost (@theturingpost) 's Twitter Profile Photo

Log-linear attention — a new type of attention proposed by MIT which is:

- as fast and efficient as linear attention
- as expressive as softmax attention

It uses a small but growing number of memory slots that increases logarithmically with the sequence length.

Here's how it works:
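A toy sketch of the core idea (logarithmically many memory slots; the segment scheme and pooling here are illustrative choices, not necessarily the paper's exact construction): decompose the prefix into segments whose sizes are the powers of two in the binary expansion of the sequence length, summarize each segment into one slot, and attend over the slots.

```python
import numpy as np

def log_segments(t):
    """Partition positions [0, t) into O(log t) contiguous segments whose
    sizes follow t's binary expansion (a Fenwick-tree-style decomposition,
    used here for illustration)."""
    segs, start = [], 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            segs.append((start, start + size))
            start += size
    return segs

def log_linear_attention(q, K, V):
    """Toy attention for one query over a prefix of keys/values: each
    segment is mean-pooled into a single memory slot, then softmax
    attention runs over the O(log t) slots instead of all t positions."""
    segs = log_segments(len(K))
    slot_k = np.stack([K[a:b].mean(0) for a, b in segs])
    slot_v = np.stack([V[a:b].mean(0) for a, b in segs])
    scores = slot_k @ q / np.sqrt(len(q))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ slot_v

rng = np.random.default_rng(0)
t, d = 1000, 16
K, V = rng.standard_normal((t, d)), rng.standard_normal((t, d))
q = rng.standard_normal(d)
out = log_linear_attention(q, K, V)
num_slots = len(log_segments(t))  # popcount(1000) = 6 slots, not 1000
```

The point of the sketch is the cost model: a query touches a number of slots that grows logarithmically with sequence length, rather than linearly as in softmax attention.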
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
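The loop can be caricatured in a few lines (all names hypothetical, and a greedy accept/reject rule stands in for the RL objective): the "model" proposes an edit to its own weights, and the edit is kept only if downstream performance improves.

```python
import random

random.seed(0)

# Toy caricature of the SEAL loop (hypothetical stand-ins throughout):
# the "model" is a weight vector, a "self-edit" is a proposed update the
# model generates about itself, and the reward is downstream performance
# measured after applying the edit.
target = [0.5, -1.0, 2.0]      # unknown optimum of the downstream task
weights = [0.0, 0.0, 0.0]

def downstream_performance(w):
    # higher is better: negative squared distance to the task optimum
    return -sum((a - b) ** 2 for a, b in zip(w, target))

for step in range(2000):
    # model "generates a self-edit": a proposed update to its own weights
    edit = [w + random.gauss(0, 0.1) for w in weights]
    # reward signal: keep the edit only if downstream performance improves
    if downstream_performance(edit) > downstream_performance(weights):
        weights = edit

final = downstream_performance(weights)
```

In SEAL proper the self-edits are generated training data and the acceptance rule is replaced by RL on the updated model's downstream reward; the sketch only conveys the feedback structure.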
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
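Dynamic chunking can be illustrated with a deliberately simple heuristic (H-Net learns its boundaries end-to-end inside the model; the hand-written character-class rule below is only a stand-in for that learned boundary predictor):

```python
# Toy dynamic chunking in the spirit of H-Net (illustrative only):
# split a character stream wherever the "character class" changes,
# producing variable-length chunks instead of a fixed tokenizer's output.
def char_class(ch):
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"

def dynamic_chunks(text):
    chunks, current = [], text[0]
    for ch in text[1:]:
        if char_class(ch) != char_class(current[-1]):
            chunks.append(current)  # boundary detected: start a new chunk
            current = ch
        else:
            current += ch
    chunks.append(current)
    return chunks

chunks = dynamic_chunks("H-Net replaces tokenization, v2!")
```

The property worth noting is losslessness: the chunks concatenate back to the exact input, so the model operates over discovered units without a fixed external vocabulary.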

Jason Weston (@jaseweston) 's Twitter Profile Photo

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
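The quality-plus-diversity reward can be sketched as follows (hypothetical stand-ins throughout: a hand-written grouping rule replaces the learned partition function, and the combination rule is a simple multiplier, not the paper's objective):

```python
# Toy sketch of the DARLING idea: combine a quality score with a
# diversity bonus derived from a partition of sampled responses, so
# reward favors generations that are good AND novel.
def darling_reward(responses, quality, partition):
    """quality: response -> score; partition: response -> group id
    (a stand-in for the learned partition function)."""
    groups = {}
    for r in responses:
        groups.setdefault(partition(r), []).append(r)
    rewards = {}
    for r in responses:
        # responses in crowded partitions earn a smaller multiplier
        diversity = 1.0 / len(groups[partition(r)])
        rewards[r] = quality(r) * diversity
    return rewards

responses = ["yes", "yes!", "no", "maybe"]
quality = lambda r: 0.9 if "yes" in r else 0.6
partition = lambda r: "affirm" if "yes" in r else r  # toy semantic grouping
rewards = darling_reward(responses, quality, partition)
```

Here the two near-duplicate "yes" responses split their partition's credit (0.9 × 1/2 = 0.45 each), so the lower-quality but novel answers end up with higher reward (0.6), which is the pressure that keeps sampling diverse.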
PyTorch (@pytorch) 's Twitter Profile Photo

Large Language Models (#LLMs) are optimized for Intel GPUs labeled as xpu in #PyTorch. Learn how to speed up local inference on Intel Arc discrete, built-in, and Arc Pro GPUs, bringing advanced AI to laptops and desktops.

🔗 hubs.la/Q03GYFrV0

#PyTorch #LLM #OpenSourceAI
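A minimal device-selection sketch for the `xpu` backend (assuming a recent PyTorch build with Intel GPU support; the code falls back gracefully when torch or an XPU device is unavailable):

```python
# Select an Intel GPU exposed as "xpu" in PyTorch when present,
# otherwise fall back to CPU.
try:
    import torch
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        device = "xpu"
    else:
        device = "cpu"
except ImportError:
    device = "cpu"

# Usage would look like: model.to(device); x = x.to(device)
print(f"running on: {device}")
```

The same `.to(device)` pattern used for CUDA carries over unchanged, which is what makes the `xpu` backend a drop-in target for local inference.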
Aleksa Gordić (水平问题) (@gordic_aleksa) 's Twitter Profile Photo

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along.

(Remember matmul is the single most important operation that transformers execute
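The central trick in those kernels, tiling, can be sketched in NumPy (this shows only the blocked access pattern; real CUDA kernels add shared-memory staging, vectorized loads, double buffering, and tensor-core instructions):

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matmul illustrating the tiling idea behind high-performance
    GPU kernels: compute the output in small tiles so each tile's operands
    fit in fast memory (shared memory / registers on a GPU), accumulating
    partial products over k-tiles."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # each (i, j) output tile accumulates over the k-tiles
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((96, 80)), rng.standard_normal((80, 64))
err = np.abs(tiled_matmul(A, B) - A @ B).max()
```

The payoff of blocking is data reuse: each loaded tile participates in many multiply-adds before being evicted, which is what lets a GPU kernel approach the hardware's arithmetic peak instead of being memory-bound.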
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,