Ujjwal Upadhyay (@theujjwal9) 's Twitter Profile
Ujjwal Upadhyay

@theujjwal9

Vision Language Models | Medical Imaging | Neuroscience

ID: 833094373894156289

Link: https://ujjwal9.com | Joined: 18-02-2017 23:22:31

500 Tweets

84 Followers

472 Following

Anne Ouyang (@anneouyang) 's Twitter Profile Photo

✨ New blog post 👀: We have some very fast AI-generated kernels generated with a simple test-time only search. They are performing close to or in some cases even beating the standard expert-optimized production kernels shipped in PyTorch. (1/6)

[🔗 link in final post]
Toby Ford-Monroe (@tobyfordmonroe) 's Twitter Profile Photo

Very interesting paper introducing SpookyBench, which is one of the only benchmarks where the VLM-human gap remains near 100 percentage points

Due to architectural limitations, no VLM can perceive meaning dispersed across individually meaningless frames ("Temporal Encoding"). In
Rohan Paul (@rohanpaul_ai) 's Twitter Profile Photo

Time Blindness: Why Video-Language Models Can’t See What Humans Can?

LLMs struggle to capture purely temporal patterns when spatial information is obscured.

This paper introduces SpookyBench to evaluate this limitation, showing a significant gap compared to human perception.
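The "temporal encoding" idea can be illustrated with a toy reconstruction (not the paper's actual benchmark code): every frame is pure noise, but pixels inside a hidden region refresh at half the rate of the background, so the shape is only recoverable from temporal statistics, never from any single frame.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "temporal encoding" in the spirit of SpookyBench (illustrative
# reconstruction, not the paper's code): each frame is i.i.d. noise, but
# pixels in a hidden square refresh every 2 frames instead of every frame.
H, W, T = 8, 8, 400
mask = np.zeros((H, W), dtype=bool)
mask[2:6, 2:6] = True  # hidden square

frames = np.empty((T, H, W))
for t in range(T):
    if t % 2 == 0:
        held = rng.random((H, W))  # hidden-region pixels refresh half as often
    frame = rng.random((H, W))     # background refreshes every frame
    frame[mask] = held[mask]
    frames[t] = frame

# Any single frame is uniform noise everywhere -- spatially uninformative.
# The square appears only in the temporal domain, via smaller
# frame-to-frame change (~1/6 mean abs diff vs ~1/3 for background).
lag_diff = np.abs(np.diff(frames, axis=0)).mean(axis=0)
recovered = lag_diff < 0.25
```

A human watching the flicker perceives the square; a model that processes frames independently (or pools them weakly over time) sees only noise, which is the gap the benchmark measures.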
TuringPost (@theturingpost) 's Twitter Profile Photo

Log-linear attention — a new type of attention proposed by MIT which is:

- as fast and efficient as linear attention
- as expressive as softmax attention

It uses a small but growing number of memory slots that increases logarithmically with the sequence length.

Here's how it works:
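A toy sketch of the core idea (logarithmically many memory slots; the segment scheme and pooling here are illustrative choices, not necessarily the paper's exact construction): decompose the prefix into segments whose sizes are the powers of two in the binary expansion of the sequence length, summarize each segment into one slot, and attend over the slots.

```python
import numpy as np

def log_segments(t):
    """Partition positions [0, t) into O(log t) contiguous segments whose
    sizes follow t's binary expansion (a Fenwick-tree-style decomposition,
    used here for illustration)."""
    segs, start = [], 0
    for bit in reversed(range(t.bit_length())):
        size = 1 << bit
        if t & size:
            segs.append((start, start + size))
            start += size
    return segs

def log_linear_attention(q, K, V):
    """Toy attention for one query over a prefix of keys/values: each
    segment is mean-pooled into a single memory slot, then softmax
    attention runs over the O(log t) slots instead of all t positions."""
    segs = log_segments(len(K))
    slot_k = np.stack([K[a:b].mean(0) for a, b in segs])
    slot_v = np.stack([V[a:b].mean(0) for a, b in segs])
    scores = slot_k @ q / np.sqrt(len(q))
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ slot_v

rng = np.random.default_rng(0)
t, d = 1000, 16
K, V = rng.standard_normal((t, d)), rng.standard_normal((t, d))
q = rng.standard_normal(d)
out = log_linear_attention(q, K, V)
num_slots = len(log_segments(t))  # popcount(1000) = 6 slots, not 1000
```

The point of the sketch is the cost model: a query touches a number of slots that grows logarithmically with sequence length, rather than linearly as in softmax attention.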
Jyo Pari (@jyo_pari) 's Twitter Profile Photo

What if an LLM could update its own weights?

Meet SEAL🦭: a framework where LLMs generate their own training data (self-edits) to update their weights in response to new inputs.

Self-editing is learned via RL, using the updated model’s downstream performance as reward.
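The loop can be caricatured in a few lines (all names hypothetical, and a greedy accept/reject rule stands in for the RL objective): the "model" proposes an edit to its own weights, and the edit is kept only if downstream performance improves.

```python
import random

random.seed(0)

# Toy caricature of the SEAL loop (hypothetical stand-ins throughout):
# the "model" is a weight vector, a "self-edit" is a proposed update the
# model generates about itself, and the reward is downstream performance
# measured after applying the edit.
target = [0.5, -1.0, 2.0]      # unknown optimum of the downstream task
weights = [0.0, 0.0, 0.0]

def downstream_performance(w):
    # higher is better: negative squared distance to the task optimum
    return -sum((a - b) ** 2 for a, b in zip(w, target))

for step in range(2000):
    # model "generates a self-edit": a proposed update to its own weights
    edit = [w + random.gauss(0, 0.1) for w in weights]
    # reward signal: keep the edit only if downstream performance improves
    if downstream_performance(edit) > downstream_performance(weights):
        weights = edit

final = downstream_performance(weights)
```

In SEAL proper the self-edits are generated training data and the acceptance rule is replaced by RL on the updated model's downstream reward; the sketch only conveys the feedback structure.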
Sukjun (June) Hwang (@sukjun_hwang) 's Twitter Profile Photo

Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical network that replaces tokenization with a dynamic chunking process directly inside the model, automatically discovering and operating over meaningful units of data
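Dynamic chunking can be illustrated with a deliberately simple heuristic (H-Net learns its boundaries end-to-end inside the model; the hand-written character-class rule below is only a stand-in for that learned boundary predictor):

```python
# Toy dynamic chunking in the spirit of H-Net (illustrative only):
# split a character stream wherever the "character class" changes,
# producing variable-length chunks instead of a fixed tokenizer's output.
def char_class(ch):
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    if ch.isspace():
        return "space"
    return "other"

def dynamic_chunks(text):
    chunks, current = [], text[0]
    for ch in text[1:]:
        if char_class(ch) != char_class(current[-1]):
            chunks.append(current)  # boundary detected: start a new chunk
            current = ch
        else:
            current += ch
    chunks.append(current)
    return chunks

chunks = dynamic_chunks("H-Net replaces tokenization, v2!")
```

The property worth noting is losslessness: the chunks concatenate back to the exact input, so the model operates over discovered units without a fixed external vocabulary.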

Jason Weston (@jaseweston) 's Twitter Profile Photo

🌀Diversity Aware RL (DARLING)🌀
📝: arxiv.org/abs/2509.02534
- Jointly optimizes for quality & diversity using a learned partition function
- Outperforms standard RL in quality AND diversity metrics, e.g. higher pass@1/p@k
- Works for both non-verifiable & verifiable tasks
🧵1/5
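The quality-plus-diversity reward can be sketched as follows (hypothetical stand-ins throughout: a hand-written grouping rule replaces the learned partition function, and the combination rule is a simple multiplier, not the paper's objective):

```python
# Toy sketch of the DARLING idea: combine a quality score with a
# diversity bonus derived from a partition of sampled responses, so
# reward favors generations that are good AND novel.
def darling_reward(responses, quality, partition):
    """quality: response -> score; partition: response -> group id
    (a stand-in for the learned partition function)."""
    groups = {}
    for r in responses:
        groups.setdefault(partition(r), []).append(r)
    rewards = {}
    for r in responses:
        # responses in crowded partitions earn a smaller multiplier
        diversity = 1.0 / len(groups[partition(r)])
        rewards[r] = quality(r) * diversity
    return rewards

responses = ["yes", "yes!", "no", "maybe"]
quality = lambda r: 0.9 if "yes" in r else 0.6
partition = lambda r: "affirm" if "yes" in r else r  # toy semantic grouping
rewards = darling_reward(responses, quality, partition)
```

Here the two near-duplicate "yes" responses split their partition's credit (0.9 × 1/2 = 0.45 each), so the lower-quality but novel answers end up with higher reward (0.6), which is the pressure that keeps sampling diverse.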
PyTorch (@pytorch) 's Twitter Profile Photo

Large Language Models (#LLMs) are optimized for Intel GPUs labeled as xpu in #PyTorch. Learn how to speed up local inference on Intel Arc discrete, built-in, and Arc Pro GPUs, bringing advanced AI to laptops and desktops.

🔗 hubs.la/Q03GYFrV0

#PyTorch #LLM #OpenSourceAI
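A minimal device-selection sketch for the `xpu` backend (assuming a recent PyTorch build with Intel GPU support; the code falls back gracefully when torch or an XPU device is unavailable):

```python
# Select an Intel GPU exposed as "xpu" in PyTorch when present,
# otherwise fall back to CPU.
try:
    import torch
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        device = "xpu"
    else:
        device = "cpu"
except ImportError:
    device = "cpu"

# Usage would look like: model.to(device); x = x.to(device)
print(f"running on: {device}")
```

The same `.to(device)` pattern used for CUDA carries over unchanged, which is what makes the `xpu` backend a drop-in target for local inference.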
Aleksa Gordić (水平问题) (@gordic_aleksa) 's Twitter Profile Photo

New in-depth blog post time: "Inside NVIDIA GPUs: Anatomy of high performance matmul kernels". If you want to deeply understand how one writes state of the art matmul kernels in CUDA read along.

(Remember matmul is the single most important operation that transformers execute
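The central trick in those kernels, tiling, can be sketched in NumPy (this shows only the blocked access pattern; real CUDA kernels add shared-memory staging, vectorized loads, double buffering, and tensor-core instructions):

```python
import numpy as np

def tiled_matmul(A, B, tile=32):
    """Blocked matmul illustrating the tiling idea behind high-performance
    GPU kernels: compute the output in small tiles so each tile's operands
    fit in fast memory (shared memory / registers on a GPU), accumulating
    partial products over k-tiles."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i in range(0, M, tile):
        for j in range(0, N, tile):
            for k in range(0, K, tile):
                # each (i, j) output tile accumulates over the k-tiles
                C[i:i+tile, j:j+tile] += (
                    A[i:i+tile, k:k+tile] @ B[k:k+tile, j:j+tile]
                )
    return C

rng = np.random.default_rng(0)
A, B = rng.standard_normal((96, 80)), rng.standard_normal((80, 64))
err = np.abs(tiled_matmul(A, B) - A @ B).max()
```

The payoff of blocking is data reuse: each loaded tile participates in many multiply-adds before being evicted, which is what lets a GPU kernel approach the hardware's arithmetic peak instead of being memory-bound.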
Andrej Karpathy (@karpathy) 's Twitter Profile Photo

Three days ago I left autoresearch tuning nanochat for ~2 days on depth=12 model. It found ~20 changes that improved the validation loss. I tested these changes yesterday and all of them were additive and transferred to larger (depth=24) models. Stacking up all of these changes,