Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile
Sabri Eyuboglu

@eyuboglusabri

Computer Science PhD student @Stanford working with @HazyResearch and @james_y_zou
🪬

ID: 1097318301569413120

Link: http://sabrieyuboglu.com · Joined: 18-02-2019 02:14:05

292 Tweets

813 Followers

295 Following

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size.

What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x
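
One way to picture the idea, as a minimal distillation sketch (not the paper's exact self-study recipe): freeze the model, and train a small set of "virtual token" embeddings offline so that, with them prepended, the model matches its own behavior with the full document in context. The 64-vector budget, GPT-2, and the source of the synthetic query_ids are all illustrative assumptions here:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad_(False)  # the base model stays frozen

doc_ids = tok("...long document text...", return_tensors="pt").input_ids
emb = model.get_input_embeddings()

# Trainable stand-in: 64 "virtual tokens" instead of the full document.
soft = torch.nn.Parameter(torch.randn(1, 64, emb.embedding_dim) * 0.02)
opt = torch.optim.Adam([soft], lr=1e-3)

def distill_step(query_ids):  # query_ids: a synthetic question about the doc
    q = emb(query_ids)
    with torch.no_grad():  # teacher: the full document actually in context
        t = model(inputs_embeds=torch.cat([emb(doc_ids), q], dim=1)).logits
    s = model(inputs_embeds=torch.cat([soft, q], dim=1)).logits
    n = query_ids.size(1)
    # Match the teacher's next-token distributions over the query span.
    loss = F.kl_div(F.log_softmax(s[:, -n:], -1),
                    F.softmax(t[:, -n:], -1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

At inference the frozen model runs with the small stand-in prepended, so cache memory scales with 64 vectors rather than the full document.
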
Shirley Wu (@shirleyyxwu)'s Twitter Profile Photo

Even the smartest LLMs can fail at basic multiturn communication

Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️

⭐️CollabLLM (top 1%; oral ICML Conference) transforms LLMs from passive responders into active collaborators.
Pierce Freeman (@piercefreeman)'s Twitter Profile Photo

Text diffusion models might be the most unintuitive architecture around

Like: let's start randomly filling in words in a paragraph and iterate enough times to get something sensible

But now that Google's Gemini Diffusion is near SOTA, I think we need to take them seriously
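
That fill-in-and-iterate loop can be made concrete. Below is a toy sketch of confidence-based iterative unmasking, one common decoding scheme for masked text diffusion; `denoiser` and every name here are illustrative assumptions, not any particular model's API:

import torch

def diffusion_decode(denoiser, length, mask_id, steps=10):
    seq = torch.full((1, length), mask_id)            # start fully masked
    for t in range(steps):
        logits = denoiser(seq)                        # (1, length, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        k = max(1, length * (t + 1) // steps)         # unmask more each round
        keep = conf.topk(k, dim=-1).indices           # most confident slots
        nxt = torch.full_like(seq, mask_id)
        nxt.scatter_(1, keep, pred.gather(1, keep))   # commit those tokens
        seq = nxt                                     # the rest stay masked
    return seq

Every pass re-predicts all positions in parallel, which is exactly what makes the architecture feel unintuitive next to left-to-right decoding.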

Jon Saad-Falcon (@jonsaadfalcon)'s Twitter Profile Photo

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 
🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
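
Mechanically, the idea is selection rather than generation: sample many candidate answers, score each with several weak verifiers, and return the top combined score. A minimal sketch; the plain weighted average and the example verifier names are illustrative stand-ins, not Weaver's actual aggregation scheme:

def select_answer(candidates, verifiers, weights):
    # Each verifier maps an answer string to a score; combine them linearly.
    def combined(answer):
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# e.g. verifiers = [reward_model_score, lm_judge_score]  # each -> float in [0, 1]
#      weights   = [0.6, 0.4]                            # fit on held-out labels
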
Agent B (@michelivan92347)'s Twitter Profile Photo

One of today's key trends imo 👇 (By the way, the last point about delegation is why keeping an eye on projects like Minions is a good idea ...)

Jerry Liu (@jerrywliu)'s Twitter Profile Photo

1/10 ML can solve PDEs – but precision 🔬 is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵 How it works:
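
As background on the training signal such methods push toward machine precision: a physics-informed loss samples collocation points and penalizes the PDE residual there. A minimal sketch for the 1D Poisson equation u''(x) = f(x); this illustrates physics-informed learning generally, not BWLer's interpolation-based architecture, and `net` is any differentiable model:

import torch

def pde_residual_loss(net, f, n_points=128):
    x = torch.rand(n_points, 1, requires_grad=True)               # collocation points
    u = net(x)                                                    # candidate solution u(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'(x)
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''(x)
    return ((d2u - f(x)) ** 2).mean()                             # drive residual to zero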

Yasa Baig (@baigyasa)'s Twitter Profile Photo

Was extremely fun to work on this paper with Jerry Liu, finally fulfilling our 7-year plan from year one of undergrad to write a paper together! One of many, I hope!

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

Sure, local LLMs aren't great now, but you can use your imagination a bit and get creative with how you use them!

- Cerebras and Groq reduce latency, which is great, but local compute has different advantages: privacy and utilization of compute that is otherwise sitting idle.

Cartesia (@cartesia_ai)'s Twitter Profile Photo

We're excited to announce a new research release from the Cartesia team, as part of a long-term collaboration to advance deep learning architectures. We've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. H-Nets are…

Arjun Desai (@jundesai)'s Twitter Profile Photo

The concept of tokenization (or chunking more broadly) is fundamental to the way humans consume, process, and react to information: in language, vision, audio, haptics, and many more applications. Seeing that we can model such fundamental principles in an elegant…

Karan Goel (@krandiash)'s Twitter Profile Photo

At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our…

Brandon Yang (@bclyang)'s Twitter Profile Photo

Tokenizer-free models! Deep learning has been a story of end-to-end learning replacing hand-crafted features, so this next step feels fundamentally important (especially for modalities that are hard to tokenize, like audio and DNA). Also lots of cool implications for multimodal…
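
A concrete way to see the tokenizer-free starting point: the raw input is just bytes, and any grouping into larger units must be learned end to end rather than fixed by a vocabulary. Illustrative only; H-Net's actual chunking mechanism is described in the release above:

text = "tokenizer-free models handle audio & DNA too"
byte_ids = list(text.encode("utf-8"))   # every input is just integers in 0..255
print(len(byte_ids), byte_ids[:8])
# A fixed subword tokenizer would instead hand the model a shorter sequence of
# vocabulary IDs; a model like H-Net learns where to chunk these bytes itself.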

AI at AMD (@aiatamd)'s Twitter Profile Photo

We’re thrilled to collaborate with the HazyResearch Stanford AI Lab, led by Chris Ré, to power Minions, their cutting-edge agentic framework tackling the cost-accuracy tradeoff in modern AI systems.

This innovation is enabled on AMD Ryzen AI, thanks to seamless integration with…
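
The cost-accuracy tradeoff being tackled: let a small on-device model do the long-context reading and let the large cloud model see only short digests. A schematic sketch of that delegation pattern; `local_lm` and `remote_lm` are hypothetical callables, not the Minions library's real API:

def answer(question, long_context, local_lm, remote_lm, chunk_size=4000):
    chunks = [long_context[i:i + chunk_size]
              for i in range(0, len(long_context), chunk_size)]
    # Cheap local pass over every chunk (the privacy + idle-compute angle above).
    notes = [local_lm(f"Extract facts relevant to: {question}\n\n{c}")
             for c in chunks]
    # One short, expensive remote call over the distilled notes.
    return remote_lm(f"Question: {question}\nNotes:\n" + "\n".join(notes))
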
Nicholas Roberts (@nick11roberts)'s Twitter Profile Photo

🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to Conference on Language Modeling 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
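
One way to picture "hybrids without training from scratch": run the corresponding block from each frozen pretrained model, project into a shared width, and learn only small projectors and a mixture weight. A hedged sketch; the projectors and scalar gate here are illustrative, not Manticore's exact construction:

import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Mix one frozen block from each pretrained model with a learned gate."""
    def __init__(self, block_a, block_b, dim_a, dim_b, dim):
        super().__init__()
        self.block_a, self.block_b = block_a, block_b   # frozen pretrained blocks
        self.in_a, self.out_a = nn.Linear(dim, dim_a), nn.Linear(dim_a, dim)
        self.in_b, self.out_b = nn.Linear(dim, dim_b), nn.Linear(dim_b, dim)
        self.gate = nn.Parameter(torch.zeros(1))        # only small parts are trained

    def forward(self, x):                               # x: (batch, seq, dim)
        a = self.out_a(self.block_a(self.in_a(x)))      # route through model A's block
        b = self.out_b(self.block_b(self.in_b(x)))      # route through model B's block
        w = torch.sigmoid(self.gate)
        return w * a + (1 - w) * b                      # learned convex combination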