Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile
Sabri Eyuboglu

@eyuboglusabri

Computer Science PhD student @Stanford working with @HazyResearch and @james_y_zou
🪬

ID: 1097318301569413120

Link: http://sabrieyuboglu.com · Joined: 18-02-2019 02:14:05

292 Tweets

813 Followers

295 Following

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

When we put lots of text (eg a code repo) into LLM context, cost soars b/c of the KV cache’s size.

What if we trained a smaller KV cache for our documents offline? Using a test-time training recipe we call self-study, we find that this can reduce cache memory on avg 39x
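
One way to picture the idea, as a minimal distillation sketch (not the paper's exact self-study recipe): freeze the model, and train a small set of "virtual token" embeddings offline so that, with them prepended, the model matches its own behavior with the full document in context. The 64-vector budget, GPT-2, and the source of the synthetic query_ids are all illustrative assumptions here:

import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tok = AutoTokenizer.from_pretrained("gpt2")
for p in model.parameters():
    p.requires_grad_(False)  # the base model stays frozen

doc_ids = tok("...long document text...", return_tensors="pt").input_ids
emb = model.get_input_embeddings()

# Trainable stand-in: 64 "virtual tokens" instead of the full document.
soft = torch.nn.Parameter(torch.randn(1, 64, emb.embedding_dim) * 0.02)
opt = torch.optim.Adam([soft], lr=1e-3)

def distill_step(query_ids):  # query_ids: a synthetic question about the doc
    q = emb(query_ids)
    with torch.no_grad():  # teacher: the full document actually in context
        t = model(inputs_embeds=torch.cat([emb(doc_ids), q], dim=1)).logits
    s = model(inputs_embeds=torch.cat([soft, q], dim=1)).logits
    n = query_ids.size(1)
    # Match the teacher's next-token distributions over the query span.
    loss = F.kl_div(F.log_softmax(s[:, -n:], -1),
                    F.softmax(t[:, -n:], -1), reduction="batchmean")
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

At inference the frozen model runs with the small stand-in prepended, so cache memory scales with 64 vectors rather than the full document.
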
Shirley Wu (@shirleyyxwu)'s Twitter Profile Photo

Even the smartest LLMs can fail at basic multiturn communication

Ask for grocery help → without asking where you live 🤦‍♀️
Ask to write articles → assumes your preferences 🤷🏻‍♀️

⭐️CollabLLM (top 1%; oral ICML Conference) transforms LLMs from passive responders into active collaborators.
Pierce Freeman (@piercefreeman)'s Twitter Profile Photo

Text diffusion models might be the most unintuitive architecture around

Like: let's start randomly filling in words in a paragraph and iterate enough times to get something sensible

But now that Google's Gemini Diffusion is near SOTA, I think we need to take them seriously
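
That fill-in-and-iterate loop can be made concrete. Below is a toy sketch of confidence-based iterative unmasking, one common decoding scheme for masked text diffusion; `denoiser` and every name here are illustrative assumptions, not any particular model's API:

import torch

def diffusion_decode(denoiser, length, mask_id, steps=10):
    seq = torch.full((1, length), mask_id)            # start fully masked
    for t in range(steps):
        logits = denoiser(seq)                        # (1, length, vocab)
        conf, pred = logits.softmax(-1).max(-1)       # per-position confidence
        k = max(1, length * (t + 1) // steps)         # unmask more each round
        keep = conf.topk(k, dim=-1).indices           # most confident slots
        nxt = torch.full_like(seq, mask_id)
        nxt.scatter_(1, keep, pred.gather(1, keep))   # commit those tokens
        seq = nxt                                     # the rest stay masked
    return seq

Every pass re-predicts all positions in parallel, which is exactly what makes the architecture feel unintuitive next to left-to-right decoding.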

Jon Saad-Falcon (@jonsaadfalcon)'s Twitter Profile Photo

How can we close the generation-verification gap when LLMs produce correct answers but fail to select them? 
🧵 Introducing Weaver: a framework that combines multiple weak verifiers (reward models + LM judges) to achieve o3-mini-level accuracy with much cheaper non-reasoning…
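
Mechanically, the idea is selection rather than generation: sample many candidate answers, score each with several weak verifiers, and return the top combined score. A minimal sketch; the plain weighted average and the example verifier names are illustrative stand-ins, not Weaver's actual aggregation scheme:

def select_answer(candidates, verifiers, weights):
    # Each verifier maps an answer string to a score; combine them linearly.
    def combined(answer):
        return sum(w * v(answer) for v, w in zip(verifiers, weights))
    return max(candidates, key=combined)

# e.g. verifiers = [reward_model_score, lm_judge_score]  # each -> float in [0, 1]
#      weights   = [0.6, 0.4]                            # fit on held-out labels
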
Agent B (@michelivan92347)'s Twitter Profile Photo

One of today's key trends imo 👇 (By the way, the last point about delegation is why keeping an eye on projects like Minions is a good idea ...)

Jerry Liu (@jerrywliu)'s Twitter Profile Photo

1/10 ML can solve PDEs – but precision 🔬 is still a challenge. Towards high-precision methods for scientific problems, we introduce BWLer 🎳, a new architecture for physics-informed learning achieving (near-)machine-precision (up to 10⁻¹² RMSE) on benchmark PDEs. 🧵 How it works:
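
As background on the training signal such methods push toward machine precision: a physics-informed loss samples collocation points and penalizes the PDE residual there. A minimal sketch for the 1D Poisson equation u''(x) = f(x); this illustrates physics-informed learning generally, not BWLer's interpolation-based architecture, and `net` is any differentiable model:

import torch

def pde_residual_loss(net, f, n_points=128):
    x = torch.rand(n_points, 1, requires_grad=True)               # collocation points
    u = net(x)                                                    # candidate solution u(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # u'(x)
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]  # u''(x)
    return ((d2u - f(x)) ** 2).mean()                             # drive residual to zero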

Yasa Baig (@baigyasa)'s Twitter Profile Photo

Was extremely fun to work on this paper with Jerry Liu, finally fulfilling our 7-year plan from year one of undergrad to write a paper together! One of many, I hope!

Sabri Eyuboglu (@eyuboglusabri)'s Twitter Profile Photo

Sure, local LLMs aren't great now, but you can use your imagination a bit and get creative with how you use them!

- Cerebras and Groq reduce latency, which is great, but local compute has different advantages: privacy and utilization of compute that is otherwise sitting idle.

Cartesia (@cartesia_ai)'s Twitter Profile Photo

We're excited to announce a new research release from the Cartesia team, as part of a long-term collaboration to advance deep learning architectures. We've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. H-Nets are…

Arjun Desai (@jundesai)'s Twitter Profile Photo

The concept of tokenization (or chunking more broadly) is fundamental to the way humans consume, process, and react to information: in language, vision, audio, haptics, and many more applications. Seeing that we can model such fundamental principles in an elegant…

Karan Goel (@krandiash)'s Twitter Profile Photo

At Cartesia, we've always believed that model architectures remain a fundamental bottleneck in building truly intelligent systems. Intelligence that can interact and reason over massive amounts of context over decade-long timescales. This research is an important step in our…

Brandon Yang (@bclyang)'s Twitter Profile Photo

Tokenizer-free models! Deep learning has been a story of end-to-end learning replacing hand-crafted features, so this next step feels fundamentally important (especially for modalities that are hard to tokenize, like audio and DNA). Also lots of cool implications for multimodal…
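
A concrete way to see the tokenizer-free starting point: the raw input is just bytes, and any grouping into larger units must be learned end to end rather than fixed by a vocabulary. Illustrative only; H-Net's actual chunking mechanism is described in the release above:

text = "tokenizer-free models handle audio & DNA too"
byte_ids = list(text.encode("utf-8"))   # every input is just integers in 0..255
print(len(byte_ids), byte_ids[:8])
# A fixed subword tokenizer would instead hand the model a shorter sequence of
# vocabulary IDs; a model like H-Net learns where to chunk these bytes itself.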

AI at AMD (@aiatamd)'s Twitter Profile Photo

We’re thrilled to collaborate with the HazyResearch Stanford AI Lab, led by Chris Ré, to power Minions, their cutting-edge agentic framework tackling the cost-accuracy tradeoff in modern AI systems.

This innovation is enabled on AMD Ryzen AI, thanks to seamless integration with…
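
The cost-accuracy tradeoff being tackled: let a small on-device model do the long-context reading and let the large cloud model see only short digests. A schematic sketch of that delegation pattern; `local_lm` and `remote_lm` are hypothetical callables, not the Minions library's real API:

def answer(question, long_context, local_lm, remote_lm, chunk_size=4000):
    chunks = [long_context[i:i + chunk_size]
              for i in range(0, len(long_context), chunk_size)]
    # Cheap local pass over every chunk (the privacy + idle-compute angle above).
    notes = [local_lm(f"Extract facts relevant to: {question}\n\n{c}")
             for c in chunks]
    # One short, expensive remote call over the distilled notes.
    return remote_lm(f"Question: {question}\nNotes:\n" + "\n".join(notes))
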
Nicholas Roberts (@nick11roberts)'s Twitter Profile Photo

🎉 Excited to share that our paper "Pretrained Hybrids with MAD Skills" was accepted to Conference on Language Modeling 2025! We introduce Manticore - a framework for automatically creating hybrid LMs from pretrained models without training from scratch. 🧵[1/n]
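
One way to picture "hybrids without training from scratch": run the corresponding block from each frozen pretrained model, project into a shared width, and learn only small projectors and a mixture weight. A hedged sketch; the projectors and scalar gate here are illustrative, not Manticore's exact construction:

import torch
import torch.nn as nn

class HybridBlock(nn.Module):
    """Mix one frozen block from each pretrained model with a learned gate."""
    def __init__(self, block_a, block_b, dim_a, dim_b, dim):
        super().__init__()
        self.block_a, self.block_b = block_a, block_b   # frozen pretrained blocks
        self.in_a, self.out_a = nn.Linear(dim, dim_a), nn.Linear(dim_a, dim)
        self.in_b, self.out_b = nn.Linear(dim, dim_b), nn.Linear(dim_b, dim)
        self.gate = nn.Parameter(torch.zeros(1))        # only small parts are trained

    def forward(self, x):                               # x: (batch, seq, dim)
        a = self.out_a(self.block_a(self.in_a(x)))      # route through model A's block
        b = self.out_b(self.block_b(self.in_b(x)))      # route through model B's block
        w = torch.sigmoid(self.gate)
        return w * a + (1 - w) * b                      # learned convex combination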