Arnav Singhvi (@arnav_thebigman)'s Twitter Profile
Arnav Singhvi

@arnav_thebigman

Working on DSPy under the mentorship of Stanford Ph.D. student Omar Khattab @lateinteraction

ID: 741014492687925248

Link: https://arnavsinghvi11.github.io/ · Joined: 09-06-2016 21:09:57

106 Tweets

653 Followers

345 Following

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

It's really painful to live a few years ahead of the trend. By the time the trend catches up, you've realized a deeper concept and moved on. At recent AI events, I was stopped so many times and asked "can I do multi-step/'multi-agent' stuff in DSPy". What?! DSPy was started

DSPy (@dspyoss)'s Twitter Profile Photo

New paper from Stanford University.

"Expert-level validation of AI-generated medical text with scalable language models"

The authors use dspy.BootstrapFinetune for offline RL to update the weights of their LLMs.

They introduce MedVAL, a method to train LLMs to evaluate whether
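
The dspy.BootstrapFinetune optimizer mentioned above is a real DSPy teleprompter: it runs a program, keeps the traces a metric approves of, and finetunes the underlying LM's weights on them. Here is a minimal sketch of that pattern; the ValidateDraft signature, metric, and toy dataset are illustrative assumptions, not the MedVAL setup, and exact constructor arguments may differ across DSPy versions.

```python
import dspy

# Hypothetical validator task (illustrative, not the MedVAL program).
class ValidateDraft(dspy.Signature):
    """Judge whether a draft of medical text is supported by its source."""
    source: str = dspy.InputField()
    draft: str = dspy.InputField()
    verdict: str = dspy.OutputField(desc="'supported' or 'unsupported'")

program = dspy.Predict(ValidateDraft)

# Toy training data; real use would have many labeled examples.
trainset = [
    dspy.Example(source="Chest X-ray shows no acute findings.",
                 draft="The chest X-ray was normal.",
                 verdict="supported").with_inputs("source", "draft"),
]

def metric(example, prediction, trace=None):
    # Keep only traces whose verdict matches the reference label.
    return prediction.verdict == example.verdict

# Collect metric-approved traces, then finetune the LM's weights on them.
optimizer = dspy.BootstrapFinetune(metric=metric)
finetuned = optimizer.compile(program, trainset=trainset)
```
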
DSPy (@dspyoss)'s Twitter Profile Photo

The mission of DSPy is to turn AI system design into an engineering discipline, with a declarative programming lens. Technically, this isn't *really* about LLMs or prompts. We're seeking the most natural and timeless language to express and iterate on AI software. You may

Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
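
DSPy 3.x ships GEPA as dspy.GEPA. The sketch below shows how optimizing a program with it might look; the metric signature and constructor arguments follow my reading of the current docs and should be treated as assumptions (the gepa-ai/gepa repo has the authoritative interface).

```python
import dspy

# GEPA reflects on execution traces, so its metric may return textual
# feedback alongside a score (five-argument signature per current docs).
def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = float(pred.answer == gold.answer)
    feedback = "correct" if score else f"expected {gold.answer!r}, got {pred.answer!r}"
    return dspy.Prediction(score=score, feedback=feedback)

program = dspy.ChainOfThought("question -> answer")

# Toy data; real runs would use substantial train/validation splits.
trainset = [dspy.Example(question="2+2?", answer="4").with_inputs("question")]
valset = trainset

optimizer = dspy.GEPA(
    metric=metric,
    auto="light",                            # preset rollout budget
    reflection_lm=dspy.LM("openai/gpt-4o"),  # LM that critiques failed rollouts
)
optimized = optimizer.compile(program, trainset=trainset, valset=valset)
```
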
Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Well, in the most lowkey way possible, and on a random Tue afternoon, DSPy 3.0 is out of beta.

pip install -U dspy

So many amazing people contributed to this. Thank you all!

(Release notes below. Stay tuned for a ton of stuff over the next few weeks!!)
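
For anyone trying the release, the core loop is the same: declare a signature, pick a module, and let DSPy handle the prompting. A minimal example (the model name here is an arbitrary choice, not a recommendation):

```python
import dspy

# Point DSPy at any supported model; the name is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare the task as a signature; DSPy compiles it into prompts for you.
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="What does DSPy optimize?").answer)
```
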
Ivan Zhou (@ivanzhouyq)'s Twitter Profile Photo

Automated prompt optimization (GEPA) can push open-source models beyond frontier performance on enterprise tasks — at a fraction of the cost! 🔑 Key results from our Databricks Mosaic Research work: 1⃣ gpt-oss-120b + GEPA beats Claude Opus 4.1 on Information Extraction (+2.2 points) —

Arnav Singhvi (@arnav_thebigman)'s Twitter Profile Photo

Was awesome cooking with GEPA for Databricks Agent Bricks! So cool to see prompt optimization lift gpt-oss models + stacking GEPA with SFT really is better together

Jonathan Frankle (@jefrankle)'s Twitter Profile Photo

I'm getting really excited about prompt optimization as a cost vs. quality sweet spot for enterprises. These results from Ivan Zhou, Arnav Singhvi, Krista Opsahl-Ong, and team make a compelling case.

Matei Zaharia (@matei_zaharia)'s Twitter Profile Photo

Prompt optimization is becoming a powerful technique for improving AI that can even beat SFT! Here are some of our research results with GEPA at Databricks, in difficult Agent Bricks info extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%.
Connor Shorten (@cshorten30)'s Twitter Profile Photo

The DSPy community is growing in Boston! ☘️🔥 We are beyond excited to be hosting a DSPy meetup on October 15th! Come meet DSPy and AI builders and learn from talks by Omar Khattab, Noah Ziems, and Vikram Shenoy! See you in

Simon Willison (@simonw)'s Twitter Profile Photo

If you've been trying to figure out DSPy - the automatic prompt optimization system - this talk by Drew Breunig is the clearest explanation I've seen yet, with a very useful real-world case study: youtube.com/watch?v=I9Ztkg… My notes here: simonwillison.net/2025/Oct/4/dre…

Drew Houston (@drewhouston)'s Twitter Profile Photo

Simon Willison, Drew Breunig: Have heard great things about DSPy plus GEPA, which is an even stronger prompt optimizer than MIPROv2 — repo and (fascinating) examples of generated prompts at github.com/gepa-ai/gepa and paper at arxiv.org/abs/2507.19457

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Love OpenAI’s new “we have DSPy at home” energy lol

Guess that 4-5 years later, we’re no longer contrarian on most fronts: retrieval, multi-step tasks, prompt optimization, and downstream AI systems are ALL necessary pieces.

Scale is not all you need. “AGI” is not enough.
Alex Zhang (@a1zhang)'s Twitter Profile Photo

What if scaling the context windows of frontier LLMs is much easier than it sounds?

We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
Omar Khattab (@lateinteraction)'s Twitter Profile Photo

For a long time, we and others have thought about general-purpose inference scaling axes. But only CoT reasoning and ReAct-style loops stuck around. I think Recursive Language Models may be the next one. Your current LLM can already process 10M+ prompt tokens, recursively.

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

while the results are incredible, my favorite part of Alex's post is studying what the recursive LM actually decides to do

how does it take a variable with 10M tokens and figure it out?

you see strategies like: peeking, map/reduce, summarization, and output accumulation
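
The paper's implementation is more sophisticated, but the map/reduce strategy mentioned above can be caricatured in a few lines. Everything here (the rlm function, the llm callable, the character budget) is a toy assumption for illustration, not the authors' code.

```python
# Toy sketch of a recursive, map/reduce-style strategy over a huge prompt.
# `llm` is any callable that maps a prompt string to a completion string.
def rlm(llm, query: str, text: str, budget: int = 8_000) -> str:
    if len(text) <= budget:
        # Base case: the slice fits in context, so answer directly.
        return llm(f"{query}\n\n{text}")
    mid = len(text) // 2
    # Map: recurse on each half. Reduce: merge the partial answers.
    left = rlm(llm, query, text[:mid], budget)
    right = rlm(llm, query, text[mid:], budget)
    return llm(f"{query}\n\nCombine these partial answers:\n{left}\n{right}")
```
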