Arnav Singhvi (@arnav_thebigman)'s Twitter Profile
Arnav Singhvi

@arnav_thebigman

Working on DSPy under the mentorship of Stanford Ph.D. student Omar Khattab @lateinteraction

ID: 741014492687925248

Link: https://arnavsinghvi11.github.io/ · Joined: 09-06-2016 21:09:57

106 Tweets

653 Followers

345 Following

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

It's really painful to live a few years ahead of the trend. By the time the trend catches up, you've realized a deeper concept and moved on. At recent AI events, I was stopped so many times and asked "can I do multi-step/'multi-agent' stuff in DSPy". What?! DSPy was started

DSPy (@dspyoss)'s Twitter Profile Photo

New paper from Stanford University.

"Expert-level validation of AI-generated medical text with scalable language models"

The authors use dspy.BootstrapFinetune for offline RL to update the weights of their LLMs.

They introduce MedVAL, a method to train LLMs to evaluate whether
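
The dspy.BootstrapFinetune optimizer mentioned above is a real DSPy teleprompter: it runs a program, keeps the traces a metric approves of, and finetunes the underlying LM's weights on them. Here is a minimal sketch of that pattern; the ValidateDraft signature, metric, and toy dataset are illustrative assumptions, not the MedVAL setup, and exact constructor arguments may differ across DSPy versions.

```python
import dspy

# Hypothetical validator task (illustrative, not the MedVAL program).
class ValidateDraft(dspy.Signature):
    """Judge whether a draft of medical text is supported by its source."""
    source: str = dspy.InputField()
    draft: str = dspy.InputField()
    verdict: str = dspy.OutputField(desc="'supported' or 'unsupported'")

program = dspy.Predict(ValidateDraft)

# Toy training data; real use would have many labeled examples.
trainset = [
    dspy.Example(source="Chest X-ray shows no acute findings.",
                 draft="The chest X-ray was normal.",
                 verdict="supported").with_inputs("source", "draft"),
]

def metric(example, prediction, trace=None):
    # Keep only traces whose verdict matches the reference label.
    return prediction.verdict == example.verdict

# Collect metric-approved traces, then finetune the LM's weights on them.
optimizer = dspy.BootstrapFinetune(metric=metric)
finetuned = optimizer.compile(program, trainset=trainset)
```
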
DSPy (@dspyoss)'s Twitter Profile Photo

The mission of DSPy is to turn AI system design into an engineering discipline, with a declarative programming lens. Technically, this isn't *really* about LLMs or prompts. We're seeking the most natural and timeless language to express and iterate on AI software. You may

Lakshya A Agrawal (@lakshyaaagrawal)'s Twitter Profile Photo

How does prompt optimization compare to RL algos like GRPO?

GRPO needs 1000s of rollouts, but humans can learn from a few trials—by reflecting on what worked & what didn't.

Meet GEPA: a reflective prompt optimizer that can outperform GRPO by up to 20% with 35x fewer rollouts!🧵
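
DSPy 3.x ships GEPA as dspy.GEPA. The sketch below shows how optimizing a program with it might look; the metric signature and constructor arguments follow my reading of the current docs and should be treated as assumptions (the gepa-ai/gepa repo has the authoritative interface).

```python
import dspy

# GEPA reflects on execution traces, so its metric may return textual
# feedback alongside a score (five-argument signature per current docs).
def metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    score = float(pred.answer == gold.answer)
    feedback = "correct" if score else f"expected {gold.answer!r}, got {pred.answer!r}"
    return dspy.Prediction(score=score, feedback=feedback)

program = dspy.ChainOfThought("question -> answer")

# Toy data; real runs would use substantial train/validation splits.
trainset = [dspy.Example(question="2+2?", answer="4").with_inputs("question")]
valset = trainset

optimizer = dspy.GEPA(
    metric=metric,
    auto="light",                            # preset rollout budget
    reflection_lm=dspy.LM("openai/gpt-4o"),  # LM that critiques failed rollouts
)
optimized = optimizer.compile(program, trainset=trainset, valset=valset)
```
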
Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Well, in the most lowkey way possible, and on a random Tue afternoon, DSPy 3.0 is out of beta.

pip install -U dspy

So many amazing people contributed to this. Thank you all!

(Release notes below. Stay tuned for a ton of stuff over the next few weeks!!)
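
For anyone trying the release, the core loop is the same: declare a signature, pick a module, and let DSPy handle the prompting. A minimal example (the model name here is an arbitrary choice, not a recommendation):

```python
import dspy

# Point DSPy at any supported model; the name is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# Declare the task as a signature; DSPy compiles it into prompts for you.
qa = dspy.ChainOfThought("question -> answer")
print(qa(question="What does DSPy optimize?").answer)
```
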
Ivan Zhou (@ivanzhouyq)'s Twitter Profile Photo

Automated prompt optimization (GEPA) can push open-source models beyond frontier performance on enterprise tasks — at a fraction of the cost! 🔑 Key results from our Databricks Mosaic Research work: 1⃣ gpt-oss-120b + GEPA beats Claude Opus 4.1 on Information Extraction (+2.2 points) —

Arnav Singhvi (@arnav_thebigman)'s Twitter Profile Photo

Was awesome cooking with GEPA for Databricks Agent Bricks! So cool to see prompt optimization lift gpt-oss models + stacking GEPA with SFT really is better together

Jonathan Frankle (@jefrankle)'s Twitter Profile Photo

I'm getting really excited about prompt optimization as a cost vs. quality sweet spot for enterprises. These results from Ivan Zhou, Arnav Singhvi, Krista Opsahl-Ong, and team make a compelling case.

Matei Zaharia (@matei_zaharia)'s Twitter Profile Photo

Prompt optimization is becoming a powerful technique for improving AI that can even beat SFT! Here are some of our research results with GEPA at Databricks, in difficult Agent Bricks info extraction tasks. We can match the best models at 90x lower cost, or improve them by ~6%.
Connor Shorten (@cshorten30)'s Twitter Profile Photo

The DSPy community is growing in Boston! ☘️🔥 We are beyond excited to be hosting a DSPy meetup on October 15th! Come meet DSPy and AI builders and learn from talks by Omar Khattab, Noah Ziems, and Vikram Shenoy! See you in

Simon Willison (@simonw)'s Twitter Profile Photo

If you've been trying to figure out DSPy - the automatic prompt optimization system - this talk by Drew Breunig is the clearest explanation I've seen yet, with a very useful real-world case study: youtube.com/watch?v=I9Ztkg… My notes here: simonwillison.net/2025/Oct/4/dre…

Drew Houston (@drewhouston)'s Twitter Profile Photo

Simon Willison, Drew Breunig: Have heard great things about DSPy plus GEPA, which is an even stronger prompt optimizer than MIPROv2 — repo and (fascinating) examples of generated prompts at github.com/gepa-ai/gepa and paper at arxiv.org/abs/2507.19457

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

Love OpenAI’s new “we have DSPy at home” energy lol

Guess that 4-5 years later, we’re no longer contrarian on most fronts: retrieval, multi-step tasks, prompt optimization, and downstream AI systems are ALL necessary pieces.

Scale is not all you need. “AGI” is not enough.
Alex Zhang (@a1zhang)'s Twitter Profile Photo

What if scaling the context windows of frontier LLMs is much easier than it sounds?

We’re excited to share our work on Recursive Language Models (RLMs), a new inference strategy where LLMs can decompose and recursively interact with input prompts of seemingly unbounded length,
Omar Khattab (@lateinteraction)'s Twitter Profile Photo

For a long time, we and others have thought about general-purpose inference scaling axes. But only CoT reasoning and ReAct-style loops stuck around. I think Recursive Language Models may be the next one. Your current LLM can already process 10M+ prompt tokens, recursively.

Omar Khattab (@lateinteraction)'s Twitter Profile Photo

while the results are incredible, my favorite part of Alex's post is studying what the recursive LM actually decides to do

how does it take a variable with 10M tokens and figure it out?

you see strategies like: peeking, map/reduce, summarization, and output accumulation
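
The paper's implementation is more sophisticated, but the map/reduce strategy mentioned above can be caricatured in a few lines. Everything here (the rlm function, the llm callable, the character budget) is a toy assumption for illustration, not the authors' code.

```python
# Toy sketch of a recursive, map/reduce-style strategy over a huge prompt.
# `llm` is any callable that maps a prompt string to a completion string.
def rlm(llm, query: str, text: str, budget: int = 8_000) -> str:
    if len(text) <= budget:
        # Base case: the slice fits in context, so answer directly.
        return llm(f"{query}\n\n{text}")
    mid = len(text) // 2
    # Map: recurse on each half. Reduce: merge the partial answers.
    left = rlm(llm, query, text[:mid], budget)
    right = rlm(llm, query, text[mid:], budget)
    return llm(f"{query}\n\nCombine these partial answers:\n{left}\n{right}")
```
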