Srilakshmi Chavali (@schavalii)'s Twitter Profile
Srilakshmi Chavali

@schavalii

dev rel @ Arize || uc berkeley alum 🎓

ID: 1879242181896200192

Joined: 14-01-2025 19:00:38

14 Tweets

13 Followers

31 Following

Aparna Dhinakaran (@aparnadhinak):

I got a lot out of Scalable Chain of Thoughts via Elastic Reasoning, a new paper that dropped just a couple of weeks ago

If you care about making LLMs reason efficiently under token and latency constraints, this is worth your time

Some of my takeaways and lessons 🧵
Yuhui Xu
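The paper's core move, as I read it, is to give the thinking and solution phases separate token budgets, so cost stays bounded even when the chain of thought runs long. A minimal sketch in that spirit, where `generate(prompt, max_tokens, stop)` is a hypothetical completion helper, not a real API:

```python
# Sketch of budget-split decoding in the spirit of Elastic Reasoning.
# `generate` is a hypothetical completion function (an assumption).
def elastic_generate(generate, prompt, think_budget=1024, answer_budget=256):
    # Phase 1: thinking, hard-capped at think_budget tokens.
    thought = generate(prompt + "<think>", max_tokens=think_budget,
                       stop=["</think>"])
    # Phase 2: force the thought closed and spend a separate, smaller
    # budget on the final answer, keeping total cost bounded.
    context = prompt + "<think>" + thought + "</think>"
    return generate(context, max_tokens=answer_budget)
```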
Aparna Dhinakaran (@aparnadhinak):

Moore’s Law for LLMs

Expect ~2× more tokens per second per dollar every 10–12 months

Example 101: the biggest VRAM hog in your LLM isn’t the weights, it’s the KV cache

A new study, “Accurate KV Cache Quantization with Outlier Tokens Tracing” (OTT), improves accurate 2-bit…
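To see why the KV cache dominates, here is a back-of-envelope sizing in Python; the model shape is an illustrative assumption (roughly Llama-2-7B), not a figure from the tweet or the paper:

```python
# Back-of-envelope KV cache sizing; model dimensions are assumptions.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bits):
    # 2x for keys and values; bits / 8 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bits // 8

fp16 = kv_cache_bytes(32, 32, 128, seq_len=32_768, batch=1, bits=16)
int2 = kv_cache_bytes(32, 32, 128, seq_len=32_768, batch=1, bits=2)
print(f"fp16 KV cache:  {fp16 / 2**30:.1f} GiB")  # ~16.0 GiB
print(f"2-bit KV cache: {int2 / 2**30:.1f} GiB")  # ~2.0 GiB
```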
Arize AI (@arizeai):

Today's the day! 🎉 Arize Observe just kicked off, and it's bringing a whole set of new product announcements. From Agent-powered trace debugging to new Prompt Learning techniques, we've got it all! Announcements in the thread below 🧵 👇

arize-phoenix (@arizephoenix):

🌟 Observe 2025 kicked off with a packed keynote. We just dropped a stack of new features across Phoenix. Here’s what’s new 👇

Aparna Dhinakaran (@aparnadhinak):

For Arize Observe today, we set out to bring together the people shaping the future of AI, and wow, did you show up. From hallway debates to main stage moments, it was a day full of energy, insight, and some real talk about where this field is headed. Here are a few standout…

Mikyo (@mikeldking):

🔧 arize-phoenix mcp gets a phoenix-support tool for Cursor / Anthropic Claude / Windsurf! You can now click the Add to Cursor button in Phoenix and get a continuously updating MCP server config directly integrated into your IDE. @arizeai/[email protected] also comes…

Aparna Dhinakaran (@aparnadhinak):

The secret to prompt optimization is evals

Saw this tweet by Jason Liu and it got me thinking about the future of prompt optimization

Most of us are in Cursor/Claude Code and it makes a ton of sense to keep prompts close to code and iterate on them with AI code editors

The hard…
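A minimal version of that idea: keep an eval score next to the prompt so every edit in your editor is measured against a golden set. `llm` and the dataset schema here are assumptions for illustration, not a specific library's API:

```python
# Minimal prompt eval harness; exact-match scoring for simplicity.
def eval_prompt(llm, prompt, dataset):
    hits = 0
    for example in dataset:
        output = llm(prompt.format(**example["inputs"]))
        hits += output.strip() == example["expected"]
    return hits / len(dataset)

# e.g. golden_set = [{"inputs": {"q": "2+2?"}, "expected": "4"}, ...]
# print(eval_prompt(llm, "Answer tersely: {q}", golden_set))
```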
Arize AI (@arizeai):

🍳 Cooking up a great virtual workshop for this Thursday with our friend Tony Kipkemboi from @crewaiinc and our own Srilakshmi Chavali! Register 👇 to learn how to train agents automatically and evaluate agentic workflows. bit.ly/3IrvGx5

João Moura (@joaomdmoura):

🚨 Most AI agents break silently. Not because they’re dumb—but because no one’s watching the loop. This Thursday (July 17, 10AM PT), join Shane from CrewAI + Arize for a live session. crewai.com/webinar/buildi…

Aparna Dhinakaran (@aparnadhinak):

Reinforcement Learning in English – Prompt Learning Beyond just Optimization

Andrej Karpathy tweeted something this week that I think many of us have been feeling: the resurgence of RL is great, but it’s missing the big picture.

We believe that the industry chasing traditional RL is…
Aparna Dhinakaran (@aparnadhinak):

Andrej Karpathy: This really resonates deeply. We’ve been building almost exactly that loop: rollout → reflection via English feedback → distilled lesson → prompt update. We’ve been using English feedback, via explanations, annotations, and rules, as the core signal for improvement. The…
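That loop is easy to sketch end to end. Everything below (`run_agent`, `reflect`, `rewrite_prompt`) is a hypothetical helper standing in for an LLM call, not Arize's implementation:

```python
# rollout -> reflection in English -> distilled lesson -> prompt update
def prompt_learning_step(prompt, tasks, run_agent, reflect, rewrite_prompt):
    rollouts = [run_agent(prompt, t) for t in tasks]               # rollout
    critiques = [reflect(t, r) for t, r in zip(tasks, rollouts)]   # English feedback
    lesson = "\n".join(c for c in critiques if c)                  # distilled lesson
    return rewrite_prompt(prompt, lesson) if lesson else prompt    # prompt update
```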

Aparna Dhinakaran (@aparnadhinak):

Thanks for all the love on Prompt Learning! We're really excited about the potential of using English feedback in the prompt learning loop.

We’ve been benchmarking our Prompt Optimizer against real-world data sets.

First up: Big Bench Hard (BBH) – 50 randomly sampled tasks, 1…
Aparna Dhinakaran (@aparnadhinak):

Everyone’s Using Claude Code—But How Is It Actually Planning?

Claude Code is everywhere right now. It’s fast, competent, and incredibly good at multi-step tasks out of the box.

Kicking off a short series with agent-level breakdowns on how to evaluate Claude Code using…
Mikyo (@mikeldking):

📈 arize-phoenix now has project dashboards! In the latest release, Arize AI Phoenix comes with a dedicated project dashboard with:
📈 Trace latency and errors
📈 Latency Quantiles
📈 Annotation Scores Timeseries
📈 Cost over Time by token type
📊 Top Models by Cost
📊 Token…

Priyan Jindal (@priyanjindal):

Can’t stop talking about prompt learning at events lately. It’s awesome seeing people light up when they realize their agents can optimize themselves through autonomous prompt updates.

Big thank you to Rootly and Google DeepMind for putting together this incredible event!!
Aparna Dhinakaran (@aparnadhinak):

Claude Code spends more on irrelevant input than useful output—by a long shot

Last week, we kicked off a series on evaluating Claude Code using instrumentation and trace analysis.

In instrumenting Claude Code, we found a clear pattern.
Massive input payloads are leading to…
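A rough version of the measurement: sum prompt vs. completion tokens over the LLM spans in a trace and price each side. The span fields and per-token rates below are assumptions for illustration:

```python
# Compare input vs. output spend across LLM spans (schema/prices assumed).
spans = [
    {"prompt_tokens": 48_000, "completion_tokens": 900},
    {"prompt_tokens": 52_500, "completion_tokens": 1_200},
]
PRICE_IN, PRICE_OUT = 3e-6, 15e-6  # $/token, illustrative rates

tokens_in = sum(s["prompt_tokens"] for s in spans)
tokens_out = sum(s["completion_tokens"] for s in spans)
print(f"input spend:  ${tokens_in * PRICE_IN:.2f} ({tokens_in} tokens)")
print(f"output spend: ${tokens_out * PRICE_OUT:.2f} ({tokens_out} tokens)")
```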
Aparna Dhinakaran (@aparnadhinak):

Yesterday I shared a post about using claude.md to enforce constraints and compress context. Here’s how we’ve been using it in Claude Code sessions. claude.md acts as persistent memory. Claude reads it at the start of each session, pulling in project…
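The pattern itself is simple. A sketch of "read the memory file at session start", with the path and function name as assumptions for illustration:

```python
from pathlib import Path

# Persistent-memory pattern: prepend claude.md (if present) to each
# session's context so project constraints survive across sessions.
def session_context(project_root="."):
    memory = Path(project_root, "claude.md")
    return memory.read_text() if memory.exists() else ""
```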

sanjana (@sanjanayed):

Span-level evaluations are powerful, but they only show you part of the picture. To really understand how your AI performs in real conversations, you have to think in sessions. A session captures the full back-and-forth between a user and your app. It reflects how people…
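In practice that means grouping spans by a session id before evaluating, so the judge sees the whole conversation rather than one span. Field names below are assumptions, not a specific SDK's schema:

```python
from collections import defaultdict

# Group spans into sessions and order each session's turns by time,
# so a session-level eval reads the full conversation in order.
def group_sessions(spans):
    sessions = defaultdict(list)
    for span in spans:
        sessions[span["session_id"]].append(span)
    for turns in sessions.values():
        turns.sort(key=lambda s: s["start_time"])
    return sessions
```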

Aparna Dhinakaran (@aparnadhinak):

Last time, we introduced Prompt Learning (PL), showing how it can boost your agents and models without touching weights or reward functions. We showed how PL leveraged natural language feedback (evals + critiques in plain English) to optimize a prompt for generating structured…
Aparna Dhinakaran (@aparnadhinak):

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen work best:

Explanations make judge models more reliable. They reduce variance across runs, improve agreement…
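One common way to wire that up: ask the judge for its explanation before the label, then parse both. `llm` is a hypothetical completion function and the prompt wording is illustrative, not Arize's eval template:

```python
# Judge prompt that elicits an explanation before the final label.
JUDGE_PROMPT = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}
First write a short EXPLANATION of your reasoning, then on the last line
write "LABEL: correct" or "LABEL: incorrect"."""

def judge(llm, question, answer):
    out = llm(JUDGE_PROMPT.format(question=question, answer=answer))
    # Everything before the final "LABEL:" is the explanation.
    explanation, _, label = out.rpartition("LABEL:")
    return label.strip(), explanation.strip()
```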