Srilakshmi Chavali (@schavalii)'s Twitter Profile
Srilakshmi Chavali

@schavalii

dev rel @ Arize || uc berkeley alum 🎓

ID: 1879242181896200192

Joined: 14-01-2025 19:00:38

14 Tweets

13 Followers

31 Following

Aparna Dhinakaran (@aparnadhinak):

I got a lot out of Scalable Chain of Thoughts via Elastic Reasoning, a new paper that dropped just a couple of weeks ago

If you care about making LLMs reason efficiently under token and latency constraints, this is worth your time

Some of my takeaways and lessons 🧵
Yuhui Xu
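The paper's core move, as I read it, is to give the thinking and solution phases separate token budgets, so cost stays bounded even when the chain of thought runs long. A minimal sketch in that spirit, where `generate(prompt, max_tokens, stop)` is a hypothetical completion helper, not a real API:

```python
# Sketch of budget-split decoding in the spirit of Elastic Reasoning.
# `generate` is a hypothetical completion function (an assumption).
def elastic_generate(generate, prompt, think_budget=1024, answer_budget=256):
    # Phase 1: thinking, hard-capped at think_budget tokens.
    thought = generate(prompt + "<think>", max_tokens=think_budget,
                       stop=["</think>"])
    # Phase 2: force the thought closed and spend a separate, smaller
    # budget on the final answer, keeping total cost bounded.
    context = prompt + "<think>" + thought + "</think>"
    return generate(context, max_tokens=answer_budget)
```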
Aparna Dhinakaran (@aparnadhinak):

Moore’s Law for LLMs

Expect ~2× more tokens per second per dollar every 10–12 months

Example 101: the biggest VRAM hog in your LLM isn’t the weights, it’s the KV cache

A new study, “Accurate KV Cache Quantization with Outlier Tokens Tracing” (OTT), improves accurate 2-bit…
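To see why the KV cache dominates, here is a back-of-envelope sizing in Python; the model shape is an illustrative assumption (roughly Llama-2-7B), not a figure from the tweet or the paper:

```python
# Back-of-envelope KV cache sizing; model dimensions are assumptions.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bits):
    # 2x for keys and values; bits / 8 bytes per element
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bits // 8

fp16 = kv_cache_bytes(32, 32, 128, seq_len=32_768, batch=1, bits=16)
int2 = kv_cache_bytes(32, 32, 128, seq_len=32_768, batch=1, bits=2)
print(f"fp16 KV cache:  {fp16 / 2**30:.1f} GiB")  # ~16.0 GiB
print(f"2-bit KV cache: {int2 / 2**30:.1f} GiB")  # ~2.0 GiB
```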
Arize AI (@arizeai):

Today's the day! 🎉 Arize Observe just kicked off, and it's bringing a whole set of new product announcements. From Agent-powered trace debugging to new Prompt Learning techniques, we've got it all! Announcements in the thread below 🧵 👇

arize-phoenix (@arizephoenix):

🌟 Observe 2025 kicked off with a packed keynote. We just dropped a stack of new features across Phoenix. Here’s what’s new 👇

Aparna Dhinakaran (@aparnadhinak):

For Arize Observe today, we set out to bring together the people shaping the future of AI, and wow, did you show up. From hallway debates to main stage moments, it was a day full of energy, insight, and some real talk about where this field is headed. Here are a few standout…

Mikyo (@mikeldking):

🔧 arize-phoenix mcp gets a phoenix-support tool for Cursor / Anthropic Claude / Windsurf! You can now click the Add to Cursor button in Phoenix and get a continuously updating MCP server config directly integrated into your IDE. @arizeai/[email protected] also comes…

Aparna Dhinakaran (@aparnadhinak):

The secret to prompt optimization is evals

Saw this tweet by Jason Liu and it got me thinking about the future of prompt optimization

Most of us are in Cursor/Claude Code and it makes a ton of sense to keep prompts close to code and iterate on them with AI code editors

The hard…
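A minimal version of that idea: keep an eval score next to the prompt so every edit in your editor is measured against a golden set. `llm` and the dataset schema here are assumptions for illustration, not a specific library's API:

```python
# Minimal prompt eval harness; exact-match scoring for simplicity.
def eval_prompt(llm, prompt, dataset):
    hits = 0
    for example in dataset:
        output = llm(prompt.format(**example["inputs"]))
        hits += output.strip() == example["expected"]
    return hits / len(dataset)

# e.g. golden_set = [{"inputs": {"q": "2+2?"}, "expected": "4"}, ...]
# print(eval_prompt(llm, "Answer tersely: {q}", golden_set))
```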
Arize AI (@arizeai):

🍳 Cooking up a great virtual workshop for this Thursday with our friend Tony Kipkemboi from @crewaiinc and our own Srilakshmi Chavali! Register 👇 to learn how to train agents automatically and evaluate agentic workflows. bit.ly/3IrvGx5

João Moura (@joaomdmoura):

🚨 Most AI agents break silently. Not because they’re dumb—but because no one’s watching the loop. This Thursday (July 17, 10AM PT), join Shane from CrewAI + Arize for a live session. crewai.com/webinar/buildi…

Aparna Dhinakaran (@aparnadhinak):

Reinforcement Learning in English – Prompt Learning Beyond just Optimization

Andrej Karpathy tweeted something this week that I think many of us have been feeling: the resurgence of RL is great, but it’s missing the big picture.

We believe that the industry chasing traditional RL is…
Aparna Dhinakaran (@aparnadhinak):

Andrej Karpathy: This really resonates deeply. We’ve been building almost exactly that loop: rollout → reflection via English feedback → distilled lesson → prompt update. We’ve been using English feedback, via explanations, annotations, and rules, as the core signal for improvement. The…
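That loop is easy to sketch end to end. Everything below (`run_agent`, `reflect`, `rewrite_prompt`) is a hypothetical helper standing in for an LLM call, not Arize's implementation:

```python
# rollout -> reflection in English -> distilled lesson -> prompt update
def prompt_learning_step(prompt, tasks, run_agent, reflect, rewrite_prompt):
    rollouts = [run_agent(prompt, t) for t in tasks]               # rollout
    critiques = [reflect(t, r) for t, r in zip(tasks, rollouts)]   # English feedback
    lesson = "\n".join(c for c in critiques if c)                  # distilled lesson
    return rewrite_prompt(prompt, lesson) if lesson else prompt    # prompt update
```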

Aparna Dhinakaran (@aparnadhinak):

Thanks for all the love on Prompt Learning! We're really excited about the potential of using English feedback in the prompt learning loop.

We’ve been benchmarking our Prompt Optimizer against real-world data sets.

First up: Big Bench Hard (BBH) – 50 randomly sampled tasks, 1…
Aparna Dhinakaran (@aparnadhinak):

Everyone’s Using Claude Code—But How Is It Actually Planning?

Claude Code is everywhere right now. It’s fast, competent, and incredibly good at multi-step tasks out of the box.

Kicking off a short series with agent-level breakdowns on how to evaluate Claude Code using…
Mikyo (@mikeldking):

📈 arize-phoenix now has project dashboards! In the latest release, Arize AI Phoenix comes with a dedicated project dashboard with:
📈 Trace latency and errors
📈 Latency Quantiles
📈 Annotation Scores Timeseries
📈 Cost over Time by token type
📊 Top Models by Cost
📊 Token…

Priyan Jindal (@priyanjindal):

Can’t stop talking about prompt learning at events lately. It’s awesome seeing people light up when they realize their agents can optimize themselves through autonomous prompt updates.

Big thank you to Rootly and Google DeepMind for putting together this incredible event!!
Aparna Dhinakaran (@aparnadhinak):

Claude Code spends more on irrelevant input than useful output—by a long shot

Last week, we kicked off a series on evaluating Claude Code using instrumentation and trace analysis.

In instrumenting Claude Code, we found a clear pattern.
Massive input payloads are leading to…
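A rough version of the measurement: sum prompt vs. completion tokens over the LLM spans in a trace and price each side. The span fields and per-token rates below are assumptions for illustration:

```python
# Compare input vs. output spend across LLM spans (schema/prices assumed).
spans = [
    {"prompt_tokens": 48_000, "completion_tokens": 900},
    {"prompt_tokens": 52_500, "completion_tokens": 1_200},
]
PRICE_IN, PRICE_OUT = 3e-6, 15e-6  # $/token, illustrative rates

tokens_in = sum(s["prompt_tokens"] for s in spans)
tokens_out = sum(s["completion_tokens"] for s in spans)
print(f"input spend:  ${tokens_in * PRICE_IN:.2f} ({tokens_in} tokens)")
print(f"output spend: ${tokens_out * PRICE_OUT:.2f} ({tokens_out} tokens)")
```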
Aparna Dhinakaran (@aparnadhinak):

Yesterday I shared a post about using claude.md to enforce constraints and compress context. Here’s how we’ve been using it in Claude Code sessions. claude.md acts as persistent memory. Claude reads it at the start of each session, pulling in project…
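The pattern itself is simple. A sketch of "read the memory file at session start", with the path and function name as assumptions for illustration:

```python
from pathlib import Path

# Persistent-memory pattern: prepend claude.md (if present) to each
# session's context so project constraints survive across sessions.
def session_context(project_root="."):
    memory = Path(project_root, "claude.md")
    return memory.read_text() if memory.exists() else ""
```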

sanjana (@sanjanayed):

Span-level evaluations are powerful, but they only show you part of the picture. To really understand how your AI performs in real conversations, you have to think in sessions. A session captures the full back-and-forth between a user and your app. It reflects how people…
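In practice that means grouping spans by a session id before evaluating, so the judge sees the whole conversation rather than one span. Field names below are assumptions, not a specific SDK's schema:

```python
from collections import defaultdict

# Group spans into sessions and order each session's turns by time,
# so a session-level eval reads the full conversation in order.
def group_sessions(spans):
    sessions = defaultdict(list)
    for span in spans:
        sessions[span["session_id"]].append(span)
    for turns in sessions.values():
        turns.sort(key=lambda s: s["start_time"])
    return sessions
```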

Aparna Dhinakaran (@aparnadhinak):

Last time, we introduced Prompt Learning (PL), showing how it can boost your agents and models without touching weights or reward functions. We showed how PL leveraged natural language feedback (evals + critiques in plain English) to optimize a prompt for generating structured…
Aparna Dhinakaran (@aparnadhinak):

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen work best:

Explanations make judge models more reliable. They reduce variance across runs, improve agreement…
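One common way to wire that up: ask the judge for its explanation before the label, then parse both. `llm` is a hypothetical completion function and the prompt wording is illustrative, not Arize's eval template:

```python
# Judge prompt that elicits an explanation before the final label.
JUDGE_PROMPT = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}
First write a short EXPLANATION of your reasoning, then on the last line
write "LABEL: correct" or "LABEL: incorrect"."""

def judge(llm, question, answer):
    out = llm(JUDGE_PROMPT.format(question=question, answer=answer))
    # Everything before the final "LABEL:" is the explanation.
    explanation, _, label = out.rpartition("LABEL:")
    return label.strip(), explanation.strip()
```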