Jason Lopatecki (@jason_lopatecki) Twitter Tweets • TwiCopy

4 months ago

🔥🔥🔥🔥 Phoenix update - pumped to start using this

Likes 4 · Replies 0 · Retweets 1

Mohamed Aboshihata

@mid0

4 months ago

I had a complex prompt with 6 variables to test, and Arize AI helped me load the test data and prompt easily!

Likes 5 · Replies 0 · Retweets 1

Mikyo

4 months ago

Design Iteration >> Design Overhaul Arize AI arize-phoenix #AI #LLM #LLMOps

Likes 6 · Replies 0 · Retweets 2

Arize AI

4 months ago

LLM observability provides structured visibility into how LLMs and agents behave, from individual spans to full multi-turn sessions. With the right instrumentation, teams can improve systems with the same rigor they apply to conventional software. bit.ly/4mgFOay

Likes 8 · Replies 0 · Retweets 2

Aparna Dhinakaran

@aparnadhinak

4 months ago

Claude Code's 100K tokens feels infinite; your weekly cap isn't. Now that Anthropic’s weekly rate limits are live, managing context is no longer optional. In our traces, long sessions reliably led to higher costs, slower completions, and more drift in output behavior. With too

Likes 30 · Replies 4 · Retweets 3

sanjana

4 months ago

Prebuilt evals don’t always cut it - at some point, you’ll need to build your own LLM evaluator from the ground up, tailored to your exact use case. This can be tricky if you aren't sure where to start. My latest tutorial covers this! We begin by building a benchmark dataset

Likes 14 · Replies 2 · Retweets 5

Arize AI

4 months ago

Trace your Dify apps in Arize AX! Get deep visibility into tool + agent calls, session flows, and token usage + errors. Setup in seconds: enter your Arize Space ID and API Key in Dify’s Monitoring tab, and start capturing detailed, real-time traces. bit.ly/3HbvjGE

Likes 12 · Replies 1 · Retweets 6

arize-phoenix

@arizephoenix

4 months ago

Plug arize-phoenix tracing into any Dify app to trace every part of your AI workflow, including: 🧠 LLM messages — capture the full conversation & decision-making process 🛠️ Tools — monitor how tools are used within your workflows 📊 Token usage + errors — monitor

Likes 9 · Replies 0 · Retweets 5

Mikyo

4 months ago

Killer integration

Likes 3 · Replies 0 · Retweets 1

Arize AI

4 months ago

🔥new @aws blog just dropped covering how to observe and evaluate AI agentic workflows with Strands Agents SDK and Arize AX! Karan Singh go.aws/4l4JiMB

Likes 7 · Replies 0 · Retweets 2

sanjana

4 months ago

Putting together a benchmark dataset to evaluate LLM outputs is underrated. Without ground truth or clear evals, you're flying blind. Not just pass/fail - you want structured, repeatable comparisons across prompts, models, and strategies. Earlier this week, I put out a

Likes 7 · Replies 1 · Retweets 2
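The point about "structured, repeatable comparisons across prompts, models, and strategies" can be illustrated with a minimal sketch. It assumes a tiny hypothetical benchmark of inputs with ground-truth labels and two hypothetical stand-in functions (`variant_a`, `variant_b`) in place of real LLM calls; it is not an Arize or Phoenix API, just the general shape of scoring every variant against the same fixed dataset.

```python
from typing import Callable, Dict, List

# Hypothetical benchmark dataset: each row pairs an input with its ground-truth label.
BENCHMARK: List[Dict[str, str]] = [
    {"input": "I love this product", "expected": "positive"},
    {"input": "This is terrible", "expected": "negative"},
    {"input": "Works as described", "expected": "positive"},
]


def run_benchmark(predict: Callable[[str], str], dataset: List[Dict[str, str]]) -> float:
    """Exact-match accuracy of one variant over the whole dataset."""
    correct = sum(predict(row["input"]) == row["expected"] for row in dataset)
    return correct / len(dataset)


# Hypothetical stand-ins for two prompt/model variants; a real run would call an LLM here.
def variant_a(text: str) -> str:
    return "negative" if "terrible" in text.lower() else "positive"


def variant_b(text: str) -> str:
    return "positive"  # a naive baseline that always answers "positive"


variants = {"variant_a": variant_a, "variant_b": variant_b}
scores = {name: run_benchmark(fn, BENCHMARK) for name, fn in variants.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Because every variant is scored against the same fixed rows, rerunning the script after a prompt change gives a like-for-like comparison rather than a one-off pass/fail impression.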

sanjana

4 months ago

youtu.be/NaCjm8rdxqk?si…

Likes 5 · Replies 0 · Retweets 1

Mikyo

4 months ago

arize-phoenix 11.18 comes with lots of user-requested fixes!
- Day 0 support for Anthropic Claude Opus 4.1
- Support for retention policy configuration via Helm
- Basic support for air-gapped deployments
- REST API for deleting spans for redaction
- TypeScript evals now

Likes 7 · Replies 0 · Retweets 5

sanjana

4 months ago

Loving these updates 👏 Day 0 Claude Opus 4.1 support, air-gapped deploys, redaction APIs, SpringAI support... and the UI polish too?!?! OSS done right. Focused, fast, and actually listening to users🔥

Likes 9 · Replies 0 · Retweets 2

sanjana

4 months ago

Span-level evaluations are powerful, but they only show you part of the picture. To really understand how your AI performs in real conversations, you have to think in sessions. A session captures the full back-and-forth between a user and your app. It reflects how people

Likes 6 · Replies 1 · Retweets 4
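To make the span-versus-session distinction concrete, here is a minimal sketch under stated assumptions: `Span` is a hypothetical, simplified record (a session id, a turn number, an output, and the result of some per-turn eval), and the session verdict "did the conversation end on a passing turn" is a toy heuristic, not Phoenix's or Arize's session evaluation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical, simplified span record; real traces carry many more attributes."""
    session_id: str
    turn: int
    output: str
    passed: bool  # outcome of some span-level eval on this single turn


def group_by_session(spans):
    """Collect spans into sessions, ordered by turn number."""
    sessions = defaultdict(list)
    for span in spans:
        sessions[span.session_id].append(span)
    return {sid: sorted(turns, key=lambda s: s.turn) for sid, turns in sessions.items()}


def evaluate_session(turns):
    """Toy session-level eval (an assumption, not a library API).

    Reports the span-level pass rate alongside a session verdict that only
    looks at whether the conversation ended on a passing turn.
    """
    pass_rate = sum(t.passed for t in turns) / len(turns)
    return {"turns": len(turns), "span_pass_rate": pass_rate, "resolved": turns[-1].passed}


# Hypothetical example data: two short support conversations.
spans = [
    Span("a", 1, "Sure, here is the refund policy.", True),
    Span("a", 2, "I found your order.", True),
    Span("a", 3, "Sorry, I can't help with that.", False),
    Span("b", 2, "Your refund has been issued.", True),
    Span("b", 1, "Could you share your order number?", False),
]

report = {sid: evaluate_session(turns) for sid, turns in group_by_session(spans).items()}
for sid, result in sorted(report.items()):
    print(sid, result)
```

Session "a" passes most of its individual turns yet ends unresolved, while "b" starts with a failing turn and recovers; neither outcome is visible from any single span on its own.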

Arize AI

4 months ago

Packed the AWS Loft with builders last night w/ @CrewAIInc & Amazon Web Services—diving into tracing & evals for reliable AI agents. Thanks to João Moura, Karan Singh & Jason Lopatecki for a convo on where agent infra’s headed. Missed it? We’re back Aug 28: bit.ly/4m6YB8M

Likes 9 · Replies 0 · Retweets 4

Mikyo

4 months ago

Just hit this issue with Claude not being able to use an MCP server talking to localhost. stackoverflow.com/questions/7950… Feels like there was a secret security issue... Am I wrong?

Likes 2 · Replies 0 · Retweets 1

Mikyo

4 months ago

I just asked arize-phoenix MCP to figure out why my experiment went wrong and it nailed it first try. 🎯 Error analysis with LLM in the loop. I hate hyperbole but it did feel pretty magical. It does feel like a signal of a new wave: UIs designed for humans are still important,

Likes 7 · Replies 2 · Retweets 2

Arize AI