Dylan Couzon (@dylancouzon)'s Twitter Profile
Dylan Couzon

@dylancouzon

AI Growth Engineer @ Arize AI

ID: 868418892

Website: http://arize.com · Joined: 08-10-2012 18:04:16

60 Tweets

42 Followers

76 Following

DeepLearning.AI (@deeplearningai)'s Twitter Profile Photo

Building a reliable RAG system doesn’t stop at retrieval and generation; you need observability, too. In the Retrieval Augmented Generation course, you'll explore how LLM observability platforms can help you: - Trace prompts through each step of the pipeline - Log and evaluate
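The tracing idea above can be sketched with a minimal, hand-rolled span logger. This is illustrative only; `trace_step` and the stubbed retriever/LLM calls are hypothetical, not the Phoenix API:

```python
import time
import uuid
from contextlib import contextmanager

# Collected spans: each records which pipeline step ran, how long it took,
# and what went in and out -- the raw material for later evaluation.
SPANS = []

@contextmanager
def trace_step(name, trace_id, **attributes):
    """Record one step (retrieval, generation, ...) of a RAG pipeline."""
    span = {"trace_id": trace_id, "name": name, "attributes": attributes}
    start = time.perf_counter()
    try:
        yield span
    finally:
        span["latency_s"] = time.perf_counter() - start
        SPANS.append(span)

def answer(question):
    """Run a toy two-step RAG pipeline, tracing each step under one trace id."""
    trace_id = str(uuid.uuid4())
    with trace_step("retrieval", trace_id, query=question) as s:
        docs = ["Phoenix is an open-source observability library."]  # stub retriever
        s["attributes"]["documents"] = docs
    with trace_step("generation", trace_id, prompt=question) as s:
        completion = f"Based on {len(docs)} document(s): ..."  # stub LLM call
        s["attributes"]["completion"] = completion
    return completion

answer("What is Phoenix?")
```

Because both spans share a trace id, a logged question can be followed from retrieval through generation, which is the property real observability platforms build on.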

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Over the last few weeks, Phoenix has continued to evolve. A large part of observability is about control over data: filter what’s important, preserve it, and build a workflow that scales with how you debug, analyze, and ship your systems. In Phoenix, your traces are no longer

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen works best: Explanations make judge models more reliable. They reduce variance across runs, improve agreement

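The explanation-first judging described above can be sketched as a prompt template plus a parser that requires the judge to explain before it labels. The template wording and helper names here are hypothetical, not Arize's implementation:

```python
import re

JUDGE_TEMPLATE = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}

First write EXPLANATION: one or two sentences of reasoning.
Then write VERDICT: correct or incorrect."""

def build_judge_prompt(question, answer):
    """Ask the judge model to explain its reasoning before giving a verdict."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

def parse_judgment(raw):
    """Pull the explanation and verdict out of the judge model's reply."""
    explanation = re.search(r"EXPLANATION:\s*(.+?)(?=VERDICT:|$)", raw, re.S)
    verdict = re.search(r"VERDICT:\s*(correct|incorrect)", raw, re.I)
    return {
        "explanation": explanation.group(1).strip() if explanation else None,
        "label": verdict.group(1).lower() if verdict else None,
    }

reply = "EXPLANATION: The answer matches the reference.\nVERDICT: correct"
parse_judgment(reply)
```

Forcing the explanation to come first makes the label conditional on stated reasoning, which is the mechanism credited with reducing run-to-run variance.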
Arize AI (@arizeai)'s Twitter Profile Photo

Experimentation in Arize got better with Diff Mode 🌟 Start with a baseline experiment, then run variations to see how your evals shift. The hard part has always been spotting what actually changed and whether it mattered. That’s where Diff Mode comes in: you can now line up
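The baseline-vs-variation comparison described above amounts to lining up two runs by example and ranking the score shifts. A minimal sketch of that idea; `diff_experiments` and the dict-of-scores format are illustrative, not the Arize API:

```python
def diff_experiments(baseline, variant):
    """Line up two experiment runs by example id and report score shifts,
    largest absolute change first, so regressions and wins surface on top."""
    diffs = []
    for example_id, base_score in baseline.items():
        if example_id in variant:
            diffs.append({
                "example": example_id,
                "baseline": base_score,
                "variant": variant[example_id],
                "delta": round(variant[example_id] - base_score, 3),
            })
    return sorted(diffs, key=lambda d: abs(d["delta"]), reverse=True)

# Eval scores keyed by example id for two hypothetical runs.
baseline = {"q1": 0.9, "q2": 0.4, "q3": 0.7}
variant = {"q1": 0.9, "q2": 0.8, "q3": 0.6}
diff_experiments(baseline, variant)
```

Sorting by absolute delta is the key design choice: most examples are unchanged, so the interesting rows are the few that moved.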

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Every experiment tells a story, and it's important to see how one run stacks up against another. Did the model really get better, or just more expensive? Did eval scores improve across the board, or only on a few runs? The Phoenix team has made several improvements to the

Groq Inc (@groqinc)'s Twitter Profile Photo

Join Groq Inc, Google, and Arize AI at Betaworks NYC on Sept 10, 6–9PM ET to learn how to ship real-time, reliable agents. Groq will show how open source models now rival frontier intelligence without the latency. Details 👇

Arize AI (@arizeai)'s Twitter Profile Photo

If you're debating whether to make the jump, our own Alec Swanson tackles the major differences between Cursor and @Claude_Code and some power user techniques for the latter. bit.ly/4lPZYYr

Arize AI (@arizeai)'s Twitter Profile Photo

BERLIN: join Dat Ngo at Qdrant Vector Space Day, where he'll be covering how to build self-improving evals for agentic RAG. RSVP: bit.ly/4lZYjQe

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

The “AI Evals for Engineers & PMs” course by Hamel Husain & Shreya Shankar nails a huge need: practical processes + tools for evaluating AI & agent apps. In the latest cohort, our team guest-lectured on arize-phoenix (shoutout Mikyo, Sally-Ann Delucia, Srilakshmi Chavali, Priyan Jindal)

Arize AI (@arizeai)'s Twitter Profile Photo

Arjun Mukerji, PhD, of @atroposhealth will be presenting his paper on LLM summarization of real-world evidence studies at our next community paper reading! RSVP: bit.ly/3K5PiHS

Arize AI (@arizeai)'s Twitter Profile Photo

We are excited for what's possible with Dify and Arize 🤝 Building AI agents is fast & intuitive with Dify, but keeping them accurate and reliable at scale can be a challenge. That’s where Arize comes in: trace every agent step, debug failures, run structured evaluations,

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

💡 Your LLM might ace English queries… but what happens when your users switch to Spanish, Hindi, or Mandarin? For teams building global AI systems, this is the hidden challenge: LLMs often fail to generate correct Cypher queries across languages. That means gaps in reasoning,

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

Everyone is shipping agents right now. With so many agent frameworks popping up, the choice comes down to which one actually fits what you want your agent to do. We broke down orchestrator-worker workflows for 6 of the most common frameworks: @Agno, autogen, ai, @openai

Dat Ngo (@dat_attacked)'s Twitter Profile Photo

Excited to see the open source community at AI Engineer Paris! arize-phoenix will be out in full force, so if you're there, please stop by and say hi if you're an open source advocate! Aparna Dhinakaran will be giving one of her epic talks on Prompt Learning for Agents, so

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

See the new Phoenix evals library in action and bring your questions for our virtual workshop this Thursday! RSVP: luma.com/45eopucf

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Comparing experiments in Phoenix just got a lot easier. The new List View helps you quickly scan results with per-example metrics, while the Metrics View gives you a high-level look at how changes impact cost, latency, tokens, and more. Upgrade to v11.32.1 to start exploring.