Dylan Couzon (@dylancouzon)'s Twitter Profile
Dylan Couzon

@dylancouzon

AI Growth Engineer @ Arize AI

ID: 868418892

Website: http://arize.com · Joined: 08-10-2012 18:04:16

60 Tweets

42 Followers

76 Following

DeepLearning.AI (@deeplearningai)'s Twitter Profile Photo

Building a reliable RAG system doesn’t stop at retrieval and generation; you need observability, too. In the Retrieval Augmented Generation course, you'll explore how LLM observability platforms can help you: - Trace prompts through each step of the pipeline - Log and evaluate
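The tracing idea above can be sketched with a minimal, hand-rolled span logger. This is illustrative only; `trace_step` and the stubbed retriever/LLM calls are hypothetical, not the Phoenix API:

```python
import time
import uuid
from contextlib import contextmanager

# Collected spans: each records which pipeline step ran, how long it took,
# and what went in and out -- the raw material for later evaluation.
SPANS = []

@contextmanager
def trace_step(name, trace_id, **attributes):
    """Record one step (retrieval, generation, ...) of a RAG pipeline."""
    span = {"trace_id": trace_id, "name": name, "attributes": attributes}
    start = time.perf_counter()
    try:
        yield span
    finally:
        span["latency_s"] = time.perf_counter() - start
        SPANS.append(span)

def answer(question):
    """Run a toy two-step RAG pipeline, tracing each step under one trace id."""
    trace_id = str(uuid.uuid4())
    with trace_step("retrieval", trace_id, query=question) as s:
        docs = ["Phoenix is an open-source observability library."]  # stub retriever
        s["attributes"]["documents"] = docs
    with trace_step("generation", trace_id, prompt=question) as s:
        completion = f"Based on {len(docs)} document(s): ..."  # stub LLM call
        s["attributes"]["completion"] = completion
    return completion

answer("What is Phoenix?")
```

Because both spans share a trace id, a logged question can be followed from retrieval through generation, which is the property real observability platforms build on.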

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Over the last few weeks, Phoenix has continued to evolve. A large part of observability is about control over data: filter what’s important, preserve it, and build a workflow that scales with how you debug, analyze, and ship your systems. In Phoenix, your traces are no longer

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen works best: Explanations make judge models more reliable. They reduce variance across runs, improve agreement

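The explanation-first judging described above can be sketched as a prompt template plus a parser that requires the judge to explain before it labels. The template wording and helper names here are hypothetical, not Arize's implementation:

```python
import re

JUDGE_TEMPLATE = """You are grading an answer for correctness.
Question: {question}
Answer: {answer}

First write EXPLANATION: one or two sentences of reasoning.
Then write VERDICT: correct or incorrect."""

def build_judge_prompt(question, answer):
    """Ask the judge model to explain its reasoning before giving a verdict."""
    return JUDGE_TEMPLATE.format(question=question, answer=answer)

def parse_judgment(raw):
    """Pull the explanation and verdict out of the judge model's reply."""
    explanation = re.search(r"EXPLANATION:\s*(.+?)(?=VERDICT:|$)", raw, re.S)
    verdict = re.search(r"VERDICT:\s*(correct|incorrect)", raw, re.I)
    return {
        "explanation": explanation.group(1).strip() if explanation else None,
        "label": verdict.group(1).lower() if verdict else None,
    }

reply = "EXPLANATION: The answer matches the reference.\nVERDICT: correct"
parse_judgment(reply)
```

Forcing the explanation to come first makes the label conditional on stated reasoning, which is the mechanism credited with reducing run-to-run variance.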
Arize AI (@arizeai)'s Twitter Profile Photo

Experimentation in Arize got better with Diff Mode 🌟 Start with a baseline experiment, then run variations to see how your evals shift. The hard part has always been spotting what actually changed and whether it mattered. That’s where Diff Mode comes in: you can now line up
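The baseline-vs-variation comparison described above amounts to lining up two runs by example and ranking the score shifts. A minimal sketch of that idea; `diff_experiments` and the dict-of-scores format are illustrative, not the Arize API:

```python
def diff_experiments(baseline, variant):
    """Line up two experiment runs by example id and report score shifts,
    largest absolute change first, so regressions and wins surface on top."""
    diffs = []
    for example_id, base_score in baseline.items():
        if example_id in variant:
            diffs.append({
                "example": example_id,
                "baseline": base_score,
                "variant": variant[example_id],
                "delta": round(variant[example_id] - base_score, 3),
            })
    return sorted(diffs, key=lambda d: abs(d["delta"]), reverse=True)

# Eval scores keyed by example id for two hypothetical runs.
baseline = {"q1": 0.9, "q2": 0.4, "q3": 0.7}
variant = {"q1": 0.9, "q2": 0.8, "q3": 0.6}
diff_experiments(baseline, variant)
```

Sorting by absolute delta is the key design choice: most examples are unchanged, so the interesting rows are the few that moved.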

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Every experiment tells a story, and it's important to see how one run stacks up against another. Did the model really get better, or just more expensive? Did eval scores improve across the board, or only on a few runs? The Phoenix team has made several improvements to the

Groq Inc (@groqinc)'s Twitter Profile Photo

Join Groq Inc, Google, and Arize AI at Betaworks NYC on Sept 10, 6–9PM ET to learn how to ship real-time, reliable agents. Groq will show how open source models now rival frontier intelligence without the latency. Details 👇

Arize AI (@arizeai)'s Twitter Profile Photo

If you're debating whether to make the jump, our own Alec Swanson tackles the major differences between Cursor and @Claude_Code and some power user techniques for the latter. bit.ly/4lPZYYr

Arize AI (@arizeai)'s Twitter Profile Photo

BERLIN: join Dat Ngo at Qdrant Vector Space Day, where he'll be covering how to build self-improving evals for agentic RAG. RSVP: bit.ly/4lZYjQe

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

The “AI Evals for Engineers & PMs” course by Hamel Husain & Shreya Shankar nails a huge need: practical processes + tools for evaluating AI & agent apps. In the latest cohort, our team guest-lectured on arize-phoenix (shoutout Mikyo, Sally-Ann Delucia, Srilakshmi Chavali, Priyan Jindal)

Arize AI (@arizeai)'s Twitter Profile Photo

Arjun Mukerji, PhD, of @atroposhealth will be presenting his paper on LLM summarization of real-world evidence studies at our next community paper reading! RSVP: bit.ly/3K5PiHS

Arize AI (@arizeai)'s Twitter Profile Photo

We are excited for what's possible with Dify and Arize 🤝 Building AI agents is fast & intuitive with Dify, but keeping them accurate and reliable at scale can be a challenge. That’s where Arize comes in: trace every agent step, debug failures, run structured evaluations,

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

💡 Your LLM might ace English queries… but what happens when your users switch to Spanish, Hindi, or Mandarin? For teams building global AI systems, this is the hidden challenge: LLMs often fail to generate correct Cypher queries across languages. That means gaps in reasoning,

Aparna Dhinakaran (@aparnadhinak)'s Twitter Profile Photo

Everyone is shipping agents right now. With so many agent frameworks popping up, the choice comes down to which one actually fits what you want your agent to do. We broke down orchestrator-worker workflows for 6 of the most common frameworks: @Agno, autogen, ai, @openai

Dat Ngo (@dat_attacked)'s Twitter Profile Photo

Excited to see the open source community at AI Engineer Paris! arize-phoenix will be out in full force, so if you're there, please stop by and say hi if you're an open source advocate! Aparna Dhinakaran will be giving one of her epic talks on Prompt Learning for Agents, so

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

See the new Phoenix evals library in action and bring your questions for our virtual workshop this Thursday! RSVP: luma.com/45eopucf

arize-phoenix (@arizephoenix)'s Twitter Profile Photo

Comparing experiments in Phoenix just got a lot easier. The new List View helps you quickly scan results with per-example metrics, while the Metrics View gives you a high-level look at how changes impact cost, latency, tokens, and more. Upgrade to v11.32.1 to start exploring.