PatronusAI (@patronusai) 's Twitter Profile
PatronusAI

@patronusai

powerful ai evaluation and optimization 🦄

sign up: app.patronus.ai

ID: 1679186467640205314

Link: https://www.patronus.ai

Joined: 12-07-2023 17:50:30

316 Tweets

1.1K Followers

253 Following

PatronusAI (@patronusai) 's Twitter Profile Photo

Welcome Peng Wang to the team! 🎉 Peng joins Patronus AI as Head of Applied Research. Previously, he was Head of Research at Grammarly, Head of AI at AlphaSense, and an ML Engineer at Google Research. Peng’s research interests include: LLM personalization and contextualization,

PatronusAI (@patronusai) 's Twitter Profile Photo

Thank you, Professor Zhou Yu and Berkeley Summit House, for the AI Agents in Action: Industry × Academia Exchange! Rebecca Qian, our CTO, was on a panel with Vinay Rao (Advisor at Anthropic), Shunyu Yao (Research Scientist at OpenAI), Robert Parker (Founder of Perceptix),

PatronusAI (@patronusai) 's Twitter Profile Photo

We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as

MLCommons (@mlcommons) 's Twitter Profile Photo

Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 mlcommons.org/2025/06/ares-a…

PatronusAI (@patronusai) 's Twitter Profile Photo

Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into

PatronusAI (@patronusai) 's Twitter Profile Photo

At PatronusAI, we're excited to publish a new article on best practices for agentic workflows. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their

PatronusAI (@patronusai) 's Twitter Profile Photo

Meet Snigdha Banda, our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with

PatronusAI (@patronusai) 's Twitter Profile Photo

At PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like n8n.io, Make, LangChain, CrewAI, and Hugging Face smolagents. The article provides

Baseten (@basetenco) 's Twitter Profile Photo

Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with PatronusAI to break down what this stack looks like, from infra and models to debuggers.

PatronusAI (@patronusai) 's Twitter Profile Photo

Introducing Prompt Management on the PatronusAI platform! Prompting is an essential part of the development and evaluation process, allowing for necessary human-in-the-loop exchanges and improvements. However, managing prompts is challenging and messy, with even the smallest

PatronusAI (@patronusai) 's Twitter Profile Photo

Unleash the Power of AI Oversight with PatronusAI x Databricks 🎉 With the Patronus AI integration into Databricks MLflow, you can now trace and transport (via OTel) your logs to the Patronus AI platform backend for detailed analysis. You’ll receive real-time monitoring,

PatronusAI (@patronusai) 's Twitter Profile Photo

Thank you, UC Berkeley RDI, for hosting the Agentic AI Summit and having us! Darshan Deshpande, one of our research scientists, who leads agent evaluation here at Patronus, presented at the summit! Here are a few takeaways: * Given context explosion and increasing domain depth and

PatronusAI (@patronusai) 's Twitter Profile Photo

Our team has been collaborating with Etsy (for a while now) on exciting multimodal evaluations! Last week, we had the opportunity to synthesize learnings from our suite of projects when Varun Joshi, our Head of Engineering, presented at the Etsy ML Summit! Thank you for having

PatronusAI (@patronusai) 's Twitter Profile Photo

At PatronusAI, we're excited to publish a new article on the best practices for custom optimization tools for LLMs. 🚀 In this article, you will learn how large language models (LLMs) are being integrated into application development, with an overview of the tools and

PatronusAI (@patronusai) 's Twitter Profile Photo

Introducing Prompt Tester on the Patronus AI platform! Prompt Tester allows you to build more robust prompts across different types of contexts to test out their effectiveness. Prompt. Test. Evaluate. Iterate. Read more here: patronus.ai/blog/prompt-te… Give it a try and let us

PatronusAI (@patronusai) 's Twitter Profile Photo

This past weekend, our Head of Applied Research, Peng Wang, was invited to give a talk on AI Oversight at Scale: Navigating the Challenges of Evaluating LLMs and Agents. Peng shared our vision of scalable AI oversight being the biggest problem facing widespread societal

PatronusAI (@patronusai) 's Twitter Profile Photo

Evaluators are at the heart of the Patronus AI platform, and clients across industries have found them helpful in evaluating context and answer relevance, detecting hallucinations, and analyzing multimodal content! If you’re new to evaluators, this blog post will give a quick

PatronusAI (@patronusai) 's Twitter Profile Photo

Welcome Josh Weimer to the team! 🎉 Josh joins PatronusAI as a Forward Deployed Engineer. Previously, he worked in the GovTech space, where he supported agencies including the Department of Defense, Department of Justice, Office of Personnel Management, and Department of the
