PatronusAI (@patronusai) Twitter Tweets • TwiCopy

PatronusAI

6 months ago

Welcome Peng Wang to the team! 🎉 Peng joins Patronus AI as Head of Applied Research. Previously, he was Head of Research at Grammarly, Head of AI at AlphaSense, and an ML Engineer at Google Research. Peng’s research interests include: LLM personalization and contextualization,

PatronusAI

@patronusai

6 months ago

Thank you, Professor Zhou Yu and Berkeley Summit House, for the AI Agents in Action: Industry × Academia Exchange! Rebecca Qian, our CTO, was on a panel with Vinay Rao (Advisor at Anthropic), Shunyu Yao (Research Scientist at OpenAI), Robert Parker (Founder of Perceptix),

PatronusAI

@patronusai

6 months ago

We’re up to exciting things here at Patronus AI, working at the forefront of AI optimization and evaluation! Recently, we launched Percival, a SOTA AI Agent debugger, and have previously released industry-standard benchmarks for agents like TRAIL and BLUR, as well as

MLCommons

@mlcommons

6 months ago

Today, MLCommons is announcing a new collaboration with contributors from across academia, civil society, and industry to co-develop an open agent reliability evaluation standard to operationalize trust in agentic deployments. 1/3 🔗 mlcommons.org/2025/06/ares-a…

PatronusAI

@patronusai

6 months ago

Using your best AI debugger just got easier -- spotlighting our Percival Integrations! 🛡️ Percival is a highly intelligent agent and AI debugger, capable of detecting 20+ failure modes in agentic traces and suggesting optimizations. Generally, these agentic systems run into

PatronusAI

@patronusai

5 months ago

At PatronusAI, we're excited to publish a new article on the best practices for agentic workflows. 🚀 In this article, you will learn about agentic workflows, which involve specialized AI agents collaborating to solve complex problems without human intervention, and their

PatronusAI

@patronusai

5 months ago

Meet Snigdha Banda, our FDE Lead! 🎉 Snigdha has been on our team since this past November and is a valued leader, teammate, and friend. Today, we wanted to highlight her work as a Forward-Deployed Engineer, a role that is becoming one of the most sought-after jobs in tech, with

PatronusAI

@patronusai

5 months ago

At PatronusAI, we're excited to publish a new article on the best practices for using AI agent platforms. 🚀 In this article, you will learn about various AI agent platforms like n8n.io, Make, LangChain, CrewAI, and Hugging Face smolagents. The article provides

Baseten

@basetenco

5 months ago

Building reliable agents requires a different tech stack: one that natively supports compound AI systems and evaluates quality along the full trajectory of agent behavior. We teamed up with PatronusAI to break down what this stack looks like, from infra and models to debuggers.

Baseten

@basetenco

5 months ago

You can read the blog here: baseten.co/blog/how-to-bu…

PatronusAI

@patronusai

5 months ago

Introducing Prompt Management on the PatronusAI platform! Prompting is an essential part of the development and evaluation process, allowing for necessary human-in-the-loop exchanges and improvements. However, managing prompts is challenging and messy, with even the smallest

PatronusAI

@patronusai

5 months ago

Unleash the Power of AI Oversight with PatronusAI x Databricks 🎉 With the Patronus AI integration into Databricks MLflow, you can now trace and transport (via OTel) your logs to the Patronus AI platform backend for detailed analysis. You’ll receive real-time monitoring,

PatronusAI

@patronusai

4 months ago

Thank you, UC Berkeley RDI, for hosting the Agentic AI Summit and having us! Darshan Deshpande, one of our research scientists, who leads agent evaluation here at Patronus, presented at the summit! Here are a few takeaways:
* Given context explosion and increasing domain depth and

PatronusAI

@patronusai

4 months ago

Our team has been collaborating with Etsy (for a while now) on exciting multimodal evaluations! Last week, we had the opportunity to synthesize learnings from our suite of projects when Varun Joshi, our Head of Engineering, presented at the Etsy ML Summit! Thank you for having

PatronusAI

@patronusai

4 months ago

At PatronusAI, we're excited to publish a new article on the best practices for custom optimization tools for LLMs. 🚀 In this article, you will learn how large language models (LLMs) are being integrated into application development, with an overview of the tools and

PatronusAI

@patronusai

4 months ago

Introducing Prompt Tester on the Patronus AI platform! Prompt Tester allows you to build more robust prompts across different types of contexts to test out their effectiveness. Prompt. Test. Evaluate. Iterate. Read more here: patronus.ai/blog/prompt-te… Give it a try and let us

PatronusAI

@patronusai

4 months ago

This past weekend, our Head of Applied Research, Peng Wang, was invited to give a talk on AI Oversight at Scale: Navigating the Challenges of Evaluating LLMs and Agents. Peng shared our vision of scalable AI oversight being the biggest problem facing widespread societal

PatronusAI

@patronusai

4 months ago

Evaluators are at the heart of the Patronus AI platform, and clients across industries have found them helpful in evaluating context and answer relevance, detecting hallucinations, and analyzing multimodal content! If you’re new to evaluators, this blog post will give a quick

PatronusAI

@patronusai

4 months ago

Welcome Josh Weimer to the team! 🎉 Josh joins PatronusAI as a Forward Deployed Engineer. Previously, he worked in the GovTech space, where he supported agencies including the Department of Defense, Department of Justice, Office of Personnel Management, and Department of the
