Freddie Vargus (@freddie_v4)'s Twitter Profile
Freddie Vargus

@freddie_v4

CTO & co-founder @quotientai
Research @cohere_labs

past: evals @github Copilot, data @quantopian

Tico 🇨🇷🇺🇸

ID: 614740050

Link: https://github.com/quotient-ai/judges
Joined: 21-06-2012 23:20:04

524 Tweets

799 Followers

1.1K Following

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

Most teams only find out their AI is broken when someone complains or churns. Your agents shouldn’t fail silently. We’re launching Quotient AI Detections: a system to catch agent mistakes, identify how they happened, and automatically fix them.

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

If you're building AI apps and flying blind, we can help.

→ Sign up: app.quotientai.co
→ Grab $250 in credits with the ElevenLabs AI Engineer Pack: aiengineerpack.com
→ Join our Discord: discord.com/invite/YeJzANp…

Let us help you understand how your agents fail,

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

We just launched Quotient AI Detections: the first system that helps teams catch AI failures before their users do. As part of the launch, we partnered with ElevenLabs to offer coupons through the AI Engineer Pack:

→ 1,000,000 extra logs
→ 10,000 free detections
→ $250+

Mingxuan (Aldous) Li (@itea1001)'s Twitter Profile Photo

HypoEval evaluators (github.com/ChicagoHAI/Hyp…) are now incorporated into judges from Quotient AI — check it out at github.com/quotient-ai/ju…!
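For context, the LLM-as-judge pattern behind evaluator libraries like judges boils down to prompting a model with a grading rubric and parsing its verdict. Below is a minimal sketch of that pattern using the OpenAI client; the rubric text, model choice, and judge function are illustrative assumptions, not the judges library's actual API.

```python
# Minimal LLM-as-judge sketch (illustrative; not the judges library's API).
# Assumptions: OpenAI Python client, a hypothetical binary rubric, gpt-4o-mini.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "You are an evaluator. Given a QUESTION and an ANSWER, reply with "
    "exactly one word: correct or incorrect."
)

def judge(question: str, answer: str) -> bool:
    """Ask a model to grade an answer; return True if judged correct."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"QUESTION: {question}\nANSWER: {answer}"},
        ],
        temperature=0,
    )
    verdict = response.choices[0].message.content.strip().lower()
    return verdict == "correct"

print(judge("What is 2 + 2?", "4"))  # -> True, model permitting
```

Libraries like judges wrap this loop with tested rubrics (including research-derived ones such as HypoEval's) so you don't have to write and validate the prompt yourself.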

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

detections go brrr

One week in, Quotient AI Detections has processed 20M+ tokens, analyzed tens of thousands of logs, and caught thousands of hallucinations across real AI production apps.

Still a long way to go, but we're committed to giving builders SOTA AI monitoring.

John Berryman (@jnbrymn)'s Twitter Profile Photo

You probably knew you could turn an LLM into a classifier. But the basic approach returns a hard classification – yes or no; good or bad.

In this post I'll show you a simple technique to make your LLM classifiers more nuanced – they tell you HOW good or HOW bad something is. 👇
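
One common way to get that graded signal (a sketch of the general idea, which may or may not match the linked post's exact technique): cap the reply at a single yes/no token and read the token logprobs, turning the hard label into a probability. The prompt, model name, and soft_classify helper below are assumptions for illustration.

```python
# Soft LLM classifier sketch: score = P("yes") from token logprobs,
# instead of a hard yes/no label. Names and prompt are illustrative.
import math
from openai import OpenAI

client = OpenAI()

def soft_classify(text: str) -> float:
    """Return the model's probability that the review is positive."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer with exactly one word: yes or no."},
            {"role": "user", "content": f"Is this product review positive? {text}"},
        ],
        max_tokens=1,
        temperature=0,
        logprobs=True,
        top_logprobs=5,
    )
    # Logprobs of the candidate first tokens, converted to probabilities.
    top = response.choices[0].logprobs.content[0].top_logprobs
    probs = {t.token.strip().lower(): math.exp(t.logprob) for t in top}
    p_yes = probs.get("yes", 0.0)
    p_no = probs.get("no", 0.0)
    # Renormalize over the two labels so scores are comparable across inputs.
    return p_yes / (p_yes + p_no) if (p_yes + p_no) > 0 else 0.5

print(soft_classify("Great battery life, terrible screen."))  # e.g. ~0.6
```

The payoff is a continuous score you can threshold, rank by, or monitor over time, rather than a binary verdict.
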
Freddie Vargus (@freddie_v4)'s Twitter Profile Photo

so who’s the bad node in the dependency graph right now? supabase, aws, gcp, cursor, cloudflare, azure… surely something upstream?

Julia Neagu (@juliaaneagu)'s Twitter Profile Photo

“You want your model hitting milestones, not minefields.” Most AI eval talk is hand-wavy. This isn’t. Freddie Vargus (Quotient AI CTO) gets into the weeds: how to actually test tool use, avoid minefields, and build agents that don’t break. Check out the recording 👇
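
To give a flavor of what testing tool use can look like (a generic sketch, not the method from the talk): replay a prompt against a model with a tool schema attached, then assert on the tool call it emits. The get_weather tool, model choice, and assertions below are hypothetical.

```python
# Tool-use test sketch: check that the model calls the right tool with
# the right arguments for a known prompt. Schema and names are illustrative.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def test_weather_tool_call():
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What's the weather in Oslo?"}],
        tools=TOOLS,
    )
    calls = response.choices[0].message.tool_calls or []
    assert len(calls) == 1, "expected exactly one tool call"
    assert calls[0].function.name == "get_weather"
    args = json.loads(calls[0].function.arguments)
    assert args.get("city", "").lower() == "oslo"

test_weather_tool_call()
```

Running a suite of cases like this against every prompt or model change is one concrete way to catch tool-use regressions before they reach production.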