RELAI (@reliableai) Twitter Tweets • TwiCopy

RELAI

@reliableai

a year ago

x.com/i/article/1856…

Likes: 3 · Replies: 0 · Retweets: 2

We evaluated several hallucination detection methods on OpenAI's recently released SimpleQA benchmark. RELAI agents detected over 76% of GPT-4o's hallucinations with just a 5% false positive rate. Even more impressively, RELAI detected nearly 1/3 of GPT-4o's hallucinations with

Likes: 6 · Replies: 0 · Retweets: 4

RELAI

@reliableai

a year ago

We are hiring! Join us to make AI reliability achievable and accessible for everyone! Details below:

Likes: 3 · Replies: 0 · Retweets: 0

Soheil Feizi

@feizisoheil

a year ago

Honored to have contributed to the House Bipartisan Task Force on Artificial Intelligence's report. Learn more about it here: science.house.gov/press-releases… Access the full report: speaker.gov/wp-content/upl… I hope this effort contributes to advancing AI that is reliable, safe, and

Likes: 17 · Replies: 0 · Retweets: 4

RELAI

@reliableai

a year ago

Congratulations to RELAI's founder & CEO for receiving the #PECASE award!

Likes: 2 · Replies: 0 · Retweets: 0

RELAI

@reliableai

a year ago

We're hiring a backend developer! Apply here: forms.gle/AWVQZd59TNLtED…

Likes: 0 · Replies: 0 · Retweets: 0

RELAI

@reliableai

a year ago

🚀 Meet our Data Agents! 📅 Want your own custom benchmarks? Book a demo here: calendly.com/d/crx2-k7b-pcm

Likes: 2 · Replies: 0 · Retweets: 1

RELAI

@reliableai

a year ago

Vote! Quantitative results on the RELAI leaderboard will be released soon! ⌛

Likes: 3 · Replies: 0 · Retweets: 2

RELAI

@reliableai

a year ago

Our leaderboard is now live! Check it out at relai.ai. Let us know if you want to add your model or data to our leaderboard: calendly.com/d/crx2-k7b-pcm

Likes: 3 · Replies: 0 · Retweets: 1

RELAI

@reliableai

a year ago

Our legal reasoning benchmark is live! Check it out! Let us know if you want to add your model or data to our leaderboard: calendly.com/d/crx2-k7b-pcm

Likes: 2 · Replies: 0 · Retweets: 0

Soheil Feizi

@feizisoheil

9 months ago

Looking forward to the Agentic AI Summit in Berkeley! If you’re working on agentic systems and interested in deep AI agent optimization, let’s chat!

Likes: 29 · Replies: 0 · Retweets: 2

Soheil Feizi

@feizisoheil

9 months ago

How good (or bad) is GPT-5 — and does it matter for you? I’ve been seeing a lot of posts lately debating the quality of GPT-5’s responses. I tried a few of the examples people mentioned. Here’s one from my own experiment (screenshot attached): I asked GPT-5 to solve a simple

Likes: 35 · Replies: 7 · Retweets: 6

Soheil Feizi

@feizisoheil

8 months ago

Introducing Maestro: the holistic optimizer for AI agents. Maestro optimizes the agent graph and tunes prompts/models/tools, fixing agent failure modes that prompt-only or RL weight tuning can’t touch. Maestro outperforms leading prompt optimizers (e.g., MIPROv2, GEPA) on

Likes: 328 · Replies: 18 · Retweets: 56

RELAI

@reliableai

8 months ago

Prompt Tuning ≠ System Tuning. Most AI agent failures are structural; we keep the agent graph frozen (modules & info flow), then wonder why agents hallucinate, misroute tools, or break guidelines. Meet Maestro: the first joint graph + config optimizer for AI agents. It

Likes: 6 · Replies: 0 · Retweets: 3

Soheil Feizi

@feizisoheil

8 months ago

Let’s talk instruction-following: In prod, “did it follow the spec?” matters more than vibes. IFBench is a challenging benchmark to check whether agents/models obey unseen output/format constraints (length windows, HTML/Markdown rules, sectioning, etc.). That’s a real

Likes: 11 · Replies: 1 · Retweets: 2

RELAI

@reliableai

6 months ago

🎃 Here’s a sweet Halloween treat from RELAI: We built an AI agent that maps the best trick-or-treat route for you—optimized for time, distance, candy variety, and real walking paths. 👉 Try it free: platform.relai.ai/halloween Built at RELAI.ai, where we ship

Likes: 6 · Replies: 0 · Retweets: 1

Soheil Feizi

@feizisoheil

6 months ago

🚀 Build AI agents that actually work — in just 2 hours! We’re launching Reliable AI Agent Sprints—free, fully virtual sessions to build practical, reliable agentic solutions. This isn’t a flashy demo contest; we’ll design, simulate, evaluate, and optimize real agents,