Gentrace (@gentraceai) Twitter Tweets • TwiCopy

Doug Safreno

a year ago

Most engineers approach LLM-as-a-judge all wrong. The usual high-level metrics like hallucination or safety rarely tell you if your app actually works as intended. Let’s talk about why that’s a problem and how to fix it:

Doug Safreno

@dougsafreno

a year ago

Big news today: Gentrace raised our $8M Series A led by @MatrixVC. We’re celebrating by launching Experiments, the first collaborative testing environment for LLM product development.

Gentrace

@gentraceai

a year ago

What if your LLM testing system could automatically optimize your prompts? That's where we're headed with Experiments, our new feature helping developers speed up last-mile tuning. Here's how it works: Unlike prompt playgrounds, Experiments provides a testing environment

Gentrace

@gentraceai

a year ago

With the rise of LLMs, the Webflow team set an ambitious goal to use natural language to make modifications to websites. To set up evals, they chose Gentrace.

With Gentrace, the Webflow team:

- Evaluates multimodal outputs (like website screenshots) using human and

Doug Safreno

@dougsafreno

a year ago

I'll die on this hill: people are ripping on LLM as a judge because they aren't doing it right. 80% are asking the LLM the same question as the prompt. Avoid being the 80% by giving your LLM as a judge an "unfair advantage" — aka some additional context or capability that makes

Daniel C. Liem

@damndanielliem

a year ago

Been a chaotic but fun year at Gentrace.

We hosted 10+ community events, partnering with Modal, Webflow, Mem, AI Engineer, Matrix and Headline, to host dinners, tech talks, conferences, and office parties.

Intensity is high, but we also have fun along

Gentrace

@gentraceai

a year ago

It’s been an exciting year for us with lots of new releases and bugs fixed. Here’s a recap of our 5 favorite things we shipped: 1. Datasets support—organize test data into separate groups within a pipeline 2. Compare—updated compare mode for easily viewing outputs and test

Gentrace

@gentraceai

a year ago

It was a big year for Gentrace in 2024! 📈 21M+ evals and traces run, growing 2x in the last 6 months (up from 40k when we first started in Oct 2023!) 🤖 19 features and improvements launched 🪲 320 bugs squashed Thank you to our customers and team for making this possible. 🫶

Vivek Nair

@virtuallyvivek

a year ago

How do you process 45,000 tasks/day without adding infrastructure complexity?

At @gentraceai, we built a task queue with PostgreSQL using FOR UPDATE SKIP LOCKED. It’s simple and reliable while handling retries and large payloads.

Clone our implementation for your own project:
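
The implementation linked from the tweet is not reproduced on this page. As a rough illustration of the general FOR UPDATE SKIP LOCKED pattern only, here is a minimal Python sketch. The `tasks` table, its columns (`id`, `payload`, `status`, `attempts`, `created_at`), the function names, and the `%s` placeholder style (as used by psycopg) are all assumptions made for this example, not details of Gentrace's code.

```python
from typing import Any, Optional

# Hypothetical schema: tasks(id, payload, status, attempts, created_at).
# The inner SELECT locks one pending row; SKIP LOCKED makes concurrent
# workers pass over rows that another transaction has already locked,
# so two workers never claim the same task.
CLAIM_SQL = """
UPDATE tasks
SET status = 'running', attempts = attempts + 1
WHERE id = (
    SELECT id
    FROM tasks
    WHERE status = 'pending' AND attempts < %s
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""

# Putting a failed task back to 'pending' lets a later claim retry it,
# until the attempts counter reaches the limit passed to the claim query.
RETRY_SQL = "UPDATE tasks SET status = 'pending' WHERE id = %s"


def claim_next_task(cur: Any, max_attempts: int = 3) -> Optional[dict]:
    """Claim one pending task using any DB-API 2.0 cursor, or return None."""
    cur.execute(CLAIM_SQL, (max_attempts,))
    row = cur.fetchone()
    return None if row is None else {"id": row[0], "payload": row[1]}


def release_for_retry(cur: Any, task_id: int) -> None:
    """Mark a failed task as pending again so another worker can pick it up."""
    cur.execute(RETRY_SQL, (task_id,))
```

A worker would call `claim_next_task` inside a transaction, process the payload, and then either mark the task done or call `release_for_retry`. How large payloads are stored is not covered by this sketch.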

Gentrace

@gentraceai

a year ago

We're excited to join the Founders You Should Know showcase on Feb 5th in San Francisco! Meet our cofounders, Doug Safreno and Vivek Nair, and learn how we're helping teams at Webflow and Quizlet test their AI apps. Apply here: newsletter.foundersysk.com/p/2025s-breako…

Gentrace

@gentraceai

a year ago

Self-hosted just got an upgrade. Now you can deploy Gentrace in your Kubernetes cluster with:

- Helm charts for quick setup
- @istiomesh for secure service-to-service communication
- Support for your existing data infra (Postgres, Kafka, S3)

Get started with our guide:

Gentrace

@gentraceai

10 months ago

Last week, we hosted some incredible AI builders from companies like Asana, Cribl, Block, Vanta, and Pinterest to share their stories on shipping AI apps to production. What we learned is that the road from POC to production isn't a straight path: 1. Turns out the biggest

Gentrace

@gentraceai

10 months ago

Agents are changing everything about how we build software. Will agents replace purpose-built tools or create entirely new opportunities? Learn where agents are actually headed this Thurs 2/6 at Gentrace SF with our speakers brryant Rodrigo Davies Prabhav Jain Elaine Zelby

Gentrace

@gentraceai

10 months ago

What a night! Packed house and some of the sharpest minds in AI from Webflow, Asana, and 11x sharing how they’re building with agents today. Big thanks to Ampersand for co-hosting and our speakers brryant Rodrigo Davies Prabhav Jain Elaine Zelby Patrick Thompson for

Gentrace

@gentraceai

10 months ago

Multiverse is using AI to improve how students learn on the job. Their AI team uses LLMs for delivering realtime feedback to students, requiring high quality and reliability. Before Gentrace, evals were managed in spreadsheets, creating bottlenecks. Now they: - Use LLM

Gentrace

@gentraceai

10 months ago

Your dataset will never be perfect, but you need one to get started with evals. The best AI teams don’t chase a "golden" dataset. Instead, they start small with 5-10 examples, capture real production use cases, and iterate continuously. Here’s a practical guide for building

Gentrace

@gentraceai

9 months ago

LLM-as-a-judge evals often fail because they ask the same question twice. Instead, give the model an unfair advantage: extra context, constraints, or comparisons that make grading easier than generation. Here's a guide breaking down our approach: go.gentrace.ai/RnT9dxG
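
One way to picture the "unfair advantage" idea is a judge prompt that carries reference facts the answering model never saw, so grading becomes a comparison instead of a second attempt at the same question. The sketch below is only an illustration of that idea: the template wording, the function names, and the `call_llm` callable (a stand-in for whatever model client you use) are hypothetical and are not taken from the linked guide.

```python
from typing import Callable

# Hypothetical judge template: the grader gets reference facts that the
# answering model did not have, which makes grading easier than generation.
JUDGE_TEMPLATE = """You are grading an answer produced by another model.

Question:
{question}

Answer to grade:
{answer}

Reference facts (the answering model did not see these):
{reference}

Does the answer agree with every reference fact? Reply with exactly PASS or FAIL."""


def build_judge_prompt(question: str, answer: str, reference_facts: list[str]) -> str:
    """Render the judge prompt with the extra context listed as bullets."""
    reference = "\n".join(f"- {fact}" for fact in reference_facts)
    return JUDGE_TEMPLATE.format(question=question, answer=answer, reference=reference)


def judge(
    question: str,
    answer: str,
    reference_facts: list[str],
    call_llm: Callable[[str], str],
) -> bool:
    """Return True when the judge model replies PASS (case-insensitive)."""
    verdict = call_llm(build_judge_prompt(question, answer, reference_facts))
    return verdict.strip().upper() == "PASS"
```

Passing the model call in as a plain function keeps the sketch independent of any particular SDK and makes the grading logic easy to exercise with a stub.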

Doug Safreno

@dougsafreno

4 months ago

Agents are significantly more powerful than standalone LLM calls. But debugging them is a nightmare. You can trace their reasoning and tool use, but traces get huge and are impossible to parse. To solve this, we spent the last several months building Gentrace for Agents,
