baserun.ai (@baserunai) Twitter Tweets • TwiCopy

This week at Baserun: Developers can now create evaluators using custom code or a custom LLM prompt to grade testing results pre-release or to monitor production post-release.

18 likes · 1 reply · 2 retweets
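A minimal sketch of the two evaluator styles mentioned in the tweet above: one written as custom code, one driven by a custom grading prompt. Everything here is hypothetical and for illustration only. `EvalResult`, `contains_citation`, `llm_graded`, and the `complete` callable are assumed names and are not Baserun's API; `complete` stands in for whatever function sends a prompt to a model and returns its text.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalResult:
    """Hypothetical result record; not a Baserun type."""
    name: str
    passed: bool
    detail: str = ""


def contains_citation(output: str) -> EvalResult:
    """Custom-code evaluator: pass if the output cites a source like [1]."""
    ok = re.search(r"\[\d+\]", output) is not None
    return EvalResult("contains_citation", ok, "" if ok else "no [n] citation found")


# Assumed grading prompt; the wording is illustrative only.
GRADER_PROMPT = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with exactly PASS or FAIL."
)


def llm_graded(question: str, answer: str, complete: Callable[[str], str]) -> EvalResult:
    """Custom-prompt evaluator: ask a model for a PASS/FAIL verdict.

    `complete` is any callable that takes a prompt and returns the model's text.
    """
    prompt = GRADER_PROMPT.format(question=question, answer=answer)
    verdict = complete(prompt).strip().upper()
    return EvalResult("llm_graded", verdict == "PASS", verdict)
```

Because the model call is injected, the same evaluator can run against a stub in pre-release tests and against a real model client when monitoring production traffic.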

Effy Zhang

@effyyzhang

2 years ago

Supports both unit tests and end-to-end tests. As with traces, users can assess the output of a particular LLM call or of the entire pipeline.

1 like · 1 reply · 1 retweet

Effy Zhang

@effyyzhang

2 years ago

Comparing changes side-by-side: With our comparison report, developers can compare two branches before merging a PR to prevent regressions.

0 likes · 1 reply · 1 retweet

Maximum flexibility: Baserun supports automatic evaluators, human evaluators, feedback, and checks. Automatic evaluators include string match, fuzzy match, JSON validation, and Regex match. Model-graded evaluators include fact checking, closed QA, and security evaluations. Users

0 likes · 1 reply · 1 retweet
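The four automatic evaluator types named in the tweet above (string match, fuzzy match, JSON validation, regex match) can be sketched with the Python standard library alone. This is an illustration under assumptions, not Baserun's implementation: the function names, the whitespace trimming, and the 0.8 fuzzy threshold are all hypothetical choices.

```python
import json
import re
from difflib import SequenceMatcher


def string_match(output: str, expected: str) -> bool:
    """Exact match after trimming surrounding whitespace (an assumed normalization)."""
    return output.strip() == expected.strip()


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Case-insensitive similarity check; the 0.8 threshold is an arbitrary default."""
    ratio = SequenceMatcher(None, output.lower(), expected.lower()).ratio()
    return ratio >= threshold


def json_valid(output: str) -> bool:
    """True if the output parses as JSON."""
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def regex_match(output: str, pattern: str) -> bool:
    """True if the pattern occurs anywhere in the output."""
    return re.search(pattern, output) is not None
```

Each check is a pure function of the model output, so it is cheap to run on every test case; model-graded evaluators such as fact checking need an extra model call per case.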

Effy Zhang

@effyyzhang

2 years ago

Saving costs on evaluation: Baserun automatically caches your evaluation results to avoid running redundant assessments.

0 likes · 1 reply · 1 retweet
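The tweet above does not say how Baserun caches evaluation results. One common approach, sketched here purely as an assumption, is to key each result on a hash of the evaluator name and its inputs so that an identical evaluation is never run twice. `EvalCache` and its methods are hypothetical names.

```python
import hashlib
import json
from typing import Callable, Dict


class EvalCache:
    """Hypothetical in-memory cache; not Baserun's implementation."""

    def __init__(self) -> None:
        self._store: Dict[str, bool] = {}
        self.misses = 0  # number of times the evaluator actually ran

    @staticmethod
    def _key(name: str, output: str, expected: str) -> str:
        # Serialize the inputs unambiguously, then hash to a fixed-size key.
        payload = json.dumps([name, output, expected])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(
        self,
        name: str,
        evaluator: Callable[[str, str], bool],
        output: str,
        expected: str,
    ) -> bool:
        key = self._key(name, output, expected)
        if key not in self._store:
            self.misses += 1
            self._store[key] = evaluator(output, expected)
        return self._store[key]
```

The saving matters most for model-graded evaluators, where every cache hit avoids a paid model call.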

Effy Zhang

@effyyzhang

2 years ago

Prototype and evaluate prompts in the Baserun UI: Anybody (not just developers!) can prototype ideas in the Playground, test and assess prompts with our Evaluation features directly in the UI, and then deploy those changes to staging or production environments using the Prompt

0 likes · 2 replies · 1 retweet

Effy Zhang

@effyyzhang

2 years ago

Claude-3 is now available in Baserun Playground!
Why Baserun Playground?
- A collaborative workspace for teams to share prompts
- Version history ensures you don't lose any changes
- Dynamic Inputs for bulk testing
- Compare prompt versions side by side
lnkd.in/gUVw23WB

26 likes · 1 reply · 4 retweets

Effy Zhang

@effyyzhang

2 years ago

Crafting the perfect prompt can be challenging. There are tons of prompt techniques out there, but mastering them still requires lots of trial and error. What if we could make it easier? We're teaching an AI to understand all these tricks and write prompts for you. Here is a

20 likes · 0 replies · 2 retweets

Effy Zhang

@effyyzhang

2 years ago

Baserun can now evaluate & improve your prompts with AI — automatically! Try it now → app.baserun.ai/sign-up baserun.ai

28 likes · 0 replies · 4 retweets

Levi

@levidjones

2 years ago

chatted with Effy Zhang today about what she is building at baserun.ai. she's positioned to become the queen bee of ai tools imo. most co's are following the industry buzz to build consumer-facing AI tools, and it's so refreshing to see a startup building to support the buzz.

22 likes · 1 reply · 2 retweets

Irvin Zhan

@irvinzhan

2 years ago

Brian Lovin baserun.ai 💯

3 likes · 0 replies · 1 retweet

David Parks

@dparksdev

2 years ago

Brian Lovin baserun.ai Dude, thank you. Started using this yesterday and it’s a game changer.

7 likes · 0 replies · 1 retweet

Chip Huyen

@chipro

2 years ago

Problems I'd do if I'm to do a startup again (though I probably won't any time soon because startups are hard). If you’re solving any of them, I’d love to chat. 1. Data synthesis: AI has become really good both at generating and annotating data. The challenge now is to make sure

843 likes · 52 replies · 92 retweets

Matt Shumer

@mattshumer_

2 years ago

Prompt your way to PMF
Then fine-tune to scale

301 likes · 9 replies · 26 retweets

Effy Zhang

@effyyzhang

2 years ago

Baserun now supports Jinja2 templates. baserun.ai

11 likes · 0 replies · 1 retweet

Effy Zhang

@effyyzhang

2 years ago

Congratulations to the DeepSource team on the launch of their Autofix AI feature, powered by baserun.ai! Many agents write code; DeepSource's Autofix feature ensures that code written by agents is production-ready.

16 likes · 2 replies · 1 retweet

Elpha

@elpha

2 years ago

Effy Zhang joins us for Office Hours this week! She's the CEO & Founder at baserun.ai. Before baserun, Effy Zhang led design at Cruise and Square. ✨ Ask her a question on Elpha by 05/03: bit.ly/4aXaatp

1 like · 0 replies · 1 retweet