baserun.ai (@baserunai)'s Twitter Profile
baserun.ai

@baserunai

A collaborative platform that enables engineers and product experts to build, monitor, and improve their AI.
(Acquired)

ID: 1689080325211828226

Joined: 09-08-2023 01:05:18

83 Tweets

317 Followers

24 Following

Effy Zhang (@effyyzhang)

New Offline Evaluation Reports: Many UI improvements have been made to how metrics and evaluation details are displayed.

Effy Zhang (@effyyzhang)

This week at Baserun: Developers can now create evaluators using custom code or a custom LLM prompt to grade testing results pre-release or to monitor production post-release.

Effy Zhang (@effyyzhang)

Supports both unit tests and end-to-end tests. As with traces, users can assess the outputs of a single LLM call or of the entire pipeline.

Effy Zhang (@effyyzhang)

Comparing changes side-by-side: With our comparison report, developers can compare two branches before merging a PR to prevent regressions.
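The tweet does not say how the comparison report decides what counts as a regression. One plausible reading, shown here as a minimal Python sketch, is to compare per-test evaluation scores from the base branch against the PR branch and flag any test whose score dropped. The function name `find_regressions`, the score dictionaries, and the `tolerance` parameter are hypothetical illustrations, not Baserun's API.

```python
from typing import Dict, List


def find_regressions(
    base: Dict[str, float],
    head: Dict[str, float],
    tolerance: float = 0.0,
) -> List[str]:
    """Hypothetical check: return test IDs whose score on the head (PR) branch
    fell more than `tolerance` below the score on the base branch."""
    regressions = []
    for test_id, base_score in base.items():
        head_score = head.get(test_id)
        if head_score is None:
            # Test disappeared on the PR branch: treat it as a regression.
            regressions.append(test_id)
        elif head_score < base_score - tolerance:
            regressions.append(test_id)
    return sorted(regressions)


base_scores = {"summarize": 1.0, "extract_dates": 0.5, "classify": 0.9}
head_scores = {"summarize": 1.0, "extract_dates": 0.4, "classify": 0.95}
print(find_regressions(base_scores, head_scores))
```

A non-zero `tolerance` lets small score fluctuations (common with model-graded evaluators) pass without blocking the merge.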

Effy Zhang (@effyyzhang)

Maximum flexibility: Baserun supports automatic evaluators, human evaluators, feedback, and checks. Automatic evaluators include string match, fuzzy match, JSON validation, and Regex match. Model-graded evaluators include fact checking, closed QA, and security evaluations. Users
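The tweet lists four automatic (non-model-graded) evaluator types without describing how Baserun implements them. The following is a minimal Python sketch of what such checks commonly look like, assuming plain-string LLM outputs. The function names and the 0.8 fuzzy-match threshold are hypothetical choices for illustration, not Baserun's API.

```python
import json
import re
from difflib import SequenceMatcher


def string_match(output: str, expected: str) -> bool:
    """Exact match after trimming surrounding whitespace."""
    return output.strip() == expected.strip()


def fuzzy_match(output: str, expected: str, threshold: float = 0.8) -> bool:
    """Case-insensitive similarity via difflib; the threshold is an assumed default."""
    ratio = SequenceMatcher(
        None, output.strip().lower(), expected.strip().lower()
    ).ratio()
    return ratio >= threshold


def json_valid(output: str) -> bool:
    """True if the output parses as JSON."""
    try:
        json.loads(output)
    except json.JSONDecodeError:
        return False
    return True


def regex_match(output: str, pattern: str) -> bool:
    """True if the pattern matches anywhere in the output."""
    return re.search(pattern, output) is not None


llm_output = '{"order_id": "#1234", "status": "shipped"}'
print(json_valid(llm_output), regex_match(llm_output, r"#\d{4}"))
```

Checks like these are deterministic and cheap, which is why they are typically run before the slower model-graded evaluators (fact checking, closed QA, security) mentioned in the tweet.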

Effy Zhang (@effyyzhang)

Saving costs on evaluation: Baserun automatically caches your evaluation results to avoid running redundant assessments.
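The tweet does not describe Baserun's caching mechanism. A common approach, sketched here in Python purely as an assumption, is to memoize each evaluator's result under a hash of everything that can change the verdict (evaluator name, model output, reference answer), so an identical assessment is never re-run. The `CachedEvaluator` class is hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict


class CachedEvaluator:
    """Hypothetical wrapper that memoizes evaluator results by content hash."""

    def __init__(self, name: str, fn: Callable[[str, str], bool]):
        self.name = name
        self.fn = fn
        self.cache: Dict[str, bool] = {}
        self.calls = 0  # number of times the underlying evaluator actually ran

    def _key(self, output: str, expected: str) -> str:
        # Serialize all inputs that affect the result, then hash them.
        payload = json.dumps([self.name, output, expected])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def __call__(self, output: str, expected: str) -> bool:
        key = self._key(output, expected)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.fn(output, expected)
        return self.cache[key]


exact = CachedEvaluator("exact", lambda out, exp: out == exp)
exact("42", "42")
exact("42", "42")  # served from the cache; the evaluator does not run again
print(exact.calls)
```

Keying on a content hash means a cached result is reused only when the evaluator, the output, and the reference are all unchanged. A production system would likely persist the cache (and include the evaluator's version or prompt in the key) rather than keep it in memory.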

Effy Zhang (@effyyzhang)

Prototype and evaluate prompts in the Baserun UI: Anybody (not just developers!) can prototype ideas in the Playground, test and assess prompts with our Evaluation features directly in the UI, and then deploy those changes to staging or production environments using the Prompt

Effy Zhang (@effyyzhang)

Claude-3 is now available in Baserun Playground! Why Baserun Playground?
- A collaborative workspace for teams to share prompts
- Version history ensures you don't lose any changes
- Dynamic Inputs for bulk testing
- Compare prompt versions side by side
lnkd.in/gUVw23WB

Effy Zhang (@effyyzhang)

Crafting the perfect prompt can be challenging. There are tons of prompt techniques out there, but mastering them still requires lots of trial and error. What if we could make it easier? We're teaching an AI to understand all these tricks and write prompts for you. Here is a

Levi (@levidjones)

chatted with Effy Zhang today about what she is building at baserun.ai. she's positioned to become the queen bee of ai tools imo. most co's are following the industry buzz to build consumer-facing AI tools, and it's so refreshing to see a startup building to support the buzz.

Chip Huyen (@chipro)

Problems I'd work on if I were to do a startup again (though I probably won't any time soon because startups are hard). If you’re solving any of them, I’d love to chat. 1. Data synthesis: AI has become really good both at generating and annotating data. The challenge now is to make sure

Effy Zhang (@effyyzhang)

Congratulations to the DeepSource team on the launch of their Autofix AI feature, powered by baserun.ai! Many agents now write code; DeepSource's Autofix feature ensures that agent-written code is production-ready.

Elpha (@elpha)

Effy Zhang joins us for Office Hours this week! She's the CEO & Founder at baserun.ai. Before baserun, Effy Zhang led design at Cruise and Square. ✨ Ask her a question on Elpha by 05/03: bit.ly/4aXaatp