PareaAI (@pareaai) 's Twitter Profile
PareaAI

@pareaai

Parea AI (YC S23) provides tools for evaluating, testing and monitoring LLM applications.

ID: 1643765340642455554

Link: https://www.parea.ai · Joined: 05-04-2023 23:59:46

242 Tweets

248 Followers

33 Following

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

This method is powered by DSPy from Omar Khattab and inspired by the work of Shreya Shankar: arxiv.org/pdf/2404.12272 arxiv.org/pdf/2401.03038 Also, thanks to Eugene Yan for sharing JudgeBench: arxiv.org/abs/2406.18403

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

Moving from demos to production-ready LLM apps can be challenging. In this post, I outline a practical workflow to help teams make this transition, focusing on:
- Hypothesis testing
- Dataset creation
- Effective evals
- Experimentation
Full post here: zurl.co/27Ad

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

If you use structured outputs with Instructor, track validation errors instantly with PareaAI. Concretely, the integration automatically:
- groups all retried LLM calls together under a single trace
- tracks every field that failed validation, along with the respective error

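A minimal, dependency-free sketch of the second bullet: collecting each field that failed validation together with its error message. The schema and field names here are hypothetical; a real structured-output setup would define a Pydantic model and pass it to Instructor, which raises a validation error the integration then records.

```python
# Hypothetical schema for illustration; Instructor would normally derive
# this from a Pydantic model rather than a plain dict of types.
EXPECTED = {"total": float, "currency": str}

def failed_fields(raw: dict) -> dict:
    """Collect field -> error message, roughly the information the
    integration attaches to a trace when validation fails."""
    errors = {}
    for name, typ in EXPECTED.items():
        if name not in raw:
            errors[name] = "field required"
        elif not isinstance(raw[name], typ):
            errors[name] = f"expected {typ.__name__}, got {type(raw[name]).__name__}"
    return errors

# A malformed LLM output: `total` is a string and `currency` is missing.
print(failed_fields({"total": "twelve"}))
# {'total': 'expected float, got str', 'currency': 'field required'}
```

Grouping retries under one trace then amounts to tagging each retried call with the same trace ID before re-prompting the model.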
Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

At this point I could probably have an LLM monitor the top foundation model providers and then produce a PR for me that adds any new models to PareaAI the moment they launch.

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

There have been so many new models lately. Most recently, Mistral AI's codestral-mamba. I figured it'd be great to highlight how to use PareaAI for regression testing. Check out the notebook below, where I test codestral-latest vs. codestral-mamba on LeetCode questions. 👇

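The shape of a regression test like the notebook's can be sketched in a few lines: run each candidate model over the same cases and compare pass rates. Here `call_model` is a canned stand-in for a real LLM call, and the pass/fail answers are invented purely for illustration; a real harness would call the provider's API and execute the generated code against the LeetCode test cases.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; the canned answers are hypothetical.
    canned = {
        ("codestral-latest", "two_sum"): "uses a hash map",
        ("codestral-mamba", "two_sum"): "uses a hash map",
        ("codestral-mamba", "lru_cache"): "incorrect",
    }
    return canned.get((model, prompt), "uses a hash map")

def passes(answer: str) -> bool:
    # Toy eval; a real one would run the generated code against tests.
    return answer != "incorrect"

def regression_report(models, cases):
    """Pass rate per model over a shared set of test cases."""
    return {m: sum(passes(call_model(m, c)) for c in cases) / len(cases)
            for m in models}

report = regression_report(["codestral-latest", "codestral-mamba"],
                           ["two_sum", "lru_cache"])
print(report)  # {'codestral-latest': 1.0, 'codestral-mamba': 0.5}
```

Keeping the cases fixed across runs is what makes this a regression test: any drop in a model's pass rate is attributable to the model swap, not the data.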
Joschka Braun (@joschkabraun) 's Twitter Profile Photo

📝 Updated self-deployment docs ⭐️
Deploy PareaAI on-prem via Docker in 4 steps:
1. Clone the repo
2. Specify your organization slug
3. Pull the Docker images & run them
4. Point the SDK backend URL to the self-deployed backend URL
🔗 -> 🧵

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

With the latest Groq Inc models for tool calling, we figured it was time to make Groq available across PareaAI's playground and SDKs. Be on the lookout for an updated tool-calling benchmark: OpenAI vs. Claude vs. Groq!

Cyrus (@cyrusnewday) 's Twitter Profile Photo

And to help you understand what's going on, we integrate with observability platforms like arize-phoenix, LangChain's LangSmith, langfuse.com, PareaAI, and Lunary AI so you can explore the experiments that zenbase/core automates. Cookbooks here: github.com/zenbase-ai/cor…

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

📝 Updated integration docs ⭐️ Check out PareaAI's updated docs to automatically trace apps powered by @LangChain, Instructor by Jason Liu, LiteLLM (YC W23), DSPy by Omar Khattab, SGLang by LMSYS Org, and Trigger.dev. Docs: docs.parea.ai/integrations/o…

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

There are so many "black box" evals that force users to instantiate eval classes. We never fully understood this. At PareaAI we see evals as just functions: you can copy the source code and modify it as you see fit, all OSS and based on the latest research. Check these out 👇🏾
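In that spirit, an eval is just a function from model output (and optionally a target) to a score: something you can read, copy, and edit. The two examples below are not Parea's actual source, just a sketch of the shape of the idea.

```python
def exact_match(output: str, target: str) -> float:
    """1.0 if the model output matches the target after normalization."""
    return float(output.strip().lower() == target.strip().lower())

def contains_all(output: str, required: list) -> float:
    """Fraction of required substrings present in the output."""
    if not required:
        return 1.0
    return sum(s in output for s in required) / len(required)

print(exact_match("  Paris ", "paris"))                 # 1.0
print(contains_all("use a hash map", ["hash", "map"]))  # 1.0
```

Because these are plain functions, swapping the normalization, adding fuzzy matching, or calling an LLM judge inside is an ordinary code edit rather than subclassing an opaque eval class.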

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

🚀 New deep-dive notebook on PareaAI experiments and LLM evals 📝🔬. I cover some of the key functionalities, illustrating the power and flexibility of our API. 🔽 Link in comments 🔽

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

How do you detect unreliable behavior in your LLM app? Recently, we talked to the team at Sixfold, and they shared a simple yet powerful way to assess the reliability of their LLM app using PareaAI. More about how they test their risk assessment AI solution for
