PareaAI (@pareaai) 's Twitter Profile
PareaAI

@pareaai

Parea AI (YC S23) provides tools for evaluating, testing and monitoring LLM applications.

ID: 1643765340642455554

Link: https://www.parea.ai · Joined: 05-04-2023 23:59:46

242 Tweets

248 Followers

33 Following

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

This method is powered by DSPy from Omar Khattab and inspired by the work of Shreya Shankar: arxiv.org/pdf/2404.12272 arxiv.org/pdf/2401.03038 Also, thanks to Eugene Yan for sharing JudgeBench: arxiv.org/abs/2406.18403

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

Moving from demos to production-ready LLM apps can be challenging. In this post, I outline a practical workflow to help teams make this transition, focusing on:
- Hypothesis testing
- Dataset creation
- Effective evals
- Experimentation
Full post here: zurl.co/27Ad

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

If you use structured outputs with Instructor, track validation errors instantly with PareaAI. Concretely, the integration automatically:
- groups all retried LLM calls together under a single trace
- tracks every field that failed validation, along with the respective error

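A minimal, dependency-free sketch of the second bullet: collecting each field that failed validation together with its error message. The schema and field names here are hypothetical; a real structured-output setup would define a Pydantic model and pass it to Instructor, which raises a validation error the integration then records.

```python
# Hypothetical schema for illustration; Instructor would normally derive
# this from a Pydantic model rather than a plain dict of types.
EXPECTED = {"total": float, "currency": str}

def failed_fields(raw: dict) -> dict:
    """Collect field -> error message, roughly the information the
    integration attaches to a trace when validation fails."""
    errors = {}
    for name, typ in EXPECTED.items():
        if name not in raw:
            errors[name] = "field required"
        elif not isinstance(raw[name], typ):
            errors[name] = f"expected {typ.__name__}, got {type(raw[name]).__name__}"
    return errors

# A malformed LLM output: `total` is a string and `currency` is missing.
print(failed_fields({"total": "twelve"}))
# {'total': 'expected float, got str', 'currency': 'field required'}
```

Grouping retries under one trace then amounts to tagging each retried call with the same trace ID before re-prompting the model.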
Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

At this point I could probably have an LLM monitor the top foundation model providers and then produce a PR for me that adds any new models to PareaAI the moment they launch.

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

There have been so many new models lately. Most recently, Mistral AI's codestral-mamba. I figured it'd be great to highlight how to use PareaAI for regression testing. Check out the notebook below, where I test codestral-latest vs. codestral-mamba on LeetCode questions. 👇

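The shape of a regression test like the notebook's can be sketched in a few lines: run each candidate model over the same cases and compare pass rates. Here `call_model` is a canned stand-in for a real LLM call, and the pass/fail answers are invented purely for illustration; a real harness would call the provider's API and execute the generated code against the LeetCode test cases.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real API call; the canned answers are hypothetical.
    canned = {
        ("codestral-latest", "two_sum"): "uses a hash map",
        ("codestral-mamba", "two_sum"): "uses a hash map",
        ("codestral-mamba", "lru_cache"): "incorrect",
    }
    return canned.get((model, prompt), "uses a hash map")

def passes(answer: str) -> bool:
    # Toy eval; a real one would run the generated code against tests.
    return answer != "incorrect"

def regression_report(models, cases):
    """Pass rate per model over a shared set of test cases."""
    return {m: sum(passes(call_model(m, c)) for c in cases) / len(cases)
            for m in models}

report = regression_report(["codestral-latest", "codestral-mamba"],
                           ["two_sum", "lru_cache"])
print(report)  # {'codestral-latest': 1.0, 'codestral-mamba': 0.5}
```

Keeping the cases fixed across runs is what makes this a regression test: any drop in a model's pass rate is attributable to the model swap, not the data.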
Joschka Braun (@joschkabraun) 's Twitter Profile Photo

📝 Updated self-deployment docs ⭐️
Deploy PareaAI on-prem via Docker in 4 steps:
1. Clone the repo
2. Specify your organization slug
3. Pull the Docker images & run them
4. Point the SDK backend URL to the self-deployed backend URL
🔗 -> 🧵

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

With the latest Groq Inc models for tool calling, we figured it was time to make Groq available across PareaAI's playground and SDKs. Be on the lookout for an updated tool-calling benchmark: OpenAI vs. Claude vs. Groq!

Cyrus (@cyrusnewday) 's Twitter Profile Photo

And to help you understand what's going on, we integrate with observability platforms like arize-phoenix, LangChain's LangSmith, langfuse.com, PareaAI, and Lunary AI so you can explore the experiments that zenbase/core automates. Cookbooks here: github.com/zenbase-ai/cor…

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

📝 Updated integration docs ⭐️ Check out PareaAI's updated docs to automatically trace apps powered by @LangChain, Instructor by Jason Liu, LiteLLM (YC W23), DSPy by Omar Khattab, SGLang by LMSYS Org, and Trigger.dev. Docs: docs.parea.ai/integrations/o…

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

There are so many "black box" evals that force users to instantiate eval classes. We never fully understood this. At PareaAI we see evals as just functions: you can copy the source code and modify it as you see fit, all OSS and based on the latest research. Check these out 👇🏾
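In that spirit, an eval is just a function from model output (and optionally a target) to a score: something you can read, copy, and edit. The two examples below are not Parea's actual source, just a sketch of the shape of the idea.

```python
def exact_match(output: str, target: str) -> float:
    """1.0 if the model output matches the target after normalization."""
    return float(output.strip().lower() == target.strip().lower())

def contains_all(output: str, required: list) -> float:
    """Fraction of required substrings present in the output."""
    if not required:
        return 1.0
    return sum(s in output for s in required) / len(required)

print(exact_match("  Paris ", "paris"))                 # 1.0
print(contains_all("use a hash map", ["hash", "map"]))  # 1.0
```

Because these are plain functions, swapping the normalization, adding fuzzy matching, or calling an LLM judge inside is an ordinary code edit rather than subclassing an opaque eval class.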

Joel Alexander (@joel_a_wilde) 's Twitter Profile Photo

🚀 New deep-dive notebook on PareaAI experiments and LLM evals 📝🔬. I cover some of the key functionalities, illustrating the power and flexibility of our API. 🔽 Link in comments 🔽

Joschka Braun (@joschkabraun) 's Twitter Profile Photo

How do you detect unreliable behavior in your LLM app? Recently, we talked to the team at Sixfold, and they shared a simple yet powerful way to assess the reliability of their LLM app using PareaAI. More about how they test their risk assessment AI solution for
