Mikyo (@mikeldking)'s Twitter Profile
Mikyo

@mikeldking

Building the future of LLMOps. Head of Open-Source at @arizeai @ArizePhoenix | Former eng @apple ☕️🚲🧗🏻‍♂️🏂⛰🕴

ID: 65280161

Link: http://mikeldking.com | Joined: 13-08-2009 04:57:55

1.1K Tweets

276 Followers

614 Following

Mikyo (@mikeldking)

I find great inspiration in others' eloquence. "How can we work wonderfully efficiently to create something with breathtaking quality" - Jony Ive youtube.com/shorts/3XJwM6N…

Mikyo (@mikeldking)

Wanted to showcase a probably not well understood (but increasingly powerful) package (pypi.org/project/openin…). It comes with decorators and utilities for customizing tracing, but it also has some useful utilities that help you run evals and capture human feedback.
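A minimal sketch of the kind of tracing customization described above, assuming the truncated pypi link points at openinference-instrumentation; `using_attributes` is the package's documented context manager, while the wrapped `answer` function is a placeholder for any auto-instrumented LLM call.

```python
# Sketch: customizing traces with openinference-instrumentation
# (assumes the truncated pypi link refers to that package).
from openinference.instrumentation import using_attributes

def answer(question: str) -> str:
    # Placeholder for an instrumented LLM call; any auto-instrumented
    # client (OpenAI, LangChain, etc.) would emit spans here.
    return "42"

# Attach session/user/metadata attributes to every span emitted inside
# the block, so evals and human feedback can be joined back to the trace.
with using_attributes(
    session_id="session-123",
    user_id="user-456",
    metadata={"experiment": "judge-v2"},
    tags=["demo"],
):
    answer("What is the meaning of life?")
```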

Mikyo (@mikeldking)

Hill-climbing for LLM-as-a-judge is extremely difficult if you don't have an experimentation framework. Sanjana breaks down how to use benchmarked datasets to apply a scientific approach to your judge building.
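A rough sketch of that scientific loop: score a candidate judge prompt against a small benchmarked dataset with human-verified labels, and only keep prompt changes that improve agreement. The dataset, `run_judge` stub, and prompt strings are all illustrative placeholders.

```python
# Sketch: hill-climbing a judge prompt against a benchmarked dataset.
from typing import Callable

benchmark = [
    {"input": "The capital of France is Paris.", "label": "correct"},
    {"input": "The capital of France is Lyon.", "label": "incorrect"},
    # ...more examples with human-verified labels...
]

def agreement(judge: Callable[[str], str]) -> float:
    """Fraction of benchmark examples where the judge matches the human label."""
    hits = sum(judge(ex["input"]) == ex["label"] for ex in benchmark)
    return hits / len(benchmark)

def run_judge(prompt_template: str) -> Callable[[str], str]:
    def judge(text: str) -> str:
        # Placeholder judge: swap in a real LLM call that fills
        # prompt_template with `text` and parses the verdict.
        return "correct" if "Paris" in text else "incorrect"
    return judge

# Hill-climb: only accept a prompt change if benchmark agreement improves.
baseline = agreement(run_judge("Is the following statement correct? {text}"))
candidate = agreement(run_judge("Explain your reasoning, then label {text} as correct/incorrect."))
if candidate > baseline:
    print("Keep the new prompt")
```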

Mikyo (@mikeldking)

Just hit this issue with Claude not being able to use an MCP server talking to localhost. stackoverflow.com/questions/7950… Feels like there was a secret security issue... Am I wrong?

arize-phoenix (@arizephoenix)

🚀 Day 0 Support for GPT-5 is Live 🚀 We’re thrilled to announce immediate support for GPT-5 in our platform. Whether you’re evaluating performance or scaling production workloads, we’ve got you covered:
👉 Cost tracking: Full visibility into spend per request
👉 Reasoning

DeepLearning.AI (@deeplearningai)

Building a reliable RAG system doesn’t stop at retrieval and generation; you need observability too. In the Retrieval Augmented Generation course, you'll explore how LLM observability platforms can help you:
- Trace prompts through each step of the pipeline
- Log and evaluate
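A platform-agnostic sketch of what "tracing prompts through each step" can look like, using only the plain OpenTelemetry API; the span and attribute names are illustrative, and an observability platform such as Phoenix would ingest the resulting spans.

```python
# Sketch: tracing each step of a RAG pipeline with the plain OpenTelemetry API.
from opentelemetry import trace

tracer = trace.get_tracer("rag-pipeline")

def retrieve(query: str) -> list[str]:
    with tracer.start_as_current_span("retrieve") as span:
        span.set_attribute("rag.query", query)
        docs = ["doc-1 text", "doc-2 text"]  # placeholder retriever
        span.set_attribute("rag.num_documents", len(docs))
        return docs

def generate(query: str, docs: list[str]) -> str:
    with tracer.start_as_current_span("generate") as span:
        prompt = f"Answer using context:\n{docs}\n\nQuestion: {query}"
        span.set_attribute("llm.prompt", prompt)
        answer = "placeholder answer"  # placeholder LLM call
        span.set_attribute("llm.completion", answer)
        return answer

with tracer.start_as_current_span("rag"):
    generate("What is LLM observability?", retrieve("What is LLM observability?"))
```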

Aparna Dhinakaran (@aparnadhinak)

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen works best: Explanations make judge models more reliable. They reduce variance across runs, improve agreement

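A small sketch of the pattern the thread recommends: ask the judge for an explanation before the label, then parse both out so the explanation can be logged alongside the verdict. The prompt wording and parsing are illustrative; the model call itself is omitted.

```python
# Sketch: an LLM-as-a-judge prompt that elicits an explanation before the label,
# which is what tends to reduce variance across runs. The model call is a placeholder.
JUDGE_TEMPLATE = """You are evaluating whether an answer is faithful to the context.

Context: {context}
Answer: {answer}

First, explain your reasoning in 2-3 sentences.
Then, on a new line, output exactly one label: "faithful" or "unfaithful".
"""

def parse_judgement(raw: str) -> tuple[str, str]:
    """Split the judge output into (explanation, label)."""
    *explanation_lines, label = raw.strip().splitlines()
    return "\n".join(explanation_lines).strip(), label.strip().lower()

# Example of the parsing step; a real call would send JUDGE_TEMPLATE.format(...)
# to the judge model and pass its raw output here.
explanation, label = parse_judgement(
    "The answer restates facts present in the context.\nfaithful"
)
```
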
Mikyo (@mikeldking)

This article by Elizabeth Hutton, Srilakshmi Chavali, and Aparna Dhinakaran encapsulates a core discovery we made when building one of the first #oss eval libraries back in 2023: sometimes the explanation of a particular judgement can be as informative as the judgement itself.

Mikyo (@mikeldking)

Tracking and correlating the back-and-forth conversations with an AI agent is an ever-increasing requirement for agent observability. Whether it's evaluating the coherency of the agent or its ability to recall relevant information over time, sessions and threads are proving to be
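A short sketch of grouping multi-turn agent traffic into a session so those conversations can be correlated, assuming openinference-instrumentation's `using_session` context manager; the `agent_turn` function is a placeholder.

```python
# Sketch: tagging every span in a conversation with the same session id
# so an observability tool can replay and evaluate the thread over time.
import uuid
from openinference.instrumentation import using_session

session_id = str(uuid.uuid4())

def agent_turn(user_message: str) -> str:
    return f"echo: {user_message}"  # placeholder agent call

for turn in ["Book me a flight", "Make it a window seat"]:
    with using_session(session_id=session_id):
        agent_turn(turn)
```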

Mikyo (@mikeldking)

⌨️ We've been hard at work creating a lightweight package that is fully featured for interacting with arize-phoenix. If you don't know, it's called arize-phoenix-client and it has first-class support for interacting with a Phoenix instance. You should no longer have to install
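A minimal sketch of using the client, assuming the `phoenix.client.Client` entry point from arize-phoenix-client; the resource call shown is a hypothetical illustration rather than a confirmed method name, so check the package docs for the exact surface.

```python
# Sketch: talking to a running Phoenix instance with the lightweight
# arize-phoenix-client package.
from phoenix.client import Client

# Point the client at a Phoenix instance (shown here as a local default;
# the base_url argument is illustrative).
client = Client(base_url="http://localhost:6006")

# Hypothetical example: pull a prompt stored in Phoenix by identifier.
# prompt = client.prompts.get(prompt_identifier="my-prompt")
```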