Mikyo (@mikeldking)'s Twitter Profile
Mikyo

@mikeldking

Building the future of LLMOps. Head of Open-Source at @arizeai @ArizePhoenix | Former eng @apple ☕️🚲🧗🏻‍♂️🏂⛰🕴

ID: 65280161

Link: http://mikeldking.com | Joined: 13-08-2009 04:57:55

1.1K Tweets

276 Followers

614 Following

Mikyo (@mikeldking)

I find great inspiration in others' eloquence. "How can we work wonderfully efficiently to create something with breathtaking quality" - Jony Ive youtube.com/shorts/3XJwM6N…

Mikyo (@mikeldking)

Wanted to showcase a probably not well understood (but increasingly powerful) package (pypi.org/project/openin…). It comes with decorators and utilities for customizing tracing, but it also has some useful utilities that help you run evals and capture human feedback.
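A minimal sketch of the kind of tracing customization described above, assuming the truncated pypi link points at openinference-instrumentation; `using_attributes` is the package's documented context manager, while the wrapped `answer` function is a placeholder for any auto-instrumented LLM call.

```python
# Sketch: customizing traces with openinference-instrumentation
# (assumes the truncated pypi link refers to that package).
from openinference.instrumentation import using_attributes

def answer(question: str) -> str:
    # Placeholder for an instrumented LLM call; any auto-instrumented
    # client (OpenAI, LangChain, etc.) would emit spans here.
    return "42"

# Attach session/user/metadata attributes to every span emitted inside
# the block, so evals and human feedback can be joined back to the trace.
with using_attributes(
    session_id="session-123",
    user_id="user-456",
    metadata={"experiment": "judge-v2"},
    tags=["demo"],
):
    answer("What is the meaning of life?")
```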

Mikyo (@mikeldking)

Hill-climbing for LLM-as-a-judge is extremely difficult if you don't have an experimentation framework. Sanjana breaks down how to use benchmarked datasets to apply a scientific approach to your judge building.
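A rough sketch of that scientific loop: score a candidate judge prompt against a small benchmarked dataset with human-verified labels, and only keep prompt changes that improve agreement. The dataset, `run_judge` stub, and prompt strings are all illustrative placeholders.

```python
# Sketch: hill-climbing a judge prompt against a benchmarked dataset.
from typing import Callable

benchmark = [
    {"input": "The capital of France is Paris.", "label": "correct"},
    {"input": "The capital of France is Lyon.", "label": "incorrect"},
    # ...more examples with human-verified labels...
]

def agreement(judge: Callable[[str], str]) -> float:
    """Fraction of benchmark examples where the judge matches the human label."""
    hits = sum(judge(ex["input"]) == ex["label"] for ex in benchmark)
    return hits / len(benchmark)

def run_judge(prompt_template: str) -> Callable[[str], str]:
    def judge(text: str) -> str:
        # Placeholder judge: swap in a real LLM call that fills
        # prompt_template with `text` and parses the verdict.
        return "correct" if "Paris" in text else "incorrect"
    return judge

# Hill-climb: only accept a prompt change if benchmark agreement improves.
baseline = agreement(run_judge("Is the following statement correct? {text}"))
candidate = agreement(run_judge("Explain your reasoning, then label {text} as correct/incorrect."))
if candidate > baseline:
    print("Keep the new prompt")
```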

Mikyo (@mikeldking)

Just hit this issue with Claude not being able to use an MCP server talking to localhost. stackoverflow.com/questions/7950… Feels like there was a secret security issue... Am I wrong?

arize-phoenix (@arizephoenix)

🚀 Day 0 Support for GPT-5 is Live 🚀 We’re thrilled to announce immediate support for GPT-5 in our platform. Whether you’re evaluating performance or scaling production workloads, we’ve got you covered:
👉 Cost tracking: Full visibility into spend per request
👉 Reasoning

DeepLearning.AI (@deeplearningai)

Building a reliable RAG system doesn’t stop at retrieval and generation; you need observability too. In the Retrieval Augmented Generation course, you'll explore how LLM observability platforms can help you:
- Trace prompts through each step of the pipeline
- Log and evaluate
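A platform-agnostic sketch of what "tracing prompts through each step" can look like, using only the plain OpenTelemetry API; the span and attribute names are illustrative, and an observability platform such as Phoenix would ingest the resulting spans.

```python
# Sketch: tracing each step of a RAG pipeline with the plain OpenTelemetry API.
from opentelemetry import trace

tracer = trace.get_tracer("rag-pipeline")

def retrieve(query: str) -> list[str]:
    with tracer.start_as_current_span("retrieve") as span:
        span.set_attribute("rag.query", query)
        docs = ["doc-1 text", "doc-2 text"]  # placeholder retriever
        span.set_attribute("rag.num_documents", len(docs))
        return docs

def generate(query: str, docs: list[str]) -> str:
    with tracer.start_as_current_span("generate") as span:
        prompt = f"Answer using context:\n{docs}\n\nQuestion: {query}"
        span.set_attribute("llm.prompt", prompt)
        answer = "placeholder answer"  # placeholder LLM call
        span.set_attribute("llm.completion", answer)
        return answer

with tracer.start_as_current_span("rag"):
    generate("What is LLM observability?", retrieve("What is LLM observability?"))
```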

Aparna Dhinakaran (@aparnadhinak)

Working with teams running LLM-as-a-judge evals, I’ve noticed a shocking amount of variance in when they use reasoning, CoT, and explanations. Here’s what we’ve seen works best: Explanations make judge models more reliable. They reduce variance across runs, improve agreement

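A small sketch of the pattern the thread recommends: ask the judge for an explanation before the label, then parse both out so the explanation can be logged alongside the verdict. The prompt wording and parsing are illustrative; the model call itself is omitted.

```python
# Sketch: an LLM-as-a-judge prompt that elicits an explanation before the label,
# which is what tends to reduce variance across runs. The model call is a placeholder.
JUDGE_TEMPLATE = """You are evaluating whether an answer is faithful to the context.

Context: {context}
Answer: {answer}

First, explain your reasoning in 2-3 sentences.
Then, on a new line, output exactly one label: "faithful" or "unfaithful".
"""

def parse_judgement(raw: str) -> tuple[str, str]:
    """Split the judge output into (explanation, label)."""
    *explanation_lines, label = raw.strip().splitlines()
    return "\n".join(explanation_lines).strip(), label.strip().lower()

# Example of the parsing step; a real call would send JUDGE_TEMPLATE.format(...)
# to the judge model and pass its raw output here.
explanation, label = parse_judgement(
    "The answer restates facts present in the context.\nfaithful"
)
```
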
Mikyo (@mikeldking)

This article by Elizabeth Hutton, Srilakshmi Chavali, and Aparna Dhinakaran encapsulates a core discovery we made when building one of the first #oss eval libraries back in 2023: sometimes the explanation of a particular judgement can be as informative as the judgement itself.

Mikyo (@mikeldking)

Tracking and correlating the back-and-forth conversations with an AI agent is an ever-increasing requirement for agent observability. Whether it's evaluating the coherency of the agent or its ability to recall relevant information over time, sessions and threads are proving to be
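A short sketch of grouping multi-turn agent traffic into a session so those conversations can be correlated, assuming openinference-instrumentation's `using_session` context manager; the `agent_turn` function is a placeholder.

```python
# Sketch: tagging every span in a conversation with the same session id
# so an observability tool can replay and evaluate the thread over time.
import uuid
from openinference.instrumentation import using_session

session_id = str(uuid.uuid4())

def agent_turn(user_message: str) -> str:
    return f"echo: {user_message}"  # placeholder agent call

for turn in ["Book me a flight", "Make it a window seat"]:
    with using_session(session_id=session_id):
        agent_turn(turn)
```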

Mikyo (@mikeldking)

⌨️ We've been hard at work creating a lightweight package that is fully featured for interacting with arize-phoenix. If you don't know, it's called arize-phoenix-client and it has first-class support for interacting with a Phoenix instance. You should no longer have to install
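A minimal sketch of using the client, assuming the `phoenix.client.Client` entry point from arize-phoenix-client; the resource call shown is a hypothetical illustration rather than a confirmed method name, so check the package docs for the exact surface.

```python
# Sketch: talking to a running Phoenix instance with the lightweight
# arize-phoenix-client package.
from phoenix.client import Client

# Point the client at a Phoenix instance (shown here as a local default;
# the base_url argument is illustrative).
client = Client(base_url="http://localhost:6006")

# Hypothetical example: pull a prompt stored in Phoenix by identifier.
# prompt = client.prompts.get(prompt_identifier="my-prompt")
```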