matt turk (@turkmatthew)'s Twitter Profile
matt turk

@turkmatthew

Data Science @cleanlabAI. Prev: Investing & ML @goodwatercap, Quant/ML @coinbase & @goldmansachs, prop trading, EECS @ucberkeley

ID: 517916836

Link: https://signal.nfx.com/investors/matt-turk · Joined: 07-03-2012 20:35:17

1.1K Tweets

611 Followers

1.1K Following

Cleanlab (@cleanlabai)

Evaluation models for RAG aim to detect incorrect responses in real time, but can they actually do so without any ground-truth answers/labels?

Just published: a benchmark across six RAG applications comparing popular evaluation models: LLM-as-a-Judge, Prometheus, Lynx, HHEM, TLM.
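For readers who want to try this kind of reference-free evaluation themselves, here is a minimal LLM-as-a-Judge sketch in Python. It assumes an OpenAI-compatible client; the judge model, prompt wording, and 0–1 scoring scale are illustrative choices, not the setup used in the benchmark, and the other evaluators (Prometheus, Lynx, HHEM, TLM) each have their own APIs.

```python
# Minimal LLM-as-a-Judge sketch for reference-free RAG evaluation.
# The model name and prompt wording are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are grading a RAG system's answer.
Context:
{context}

Question:
{question}

Answer:
{answer}

Is the answer fully supported by the context? Reply with a single number
from 0 (unsupported/incorrect) to 1 (fully supported/correct)."""

def judge_response(context: str, question: str, answer: str) -> float:
    """Return a reference-free correctness score for one RAG response."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any judge model works here
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            context=context, question=question, answer=answer)}],
        temperature=0,
    )
    try:
        return float(completion.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparseable judgments are treated as failures
```
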
Curtis G. Northcutt (@cgnorthcutt)

Tomorrow I'm spilling the secrets as to how several Fortune 500 @cleanlabai customers are solving the hardest problem in AI -- producing accurate, compliant, safe, fully automated AI agent responses -- at the AI User Group Conference in SF.

Stop by and get your hands dirty and …
matt turk (@turkmatthew)

If you use NVIDIA NeMo Guardrails for LLM app reliability, try integrating our Cleanlab Trustworthy Language Model: developers can add additional safeguards against hallucinations and untrustworthy responses when building LLM-based applications.
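A rough Python sketch of what wiring this together might look like. The output-rail flow name ("cleanlab trustworthiness") and the model settings are assumptions based on the integration announcement; check the NeMo Guardrails and Cleanlab docs for the exact configuration.

```python
# Hedged sketch: enabling a Cleanlab trustworthiness check as an output rail
# in NVIDIA NeMo Guardrails. The flow name below is assumed, not confirmed.
from nemoguardrails import LLMRails, RailsConfig

YAML_CONFIG = """
models:
  - type: main
    engine: openai
    model: gpt-4o-mini

rails:
  output:
    flows:
      - cleanlab trustworthiness   # assumed flow name for the TLM check
"""

config = RailsConfig.from_content(yaml_content=YAML_CONFIG)
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "What is our refund policy?"}
])
print(response["content"])  # low-trust answers are handled by the output rail
```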

matt turk (@turkmatthew)

We at Cleanlab reproduced, contained, and fixed Cursor's rogue AI support bot with automated AI safety software, an incident that sits atop a growing list of customer-support AI agent meltdowns causing serious damage to customer trust. Reach out if you care about …

Cleanlab (@cleanlabai)

Cleanlab is now integrated into langfuse.com's observability platform!

We're adding real-time trust scores to LLM outputs to quickly surface the most problematic responses for Langfuse users.
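As a rough illustration of what this looks like from the application side, here is a hedged Python sketch using the v2-style Langfuse SDK. The `score_trustworthiness()` helper is a hypothetical stand-in for Cleanlab's scoring call, and the score name is just an example; the actual integration may wire this up automatically.

```python
# Hedged sketch: attaching a real-time trust score to an LLM interaction in
# Langfuse. Assumes the v2-style Langfuse Python SDK; score_trustworthiness()
# is a hypothetical placeholder for Cleanlab's scoring call.
from langfuse import Langfuse

langfuse = Langfuse()  # reads LANGFUSE_* keys from the environment

def score_trustworthiness(prompt: str, response: str) -> float:
    """Placeholder for a Cleanlab trustworthiness score in [0, 1]."""
    raise NotImplementedError

prompt, llm_response = "What plans include SSO?", "SSO is included in the Team plan."

trace = langfuse.trace(name="support-query", input=prompt, output=llm_response)
trace.score(
    name="trust_score",  # surfaces next to the trace in the Langfuse UI
    value=score_trustworthiness(prompt, llm_response),
    comment="Cleanlab trustworthiness score",
)
```
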
AWS Developers (@awsdevelopers)

🔍 Unlock generative AI success with quality data! Join #AWS & Cleanlab for an exclusive workshop at SFO Gen AI Loft on May 9, 2025.

Learn to build & scale production-ready AI solutions from experts. For developers & decision-makers.

Register now! 👉 go.aws/3SeOxwU
MLflow (@mlflow)

Curious about how to systematically evaluate and improve the trustworthiness of your LLM applications? 🤔 Check out how Cleanlab's Trustworthy Language Model (TLM) integrates with #MLflow!

TLM analyzes both prompts and responses to flag potentially untrustworthy outputs, no …
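A hedged sketch of how such trust scores could be logged to MLflow for comparison across runs. The `get_trust_score()` helper is a hypothetical placeholder for the TLM call; the official MLflow integration may expose this differently (for example, as an evaluation metric).

```python
# Hedged sketch: logging per-response trustworthiness scores to MLflow so that
# untrustworthy outputs can be compared across runs. get_trust_score() is a
# hypothetical stand-in for Cleanlab's TLM scoring call.
import mlflow

def get_trust_score(prompt: str, response: str) -> float:
    """Placeholder for Cleanlab TLM: higher means more trustworthy."""
    raise NotImplementedError

eval_set = [
    ("What is the capital of France?", "Paris."),
    ("Summarize our Q3 revenue.", "Q3 revenue was $12M."),  # possibly hallucinated
]

with mlflow.start_run(run_name="tlm-trust-eval"):
    scores = [get_trust_score(p, r) for p, r in eval_set]
    for i, score in enumerate(scores):
        mlflow.log_metric("trust_score", score, step=i)   # per-example score
    mlflow.log_metric("mean_trust_score", sum(scores) / len(scores))
```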
matt turk (@turkmatthew)

You can now use Cleanlab with LlamaIndex 🦙 to make your production AI agents trustworthy and actually root-cause why certain responses are untrustworthy (knowledge gaps/poor retrieval, bad data, hallucination, etc.).
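A hedged sketch of this workflow with LlamaIndex: query an index, score the answer, and inspect the retrieved sources when trust is low to see whether retrieval or generation is at fault. The `score_trustworthiness()` helper and the 0.5 threshold are illustrative assumptions, not the official integration.

```python
# Hedged sketch: scoring a LlamaIndex query-engine response and inspecting the
# retrieved sources to help root-cause low-trust answers (poor retrieval vs.
# hallucination). score_trustworthiness() is a hypothetical placeholder.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

def score_trustworthiness(question: str, answer: str) -> float:
    """Placeholder for a Cleanlab trustworthiness score in [0, 1]."""
    raise NotImplementedError

documents = SimpleDirectoryReader("docs/").load_data()
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(similarity_top_k=3)

question = "Does the Pro plan include audit logs?"
response = query_engine.query(question)
trust = score_trustworthiness(question, str(response))

if trust < 0.5:
    # Low trust: check whether retrieval even surfaced relevant context.
    for node in response.source_nodes:
        print(node.score, node.node.get_content()[:200])
```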

Alex 👋 (@dubs408)

Based on last preseason odds it's not even close too lmao

Pacers +6600
'15 Warriors +2800
'11 Mavs +2000
'19 Raptors +1850
'23 Nuggets +1800
'04 Pistons +1500

LangChain (@langchainai)

🛑Prevent Hallucinated Responses

Our integration with Cleanlab allows developers to catch agent failures in real time.

To make this more concrete, they put together a blog post and a tutorial showing how to do this for a customer-support agent.

Blog: cleanlab.ai/blog/prevent-h…
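
For a flavor of the pattern, here is a hedged LangChain sketch that guards a support chain and swaps in a fallback answer when the trust score is low. The `score_trustworthiness()` helper, the 0.7 threshold, and the prompt are illustrative assumptions; see the linked blog and tutorial for the actual integration.

```python
# Hedged sketch: guarding a LangChain chain so low-trust answers are replaced
# with a safe fallback before reaching the user. score_trustworthiness() is a
# hypothetical placeholder for Cleanlab's real-time scoring call.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

def score_trustworthiness(question: str, answer: str) -> float:
    """Placeholder for a real-time trustworthiness score in [0, 1]."""
    raise NotImplementedError

prompt = ChatPromptTemplate.from_template(
    "You are a customer-support agent. Answer: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def guard(inputs: dict) -> str:
    # Generate an answer, then gate it on the trust score before returning.
    answer = (prompt | llm).invoke(inputs).content
    if score_trustworthiness(inputs["question"], answer) < 0.7:
        return "I'm not sure about that; let me connect you with a human agent."
    return answer

guarded_chain = RunnableLambda(guard)
print(guarded_chain.invoke({"question": "Can I get a refund after 90 days?"}))
```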