Karsten Held (@karstenheld) Twitter Tweets • TwiCopy

Karsten Held

@karstenheld


ID: 2214156872

Website: http://www.karstenheld.com
Joined: 25-11-2013 14:30:20

10 Tweets

20 Followers

54 Following

Karsten Held

@karstenheld

a year ago

Autonomous AI agent gets CSV data from eBay ("Computer Use Demo" by Anthropic) linkedin.com/pulse/autonomo…


LLM as a Judge: GPT-5 vs GPT-4o. How prompt design impacts AI evaluation. I have tested 3 RAG evaluation prompt types and 4 OpenAI models. Simple prompts work best with GPT-4o, complex prompts with GPT-5. #AI #LLM #PromptEngineering #RAG Complete video: youtu.be/dxXzrMHNonE


Karsten Held

@karstenheld

8 months ago

One thing becomes clear after 2 weeks of investigation and 1200 EUR spent on tokens: GPT-5 is worse than GPT-4o for "LLM as a judge" evaluations. It is more expensive, slower, and less stable. After 5 iterations, GPT-4o outperforms GPT-5 using the same optimized final prompt.


Karsten Held

@karstenheld

8 months ago

LLM as a Judge: New DataRobot study shows that larger models + simple prompts give best accuracy, cost, and stability. datarobot.com/blog/llm-judge… #AI #PromptEngineering #AIEvaluation


Karsten Held

@karstenheld

6 months ago

Lutz Roeder (Microsoft, #Netron, #Reflector): "We should measure it [AGI] not by imitation, but by the novelty, depth, and reliability of the knowledge it creates." lutzroeder.com/blog/2025-11-0… #AGI #DavidDeutsch
