@zhiyuanzeng_: Can we use LLMs to evaluate open-ended instruction-following generations? Introducing LLMBar, a benchmark for evaluating LLM evaluators 🧐 LLMBar is manually curated, objective, and adversarial 😈 🤯 Most LLM evaluators cannot beat random guessing! 📜 arxiv.org/abs/2310.07641 [1/n]

Zhiyuan Zeng

@zhiyuanzeng_


PhD-ing @uwnlp @uwcse | Prev. @Tsinghua_Uni @TsinghuaNLP @princeton_nlp

ID: 1650962310880714753

Link: http://zhiyuan-zeng.github.io

Joined: 25-04-2023 20:37:54

174 Tweets

417 Followers

216 Following

Zhiyuan Zeng

@zhiyuanzeng_

2 years ago

Can we use LLMs to evaluate open-ended instruction-following generations? Introducing LLMBar, a benchmark for evaluating LLM evaluators 🧐 LLMBar is manually curated, objective, and adversarial 😈 🤯 Most LLM evaluators cannot beat random guessing! 📜 arxiv.org/abs/2310.07641 [1/n]

119 Likes

2 Replies

40 Retweets
