Greg Durrett (@gregd_nlp)'s Twitter Profile
Greg Durrett

@gregd_nlp

CS professor at UT Austin. I do NLP most of the time. he/him

ID: 938457074278846468

Joined: 06-12-2017 17:16:17

1.1K Tweets

6.0K Followers

760 Following

Ge Gao (@ggaonlp)

RLHF research requires training and hiring annotators to explicitly choose between different model outputs.

What if we could get human preferences from user edits, which are generated naturally in applications like AI writing assistants? arxiv.org/abs/2404.15269
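Not the paper's code, just a minimal sketch of the idea: treat the user's edited text as the preferred output and the original model draft as the rejected one, so edit logs from a writing assistant can be turned into DPO-style preference pairs. The function and field names below are hypothetical.

from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # the user-edited text (preferred)
    rejected: str  # the original model draft

def pairs_from_edit_log(edit_log):
    """edit_log: iterable of (prompt, model_draft, user_edit) tuples."""
    pairs = []
    for prompt, model_draft, user_edit in edit_log:
        # Only edits that actually change the draft carry a preference signal.
        if user_edit.strip() and user_edit != model_draft:
            pairs.append(PreferencePair(prompt, chosen=user_edit, rejected=model_draft))
    return pairs

log = [("Summarize the notes.", "The meeting went well.", "The team agreed to ship v2 by Friday.")]
print(pairs_from_edit_log(log))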

Greg Durrett (@gregd_nlp)

Check out Prasann's work! We think this method has real potential for deployment in alignment settings where human preferences are collected online. It feels like the right way to squeeze the most out of that kind of data!

Yasumasa Onoe (@yasumasa_onoe)

We're excited to announce DOCCI: a new dataset designed to advance vision-language research. DOCCI features 15k images with detailed descriptions crafted to capture complex visual concepts: spatial relations, counting, text, entities, and more.

arxiv.org/pdf/2404.19753

Boyang 'Albert' Li (@AlbertBoyangLi)

🚨New NAACL 2024 Paper 🚨
We trained four vision-language models on 23 source tasks and evaluated them on 29 target tasks to look for patterns and latent factors in vision-language evaluation benchmarks.

arxiv.org/abs/2404.02415
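My own rough illustration of this kind of analysis (not the paper's method, and the numbers are made up): stack model-by-task scores into a matrix and run PCA to see how much of the benchmark variance a few latent factors explain.

import numpy as np
from sklearn.decomposition import PCA

# Rows = models, columns = evaluation tasks; the scores below are invented.
scores = np.array([
    [0.71, 0.64, 0.58, 0.80, 0.45],
    [0.69, 0.61, 0.55, 0.78, 0.43],
    [0.75, 0.70, 0.66, 0.83, 0.52],
    [0.62, 0.52, 0.49, 0.71, 0.38],
])

pca = PCA(n_components=2)
model_factors = pca.fit_transform(scores)  # each model's position in latent-factor space
task_loadings = pca.components_            # how strongly each task loads on each factor
print(pca.explained_variance_ratio_)       # variance explained by each latent factor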

Yoonsang Lee (@yoonsang_)

Can LMs correctly distinguish🔎 confusing entity mentions in multiple documents?

We study how current LMs perform at QA when given ambiguous questions and a document set📚 that requires challenging entity disambiguation.

Work done at UT Austin Computer Science✨ w/ Xi Ye and Eunsol Choi
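A toy sketch of how such an evaluation could be set up (hypothetical prompt format and LM call, not the paper's actual pipeline): hand the model a document set with confusable entity mentions plus an ambiguous question, then check whether the answer picks out the right entity.

def build_prompt(documents, question):
    # Concatenate the document set, then ask the (possibly ambiguous) question.
    doc_block = "\n\n".join(f"Document {i + 1}: {d}" for i, d in enumerate(documents))
    return f"{doc_block}\n\nQuestion: {question}\nAnswer:"

docs = [
    "Michael Jordan, the basketball player, won six NBA championships with the Bulls.",
    "Michael Jordan, the machine learning researcher, is a professor at UC Berkeley.",
]
prompt = build_prompt(docs, "Which university does Michael Jordan work at?")
# answer = my_lm.generate(prompt)   # hypothetical LM call
# hit = "Berkeley" in answer        # crude check that the right entity was resolved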

Yating Wu (@YatingWu96)

LLMs can mimic human curiosity by generating open-ended inquisitive questions given some context, similar to how humans wonder when they read.

But which ones are most important to answer?🤔

We predict the salience of questions, substantially outperforming GPT-4.🌟 🧵1/5
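A minimal sketch of one way to score question salience, assuming a regression head fine-tuned on human salience ratings; the checkpoint name is a placeholder, and this is not the authors' model.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "my-org/question-salience-regressor"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=1)  # regression head

def salience_score(context: str, question: str) -> float:
    """Higher = more important for this question to be answered given the context."""
    inputs = tokenizer(context, question, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits.squeeze().item()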

Greg Durrett (@gregd_nlp)

Check out Liyan's system + benchmark! Strong LLM fact-checking models like MiniCheck will enable response refinement and training for better factuality (work in progress!). LLM-AggreFact collects 10 high-quality labeled datasets of LLM errors from the literature for evaluating these fact-checkers!
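Not MiniCheck's actual interface (see the MiniCheck repo for that); just a generic sketch of document-grounded fact-checking with an off-the-shelf NLI model, where a claim counts as supported if the document entails it.

from transformers import pipeline

# Off-the-shelf NLI model as a stand-in fact-checker (long documents would need chunking).
nli = pipeline("text-classification", model="roberta-large-mnli")

def is_supported(document: str, claim: str, threshold: float = 0.5) -> bool:
    scores = nli({"text": document, "text_pair": claim}, top_k=None)
    entailment = next(s["score"] for s in scores if s["label"] == "ENTAILMENT")
    return entailment >= threshold

print(is_supported("The Eiffel Tower is located in Paris, France.", "The Eiffel Tower is in France."))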

Ryo Kamoi (@RyoKamoi)

📢 New Preprint! Can LLMs detect mistakes in LLM responses?
We introduce ReaLMistake, an error detection benchmark with errors made by GPT-4 & Llama 2.
We evaluated 12 LLMs and show that LLM-based error detectors are unreliable!
w/ Rui Zhang, Wenpeng Yin, Arman Cohan +
arxiv.org/abs/2404.03602
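A bare-bones sketch of the kind of LLM-based error detector being evaluated here (my hypothetical prompt and a generic OpenAI-style chat call, not the benchmark's code).

DETECTOR_PROMPT = """You are checking another model's answer.

Task given to the model:
{task}

Model's response:
{response}

Does the response contain any mistake? Start your answer with "yes" or "no", then explain briefly."""

def detect_error(client, task: str, response: str, model: str = "gpt-4o") -> bool:
    # `client` is assumed to be an OpenAI-compatible chat client (e.g., openai.OpenAI()).
    out = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": DETECTOR_PROMPT.format(task=task, response=response)}],
    )
    return out.choices[0].message.content.strip().lower().startswith("yes")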

Hongli Zhan (@HongliZhan)

🥱Tired of LLMs’ generic “hope you feel better” responses?

🧠Can we dive much deeper and instill cognitive capabilities in them?

Under the right instructions, LLMs (zero-shot) score very highly according to expert psychologist evaluators!

📢arxiv.org/abs/2404.01288

1/🧵

Yekyung Kim (@YekyungKim)

Summarizing long documents (>100K tokens) is a popular use case for LLMs, but how faithful are these summaries? We present FABLES, a dataset of human annotations of faithfulness & content selection in LLM-generated summaries of books.

arxiv.org/abs/2404.01261

🧵below:

Akari Asai @ ICLR2024 🇦🇹 (@AkariAsai)

Greg Durrett Chaitanya Malaviya Abhika Mishra also led a project where we annotated 1k LLM responses (Llama 2 7B & 70B chat and ChatGPT) to diverse instruction-following prompts with span-level hallucinations and hallucination types.
The data is publicly available!
