UPenn NLP (@upennnlp) Twitter Tweets • TwiCopy

Bryan Li

a year ago

RAG enables LLMs to access external info 📖. But when this info is in multiple languages 🌐, can LLMs reconcile differing viewpoints 🧐? We introduce BordIRlines, a dataset to study the robustness of cross-lingual RAG. 📃arxiv.org/abs/2410.01171 🗃️ huggingface.co/datasets/borde… 1/4 🧵

Liam Dugan

@liamdugan_

a year ago

📣 New Paper 📣: AI-generated fake news is flooding social media, making detection crucial. Introducing MiRAGeNews—a large dataset of image+caption pairs for detecting fake news. Our MiRAGe model outperforms both humans and SOTA VLMs across various generators.

Runsheng (Anson) Huang

@ansonhuang99

a year ago

🧵Streamlined AI-generated fake news with realistic "photo evidence" from Midjourney and caption from LLM is a misinformation superspreader. Introducing MiRAGeNews--a large dataset of 15,000 image-caption pairs that aims to train more robust detectors. arxiv.org/abs/2410.09045

Chaitanya Malaviya

@cmalaviya11

a year ago

✨Updates✨:
• Dolomites was accepted to TACL: dolomites-benchmark.github.io!
  - Our data is now also up on HuggingFace: huggingface.co/datasets/cmala….
• I will be talking about Dolomites at EMNLP'24 in Miami (Session 11 on Nov 14 at 10:45 ET). Please say hi if you're around!

Veronica Qing Lyu

@veronica3207

a year ago

🤔What model explanation method should you use? How to ensure it reflects the model’s true reasoning? 🌟 In our CL survey, Towards Faithful Model Explanation in NLP, we review 110+ explainability methods through the lens of faithfulness. Check out my presentation at #EMNLP2024!

Chaitanya Malaviya

@cmalaviya11

a year ago

Excited to share ✨ Contextualized Evaluations ✨! Benchmarks like Chatbot Arena contain underspecified queries, which can lead to arbitrary eval judgments. What happens if we provide evaluators with context (e.g., who's the user, what's their intent) when judging LM outputs? 🧵↓

Zachary Horvitz

@zachary_horvitz

a year ago

I'm at #EMNLP2024 presenting ✨TinyStyler✨, an efficient, effective, and fast method for few-shot text style transfer! Paper: aclanthology.org/2024.findings-… Demo: huggingface.co/spaces/tinysty… Code: github.com/zacharyhorvitz…

Delip Rao e/σ

@deliprao

a year ago

Excited to share our first preprint on a comprehensive analysis of withdrawn papers from arXiv spanning its entire history through Sept 2024, in collaboration with Thomas G. Dietterich and Jonathan Young from the arXiv.org team! A quick summary and link to the paper in this thread:

Xingyu Fu

@xingyufu2

a year ago

Teach GPT-4o to edit on charts and tables to ReFocus 🔍 and facilitate reasoning 🧠! 🔥 We introduce ReFocus, which edits input table and chart images to better reason visually zeyofu.github.io/ReFocus/ 🤔 Can we teach smaller models to learn such visual CoT reasoning? 🚀 Yes --

Liam Dugan

@liamdugan_

a year ago

🗣️ New Paper 🗣️ Can a single AI text detector generalize to a fixed set of LLMs and domains? Our shared task results suggest yes! Winners Pangram Labs and Leidos got over 99% TPR across 467k documents spanning 11 LLMs, 8 domains, and 4 decoding strategies. See thread 🧵

Shreya Havaldar

@shreyahavaldar

a year ago

🚨 LLMs must grasp implied language to reason about emotions, social cues, etc. Our Google DeepMind paper presents the Implied NLI dataset. Targeting social norms 🌎 and conversational dynamics 💬, we enhance LLM understanding of real-world implication! arxiv.org/abs/2501.07719

Yue Yang

@yueyangai

10 months ago

We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: yueyang1996.github.io/cosyn/ Dataset: huggingface.co/datasets/allen… Paper:

Thomas Talhelm

@thomastalhelm

10 months ago

New study with a billion words! Here’s the 60-second version. ⏲️ nature.com/articles/s4159… Nature Portfolio Sharath Guntuku The University of Chicago

Delip Rao e/σ

@deliprao

8 months ago

Luca Soldaini 🎀 Heard good things about DataDreamer from UPenn NLP arxiv.org/pdf/2402.10379

Yu Feng

@anniefeng6

8 months ago

#ICLR2025 Oral LLMs often struggle with reliable and consistent decisions under uncertainty 😵‍💫 — largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty. BIRD

Jialuo Li

@jialuoli1007

8 months ago

🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: arxiv.org/abs/2504.13129 🌐 Project: jialuo-li.github.io/Science-T2I-Web 💻 Code: github.com/Jialuo-Li/Scie… 🤗 Dataset: huggingface.co/collections/Ji… 🔍

Jeffrey (Young-Min) Cho

@jeffrey_ch0

7 months ago

#NAACL2025 How to compare cultural differences with social media data at scale? Our work uses lexica to annotate X 🇺🇸 & Weibo 🇨🇳 posts with valence (😄☹️) & arousal (🔥❄️) scores, revealing cross-cultural differences in emotional expression. aclanthology.org/2025.findings-…

Jeffrey (Young-Min) Cho

@jeffrey_ch0

7 months ago

🤖💬 Herding instincts… in AIs? Yes, even LLMs can follow the crowd!
• 📉 Conformity ↑ when agents lack confidence but trust peers
• 🧠 Presentation format shapes peer influence
• 🎯 Controlled herding can boost collaboration outcomes
👉 Read more: arxiv.org/abs/2505.21588

Chaitanya Malaviya

@cmalaviya11

6 months ago

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓

Bryan Li

@bryanlics

5 months ago

In a world of geopolitical conflicts, how can AI help us navigate? Our #ACL2025-F work studies RAG robustness across 49 languages. TL;DR: 📈 boost robustness w/ multilingual RAG, 🤔 take care w/ low-resource citations 📜arxiv.org/abs/2410.01171 🤗huggingface.co/datasets/borde… 1/4 🧵
