UPenn NLP (@upennnlp)'s Twitter Profile

@Penn Natural Language Processing group

ID: 1598764380455305224

Link: https://nlp.cis.upenn.edu
Joined: 02-12-2022 19:42:49

71 Tweets

918 Followers

31 Following

Bryan Li (@bryanlics)

RAG enables LLMs to access external info 📖. But when this info is multiple languages 🌐, can LLMs reconcile differing viewpoints 🧐? We introduce BordIRlines, a dataset to study the robustness of cross-lingual RAG. 📃arxiv.org/abs/2410.01171 🗃️ huggingface.co/datasets/borde… 1/4 🧵

Liam Dugan (@liamdugan_)

📣 New Paper 📣: AI-generated fake news is flooding social media, making detection crucial. Introducing MiRAGeNews—a large dataset of image+caption pairs for detecting fake news. Our MiRAGe model outperforms both humans and SOTA VLMs across various generators.

Chaitanya Malaviya (@cmalaviya11)

✨Updates✨: • Dolomites was accepted to TACL: dolomites-benchmark.github.io! - Our data is now also up on HuggingFace: huggingface.co/datasets/cmala…. • I will be talking about Dolomites at EMNLP'24 in Miami (Session 11 on Nov 14 at 10:45 ET). Please say hi if you're around!

Veronica Qing Lyu (@veronica3207)

🤔What model explanation method should you use? How to ensure it reflects the model’s true reasoning? 🌟 In our CL survey, Towards Faithful Model Explanation in NLP, we review 110+ explainability methods through the lens of faithfulness. Check out my presentation at #EMNLP2024!

Chaitanya Malaviya (@cmalaviya11)

Excited to share ✨ Contextualized Evaluations ✨! Benchmarks like Chatbot Arena contain underspecified queries, which can lead to arbitrary eval judgments. What happens if we provide evaluators with context (e.g who's the user, what's their intent) when judging LM outputs? 🧵↓

Zachary Horvitz (@zachary_horvitz)

I'm at #EMNLP2024 presenting ✨TinyStyler✨, an efficient, effective, and fast method for few-shot text style transfer! Paper: aclanthology.org/2024.findings-… Demo: huggingface.co/spaces/tinysty… Code: github.com/zacharyhorvitz…

Delip Rao e/σ (@deliprao)

Excited to share our first preprint on a comprehensive analysis of withdrawn papers from arXiv spanning its entire history through Sept 2024, in collaboration with Thomas G. Dietterich and Jonathan Young from the arXiv.org team! A quick summary and link to the paper in this thread:

Xingyu Fu (@xingyufu2)

Teach GPT-4o to edit on charts and tables to ReFocus 🔍 and facilitate reasoning 🧠! 🔥 We introduce ReFocus, which edits input table and chart images to better reason visually zeyofu.github.io/ReFocus/ 🤔 Can we teach smaller models to learn such visual CoT reasoning? 🚀 Yes --

Liam Dugan (@liamdugan_)

🗣️ New Paper 🗣️ Can a single AI text detector generalize to a fixed set of LLMs and domains? Our shared task results suggest yes! Winners Pangram Labs and Leidos got over 99% TPR across 467k documents spanning 11 LLMs, 8 domains, and 4 decoding strategies See thread 🧵

Shreya Havaldar (@shreyahavaldar)

🚨 LLMs must grasp implied language to reason about emotions, social cues, etc. Our Google DeepMind paper presents the Implied NLI dataset. Targeting social norms 🌎 and conversational dynamics 💬, we enhance LLM understanding of real-world implication! arxiv.org/abs/2501.07719

Yue Yang (@yueyangai)

We share Code-Guided Synthetic Data Generation: using LLM-generated code to create multimodal datasets for text-rich images, such as charts📊, documents📄, etc., to enhance Vision-Language Models. Website: yueyang1996.github.io/cosyn/ Dataset: huggingface.co/datasets/allen… Paper:

Yu Feng (@anniefeng6)

#ICLR2025 Oral LLMs often struggle with reliable and consistent decisions under uncertainty 😵‍💫 — largely because they can't reliably estimate the probability of each choice. We propose BIRD 🐦, a framework that significantly enhances LLM decision making under uncertainty. BIRD

Jialuo Li (@jialuoli1007)

🚀 Introducing Science-T2I - Towards bridging the gap between AI imagination and scientific reality in image generation! [CVPR 2025] 📜 Paper: arxiv.org/abs/2504.13129 🌐 Project: jialuo-li.github.io/Science-T2I-Web 💻 Code: github.com/Jialuo-Li/Scie… 🤗 Dataset: huggingface.co/collections/Ji… 🔍

Jeffrey (Young-Min) Cho (@jeffrey_ch0)

#NAACL2025 How to compare cultural differences with social media data in scale? Our work uses lexica to annotate X 🇺🇸 & Weibo 🇨🇳 posts with valence (😄☹️) & arousal (🔥❄️) scores, revealing cross-cultural differences in emotional expression. aclanthology.org/2025.findings-…

Jeffrey (Young-Min) Cho (@jeffrey_ch0)

🤖💬 Herding instincts… in AIs? Yes, even LLMs can follow the crowd! • 📉 Conformity ↑ when agents lack confidence but trust peers • 🧠 Presentation format shapes peer influence • 🎯 Controlled herding can boost collaboration outcomes 👉 Read more: arxiv.org/abs/2505.21588

Chaitanya Malaviya (@cmalaviya11)

Ever wondered what makes language models generate overly verbose, vague, or sycophantic responses? Our new paper investigates these and other idiosyncratic biases in preference models, and presents a simple post-training recipe to mitigate them! Thread below 🧵↓

Bryan Li (@bryanlics)

In a world of geopolitical conflicts, how can AI help us navigate? Our #ACL2025-F work studies RAG robustness across 49 languages. TL;DR: 📈 boost robustness w/ multilingual RAG, 🤔 take care w/ low-resource citations 📜arxiv.org/abs/2410.01171 🤗huggingface.co/datasets/borde… 1/4 🧵
