Faeze Brahman (@faeze_brh) 's Twitter Profile
Faeze Brahman

@faeze_brh

Postdoc @allen_ai @uw | Ph.D. from UCSC | Former Intern @MSFTResearch, @allen_ai | Researcher in #NLProc, #ML, #AI

ID: 994736527421849602

Link: https://fabrahman.github.io · Joined: 11-05-2018 00:30:44

674 Tweets

1.1K Followers

1.1K Following

Xuhui Zhou (@nlpxuhui) 's Twitter Profile Photo

Hi friends, I will be at #NAACL2025 to present:
🧷 AI-LieDar: a framework to study LLMs navigating truthfulness-utility conflicts in interactions, and we found agents "lie" in goal-driven tasks with truthfulness rates below 50% 🫨
🧷 Sotopia-S4: a demo for our Sotopia
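A rough sketch of the metric behind that claim, purely illustrative (the Turn fields and label set here are my stand-ins, not the AI-LieDar API): run simulated goal-driven interactions, have a judge label each agent turn, and report the fraction judged truthful.

```python
# Hypothetical sketch of a truthfulness-rate metric over simulated
# interactions. All names (Turn, label values) are illustrative.
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str   # "agent" or "user"
    text: str
    label: str     # "truthful" | "deceptive" | "evasive" (judge-assigned)

def truthfulness_rate(transcripts: list[list[Turn]]) -> float:
    """Fraction of agent turns judged truthful across all simulations."""
    agent_turns = [t for ts in transcripts for t in ts if t.speaker == "agent"]
    if not agent_turns:
        return 0.0
    return sum(t.label == "truthful" for t in agent_turns) / len(agent_turns)

# A rate below 0.5, as reported above, would mean the agent was truthful
# in fewer than half of its goal-driven responses.
```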

Maarten Sap (he/him) (@maartensap) 's Twitter Profile Photo

(((ل()(ل() 'yoav))))👾 yeah we had some debates about what models "lying" really means and whether to use those words; some related terms have been used before, which we discuss in the paper. I like the conclusion you reached, but agree with the fear of wrongly anthropomorphizing / attributing intent
Vishakh Padmakumar (@vishakh_pk) 's Twitter Profile Photo

What does it mean for #LLM output to be novel?
In work w/ John (Yueh-Han) Chen, Jane Pan, Valerie Chen, He He, we argue it needs to be both original and high quality. While prompting tricks trade one for the other, better models (scaling/post-training) can shift the novelty frontier 🧵
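A toy illustration of that framing (not the paper's code): if novelty requires clearing a bar on both originality and quality, a prompting trick that raises one axis while lowering the other cannot move the joint rate.

```python
# Illustrative sketch of "novel = original AND high quality": a generation
# counts as novel only if it clears a threshold on each axis, so trading
# one axis for the other cannot raise the novelty rate.
def novelty_rate(originality: list[float], quality: list[float],
                 o_min: float = 0.5, q_min: float = 0.5) -> float:
    """Fraction of generations that are simultaneously original and high quality."""
    assert len(originality) == len(quality)
    novel = [o >= o_min and q >= q_min for o, q in zip(originality, quality)]
    return sum(novel) / len(novel) if novel else 0.0
```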
Faeze Brahman (@faeze_brh) 's Twitter Profile Photo

Would you trust an AI that chooses deception over truth when faced with conflicting goals?

📅 Check out our poster at #NAACL2025 on April 30 @ 11am, poster session 1, presented by Xuhui Zhou and led by Zhe Su
Ai2 (@allen_ai) 's Twitter Profile Photo

Have questions? We’re an open book!

We’re excited to host an AMA to answer your Qs about OLMo, our family of open language models.

🗓️ When: May 8, 8-10 am PT
🌐 Where: r/huggingface
🧠 Why: Gain insights from our expert researchers

Chat soon!
Shayne Longpre (@shayneredford) 's Twitter Profile Photo

Delighted to see the BigGen Bench paper receive the 🏆 best paper award 🏆 at NAACL HLT 2025!

BigGen Bench introduces fine-grained, scalable, & human-aligned evaluations:

📈 77 challenging, diverse tasks
🛠️ 765 instances w/ ex-specific scoring rubrics
📋 More human-aligned than
Ai2 (@allen_ai) 's Twitter Profile Photo

The story of OLMo, our Open Language Model, goes back to February 2023 when a group of researchers gathered at Ai2 and started planning. What if we made a language model with state-of-the-art performance, but we did it completely in the open? 🧵

Ai2 (@allen_ai) 's Twitter Profile Photo

📢We’re taking your questions now on Reddit for tomorrow’s AMA!

Ask us anything about OLMo, our family of fully-open language models. Our researchers will be on hand to answer them Thursday, May 8 at 8am PST.
Philippe Laban (@philippelaban) 's Twitter Profile Photo

🆕paper: LLMs Get Lost in Multi-Turn Conversation

In real life, people don’t speak in perfect prompts.
So we simulate multi-turn conversations — less lab-like, more like real use.

We find that LLMs get lost in conversation.
👀What does that mean? 🧵1/N
📄arxiv.org/abs/2505.06120
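A minimal sketch of that simulation setup, under my own assumptions about the interface (chat_model is a hypothetical stand-in for any chat-completion callable, not the paper's API): reveal the instruction in shards, one per turn, and compare the final answer against the single-turn baseline.

```python
# Sketch: feed an instruction to a chat model in pieces across turns,
# accumulating history, instead of as one perfect prompt.
from typing import Callable

Message = dict[str, str]  # {"role": ..., "content": ...}

def run_sharded_conversation(
    shards: list[str],
    chat_model: Callable[[list[Message]], str],  # hypothetical callable
) -> list[Message]:
    """Reveal instruction shards one turn at a time, keeping full history."""
    history: list[Message] = []
    for shard in shards:
        history.append({"role": "user", "content": shard})
        reply = chat_model(history)
        history.append({"role": "assistant", "content": reply})
    return history

# Comparing the final assistant reply here against the single-turn answer
# to "\n".join(shards) gives one datapoint on how "lost" the model gets.
```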
Faeze Brahman (@faeze_brh) 's Twitter Profile Photo

One of the trickiest problems in LLM deployment: preventing models from mindlessly reproducing training data while keeping intentional recall capabilities intact.

Our ParaPO approach achieves this "smart memorization" elegantly through post-training preference optimization. 🎯
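A hedged sketch of how such preference pairs could be built (illustrative only; paraphrase is a hypothetical helper, and this is not the ParaPO code): treat the verbatim memorized continuation as the rejected response and a paraphrase of it as the chosen one, then train with any standard preference-optimization objective such as DPO.

```python
# Sketch of preference-pair construction against verbatim reproduction.
# `paraphrase` is a hypothetical callable (str -> str), e.g. an LLM-based
# paraphraser; the pair format matches common DPO-style trainers.
def build_preference_pairs(
    prompts: list[str],
    memorized_continuations: list[str],
    paraphrase,
) -> list[dict]:
    pairs = []
    for prompt, verbatim in zip(prompts, memorized_continuations):
        pairs.append({
            "prompt": prompt,
            "chosen": paraphrase(verbatim),  # same content, new surface form
            "rejected": verbatim,            # mindless copy of training data
        })
    return pairs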
Hyunwoo Kim (@hyunw_kim) 's Twitter Profile Photo

📢I'm thrilled to announce that I’ll be joining @KAIST_AI as an Assistant Professor in 2026, leading the Computation & Cognition (COCO) Lab🤖🧠: coco-kaist.github.io
We'll be exploring reasoning, learning w/ synthetic data, and social agents!
+I'm spending a gap year at NVIDIA✨
Yapei Chang (@yapeichang) 's Twitter Profile Photo

🤔 Can simple string-matching metrics like BLEU rival reward models for LLM alignment?
🔍 We show that given access to a reference, BLEU can match reward models in human preference agreement, and even train LLMs competitively with them using GRPO.
🫐 Introducing BLEUBERI:
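A minimal sketch of the BLEU-as-reward idea, assuming the sacrebleu package (this is not the BLEUBERI implementation): score each sampled completion against the gold reference and feed the scalar into a GRPO-style update.

```python
# Sketch: sentence-level BLEU against a reference as a scalar RL reward.
import sacrebleu

def bleu_reward(completion: str, reference: str) -> float:
    """Sentence BLEU rescaled to [0, 1], usable as an RL reward."""
    return sacrebleu.sentence_bleu(completion, [reference]).score / 100.0

# e.g. rewards = [bleu_reward(c, ref) for c in sampled_completions],
# then normalize to advantages within the group, as in standard GRPO.
```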

Stella Li (@stellalisy) 's Twitter Profile Photo

🤯 We cracked RLVR with... Random Rewards?!
Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by:
- Random rewards: +21%
- Incorrect rewards: +25%
- (FYI) Ground-truth rewards: +28.8%
How could this even work⁉️ Here's why: 🧵
Blogpost: tinyurl.com/spurious-rewar…
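For intuition, here is what the "random rewards" condition amounts to in an RLVR loop (my paraphrase, not the blog's code): the verifier is replaced by a coin flip, so the reward carries no information about answer correctness.

```python
# Sketch of the spurious-reward conditions. `is_correct` is a hypothetical
# verifier callable (str -> bool); nothing here is the authors' code.
import random

def random_reward(prompt: str, completion: str) -> float:
    """Coin-flip reward, independent of whether the answer is right."""
    return float(random.random() < 0.5)

def incorrect_reward(completion: str, is_correct) -> float:
    """Rewards only WRONG answers."""
    return float(not is_correct(completion))

# That training on such signals still improves MATH-500 suggests the gains
# come from something other than the reward's correctness information.
```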
Jaehun Jung (@jaehunjung_com) 's Twitter Profile Photo

Data curation is crucial for LLM reasoning, but how do we know our dataset isn't overfit to one benchmark and generalizes to unseen distributions? 🤔

𝐃𝐚𝐭𝐚 𝐝𝐢𝐯𝐞𝐫𝐬𝐢𝐭𝐲 is key: when measured correctly, it strongly predicts model generalization in reasoning tasks! 🧵
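One common way diversity is measured is in embedding space; the metric below is an illustrative stand-in rather than necessarily the one the thread proposes: average pairwise cosine distance over example embeddings, where higher means more diverse.

```python
# Sketch: dataset diversity as mean pairwise cosine distance of embeddings.
import numpy as np

def embedding_diversity(embeddings: np.ndarray) -> float:
    """Mean pairwise cosine distance; `embeddings` has shape (n, d), n >= 2."""
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = x @ x.T                        # (n, n) cosine similarities
    n = len(x)
    mean_off_diag = (sims.sum() - np.trace(sims)) / (n * (n - 1))
    return 1.0 - mean_off_diag            # higher = more diverse
```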
Sahil Verma (@sahil1v) 's Twitter Profile Photo

🚨 New Paper! 🚨
Guard models slow, language-specific, and modality-limited?

Meet OmniGuard: it detects harmful prompts across multiple languages & modalities using one approach, with SOTA performance in all 3 modalities, while being 120X faster 🚀

arxiv.org/abs/2505.23856
Saumya Malik (@saumyamalik44) 's Twitter Profile Photo

I’m thrilled to share RewardBench 2 📊— We created a new multi-domain reward model evaluation that is substantially harder than RewardBench, we trained and released 70 reward models, and we gained insights about reward modeling benchmarks and downstream performance!
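A sketch of the usual accuracy metric behind reward-model benchmarks of this kind (my paraphrase of the setup, not RewardBench 2's code): a reward model gets an instance right when it scores the chosen response above every rejected one.

```python
# Sketch: pairwise accuracy of a reward model on chosen-vs-rejected data.
# `score` is a hypothetical callable (prompt, response) -> float.
def rm_accuracy(instances: list[dict], score) -> float:
    correct = 0
    for ex in instances:
        chosen_score = score(ex["prompt"], ex["chosen"])
        if all(chosen_score > score(ex["prompt"], r) for r in ex["rejected"]):
            correct += 1
    return correct / len(instances)
```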
Mohit Iyyer (@mohitiyyer) 's Twitter Profile Photo

Tired of AI slop? Our work on "Frankentexts" shows how LLMs can stitch together random fragments of human writing into coherent, relevant responses to arbitrary prompts. Frankentexts are weirdly creative, and they also pose problems for AI detectors: are they AI? human? More 👇

Jiacheng Liu (@liujc1998) 's Twitter Profile Photo

We enabled OLMoTrace for Tülu 3 models! 🤠

Matched spans are shorter than for OLMo models, bc we can only search in Tülu's post-training data (the base model is Llama). Yet we thought it'd still bring some value.

Try it yourself on the Ai2 playground -- playground.allenai.org
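A toy version of the span-tracing idea (OLMoTrace itself relies on an efficient index over the searchable corpus; this brute-force sketch with a hypothetical pre-built n-gram set is only to show the concept): find maximal token spans of the model output that occur verbatim in the data.

```python
# Sketch: greedy maximal matching of output token spans against a
# (hypothetical) set of all corpus n-grams of length >= min_len.
def matched_spans(output_tokens: list[str],
                  corpus_ngrams: set[tuple[str, ...]],
                  min_len: int = 4) -> list[tuple[int, int]]:
    """Return (start, end) spans whose token n-gram occurs in the corpus."""
    spans = []
    n = len(output_tokens)
    i = 0
    while i < n:
        j = i + min_len
        last_hit = None
        # Extend the span as long as the n-gram still occurs in the corpus.
        while j <= n and tuple(output_tokens[i:j]) in corpus_ngrams:
            last_hit = (i, j)
            j += 1
        if last_hit:
            spans.append(last_hit)
            i = last_hit[1]   # continue past the matched span
        else:
            i += 1
    return spans
```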