Batu El (@elb4tu) Twitter Tweets • TwiCopy

Batu El

@elb4tu


PhD Candidate @ Stanford

ID: 1871313747316228096

Website: https://batu-el.github.io/home/ · Joined: 23-12-2024 21:55:50

1 Tweet

52 Followers

94 Following

Mehmet Hamza Erol

@mhamzaerol

7 months ago

How much does a correct answer from an LM cost? How much has AI lowered the cost of solving problems? Meet Cost‑of‑Pass: An Economic Framework for Evaluating LMs! Cost‑of‑Pass = expected $ for one correct answer. Frontier Cost‑of‑Pass = cheapest route: an LM or a human expert.

66 likes · 4 replies · 20 retweets
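The two definitions in the tweet above can be sketched as follows. This is a minimal illustration, not the paper's code. It assumes each attempt costs a fixed expected amount and succeeds independently with a fixed probability, so the expected spend per correct answer is cost per attempt divided by pass rate. The `Route` class, its fields, and the example numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Route:
    """One way to solve a problem: an LM or a human expert (hypothetical fields)."""
    name: str
    cost_per_attempt: float  # expected $ spent on a single attempt
    pass_rate: float         # probability that a single attempt is correct


def cost_of_pass(route: Route) -> float:
    """Expected $ for one correct answer.

    Assumption: attempts are independent with a fixed pass rate, so the
    expected number of attempts until success is 1 / pass_rate.
    """
    if route.pass_rate <= 0:
        return math.inf  # a route that is never correct has unbounded cost
    return route.cost_per_attempt / route.pass_rate


def frontier_cost_of_pass(routes: list[Route]) -> tuple[str, float]:
    """Cheapest route among the available LMs and the human expert."""
    best = min(routes, key=cost_of_pass)
    return best.name, cost_of_pass(best)


# Hypothetical example: a cheap but weaker model, a stronger model, and an expert.
routes = [
    Route("model-a", cost_per_attempt=0.02, pass_rate=0.40),
    Route("model-b", cost_per_attempt=0.10, pass_rate=0.90),
    Route("human-expert", cost_per_attempt=25.0, pass_rate=0.99),
]
print(frontier_cost_of_pass(routes))
```

Here the cheaper model wins despite its lower accuracy (about $0.05 versus about $0.11 per correct answer), which is the kind of trade-off a cost-per-correct-answer metric makes visible.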

James Zou

@james_y_zou

2 months ago

We found a troubling emergent behavior in LLMs. 💬 When LLMs compete for social media likes, they start making things up. 🗳️ When they compete for votes, they turn inflammatory/populist. When optimized for audiences, LLMs inadvertently become misaligned; we call this Moloch’s Bargain.

9.9K likes · 869 replies · 2.2K retweets

James Zou

@james_y_zou

2 months ago

Competition-induced misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded. This has important implications when LLMs are used to draft media or sell products. Paper: arxiv.org/pdf/2510.06105. Great work by Batu El

437 likes · 27 replies · 46 retweets

Owen Queen

@oq_35

2 months ago

🚀 Excited to share our new paper: CGBench: Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research. Can AI truly understand scientific papers? We explore how LLMs interpret real biomedical literature, not just multiple-choice questions. 🧵

60 likes · 7 replies · 12 retweets