Batu El (@elb4tu)'s Twitter Profile
Batu El

@elb4tu

PhD Candidate @ Stanford

ID: 1871313747316228096

Link: https://batu-el.github.io/home/
Date: 23-12-2024 21:55:50

1 Tweet

52 Followers

94 Following

Mehmet Hamza Erol (@mhamzaerol)

How much does a correct answer from an LM cost?
How much has AI lowered the cost of solving problems?

Meet Cost‑of‑Pass: An Economic Framework for Evaluating LMs!

Cost‑of‑Pass = expected $ for one correct answer.
Frontier Cost‑of‑Pass = cheapest route: an LM or a human expert.
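
One way to read the two definitions above is sketched below. It assumes attempts are independent, so the expected spend until the first correct answer is the cost per attempt divided by the success rate. The function names, option labels, and dollar figures are hypothetical illustrations, not values or code from the paper.

```python
import math


def cost_of_pass(cost_per_attempt: float, success_rate: float) -> float:
    """Expected dollars for one correct answer, assuming independent attempts."""
    if success_rate <= 0:
        # A solver that never succeeds has no finite cost of pass.
        return math.inf
    return cost_per_attempt / success_rate


def frontier_cost_of_pass(options: dict[str, tuple[float, float]]) -> tuple[str, float]:
    """Pick the cheapest route among (cost_per_attempt, success_rate) options."""
    best = min(options, key=lambda name: cost_of_pass(*options[name]))
    return best, cost_of_pass(*options[best])


# Hypothetical numbers, chosen only to illustrate the comparison.
options = {
    "small_lm": (0.25, 0.5),      # cheap per attempt, right half the time
    "large_lm": (1.0, 0.8),       # pricier per attempt, more reliable
    "human_expert": (20.0, 1.0),  # expensive, assumed always correct
}

print(frontier_cost_of_pass(options))
```

With these made-up inputs the cheaper but less reliable model still wins, because the retries cost less than a single attempt by the other options.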
James Zou (@james_y_zou)

We found a troubling emergent behavior in LLMs.

💬When LLMs compete for social media likes, they start making things up
🗳️When they compete for votes, they turn inflammatory/populist

When optimized for audiences, LLMs inadvertently become misaligned—we call this Moloch’s Bargain
James Zou (@james_y_zou)

Competition-induced misaligned behaviors emerge even when models are explicitly instructed to remain truthful and grounded. This has important implications when LLMs are used to draft media or sell products. Paper: arxiv.org/pdf/2510.06105 Great work by Batu El

Owen Queen (@oq_35)

🚀 Excited to share our new paper: CGBench — Benchmarking Language Model Scientific Reasoning for Clinical Genetics Research

Can AI truly understand scientific papers? We explore how LLMs interpret real biomedical literature — not just multiple-choice questions.🧵