Shahan (@shahanmemon)'s Twitter Profile

phd @UW // visiting @nyuabudhabi // ex- @CarnegieMellon // researching if AI can do science // tweets about genAI, agents, science of science, @SciencePlusAI

ID: 134687994

Link: http://samemon.github.io

Joined: 19-04-2010 04:55:05

2.2K Tweets

948 Followers

2.2K Following

Ben Blaiszik (@benblaiszik)

Academia or industry, expert or novice, infrastructure shouldn't be your bottleneck. Garden democratizes access to OpenCatalyst models for ALL researchers. This is how tomorrow's breakthroughs will be found. Garden is a new superpower for scientists.

Parshin Shojaee (@parshinshojaee)

Scientific discovery with LLMs has so much potential yet is underexplored. Our new benchmark **LLM-SRBench** enables rigorous evaluations of equation discovery with LLMs! 🧠Key takeaway: Even SOTA discovery models with strong LLM backbones still fail to discover mathematical

Emma Hoes (@emmahoes93)

🚨New paper out in @PNASNews! Existential AI risks do **not** distract from immediate harms. In our study (n = 10,800), people consistently prioritize current threats - bias, misinformation, job loss - over sci-fi doom! 💥👉 pnas.org/doi/10.1073/pn…

Melissa Pan (@melissapan)

🚨 Why Do Multi-Agent LLM Systems Fail? ⁉️ 🔥 Introducing MAST: The first multi-agent failure taxonomy - consists of 14 failure modes and 3 categories, generalizes for diverse multi-agent systems and tasks! Paper: arxiv.org/pdf/2503.13657 Code: github.com/multi-agent-sy… 🧵1/n

Dashun Wang (@dashunwang)

🚨 Our latest paper is out today in Science! We uncover stark and systematic partisan differences in the amount, content, and character of science used in policy, which mirror differences in political elites’ trust in science. Four years in the making. Led by Zander Furnas 1/n

John B. Holbein (@johnholbein1)

Here's some good news! The file drawer problem may have diminished in recent years, at least in social science survey experiments. "This suggests increased recognition of the importance of null results."

Ronen Tamari (@rtk254)

"A society that can no longer read complex texts may soon find itself unable to think complex thoughts [...] reading becomes not just a cognitive act, but a civic one: a rehearsal for the intellectual stamina that democracy requires." 🎯
Haofei Yu 🦋 @haofeiyu.bsky.social (@haofeiyu44)

🧪 Want an AI-generated paper draft in just 1 minute? Or dreaming of building auto-research apps but frustrated with setups? Meet tiny-scientist, a minimal package to start AI-powered research: 👉 pip install tiny-scientist 🔗 github.com/ulab-uiuc/tiny… #AIAgent #pythonpackages

Arthur Spirling (@arthur_spirling)

Again, I think academics have perhaps not quite grokked what this sort of wholesale removal of a model means for replication in science. This isn’t versioned or downloadable, and you won’t be able to recreate it

Atoosa Kasirzadeh (@dr_atoosa)

📢 New paper with Iason Gabriel is out! 2025 is being called the year of AI agents, with overwhelming headlines about them every day. But we lack a shared vocabulary to distinguish their fundamental properties. Our paper aims to bridge this gap. A 🧵

Dr Claire Malone FRSA (@geeknproud42)

A new webinar series is coming soon 🎤 Designed for anyone who writes—emails, reports, blogs, posts—and wants to use AI as a tool for clarity, not chaos. 💬 Prompt better ⚡ Write smarter 🧠 Keep your voice Watch this space 👀 #AI #writing #ChatGPT #ProductivityHacks

Deb Raji (@rajiinio)

Lately, I've been seriously exploring what it could mean to move beyond the benchmarking paradigm in ML evaluation and it's led to some stats-y papers: (1) a critique of current experimental evals of prediction based interventions & (2) a framework for adverse events reporting.

Sayash Kapoor (@sayashk)

Seth Lazar It is necessary to invest in alternatives right now. Without it, we might see the worst aspects of our current platform economy amplified. We don't have all the answers in the paper, but we have a blueprint for where to start. Paper: arxiv.org/pdf/2505.04345