Mingxuan (Aldous) Li (@itea1001)'s Twitter Profile
Mingxuan (Aldous) Li

@itea1001

Student at the University of Chicago

ID: 1742435203543343104

Link: https://itea1001.github.io/

Joined: 03-01-2024 06:38:31

4 Tweets

2 Followers

106 Following

Haokun Liu (@haokunliu5280)

1/ 🚀 New Paper Alert! Excited to share: Literature Meets Data: A Synergistic Approach to Hypothesis Generation 📚📊! We propose a novel framework combining literature insights & observational data with LLMs for hypothesis generation. Here’s how and why it matters.

Dang Nguyen (@divingwithorcas)

1/n You may know that large language models (LLMs) can be biased in their decision-making, but ever wondered how those biases are encoded internally and whether we can surgically remove them?

Haokun Liu (@haokunliu5280)

🚀🚀🚀Excited to share our latest work: HypoBench, a systematic benchmark for evaluating LLM-based hypothesis generation methods!

Mourad Heddaya (@mouradheddaya)

🧑‍⚖️ How well can LLMs summarize complex legal documents? And can we use LLMs to evaluate? Excited to be in Albuquerque presenting our paper this afternoon at NAACL HLT 2025! We develop CaseSumm, a comprehensive dataset comprising 25K U.S. Supreme Court opinions and their…

Mingxuan (Aldous) Li (@itea1001)

HypoEval evaluators (github.com/ChicagoHAI/Hyp…) are now incorporated into judges from Quotient AI — check it out at github.com/quotient-ai/ju…!

Xiaoyan Bai (@elenal3ai)

🚨 New paper alert 🚨 Ever asked an LLM-as-Marilyn Monroe who the US president was in 2000? 🤔 Should the LLM answer at all? We call these clashes Concept Incongruence. Read on! ⬇️ 1/n 🧵

Shishir Patil (@shishirpatil_)

🔥 At ICML 2025, we’re delighted to introduce BFCL V4 Agentic. As function-calling (also called tool-calling) forms the bedrock of agentic systems, the BFCL V4 Agentic benchmark focuses on tool-calling in real-world agentic settings, including: 🔍 Web search with multi-hop…

Mingxuan (Aldous) Li (@itea1001)

Excited to present our work at #ACL2025! Come by Poster Session 1 tomorrow, 11:00–12:30 in Hall X4/X5 — would love to chat!