Ella Minzhi Li (@ellaminzhili) 's Twitter Profile
Ella Minzhi Li

@ellaminzhili

Visiting PhD at Stanford @stanfordnlp🌲, CS PhD student at NUS @wing_nus 🇸🇬, PhD Fellow @Google, NLP researcher📒

ID: 1586640655702368256

Link: https://yocodeyo.github.io
Joined: 30-10-2022 08:46:42

92 Tweets

376 Followers

161 Following

Min-Yen Kan (@knmnyn) 's Twitter Profile Photo

A good chance to take advantage of an AI research programme dedicated to reaching out to ASEAN undergraduate, masters students and young faculty members. Come work with my group at wing.nus! #llm #NLProc #sigir #www

Yijia Shao (@echoshao8899) 's Twitter Profile Photo

LM agents today primarily aim to automate tasks. Can we turn them into collaborative teammates? Introducing Collaborative Gym (Co-Gym), a framework for enabling & evaluating human-agent collaboration! I now get used to agents proactively seeking confirmation or my deep thinking.

Dora Zhao (@dorazhao9) 's Twitter Profile Photo

Todo lists, docs, email style – if you've got individual or team knowledge you want ChatGPT/Claude to have access to, Knoll (knollapp.com) is a personal RAG store from Stanford University that you can add any knowledge into. Instead of copy-pasting into your prompt every time,

Salesforce AI Research (@sfresearch) 's Twitter Profile Photo

🚨 New Survey Alert! 🚨 🧠 “A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems” 📘 Paper: bit.ly/4cnAhvq 🧠 Project Page: bit.ly/3E6ROv6 🧵 Researcher's thread: 👇 (1/6) Reasoning is the key to unlocking true AI

Ella Minzhi Li (@ellaminzhili) 's Twitter Profile Photo

Excited about LLM reasoning? 🤖🧠 Our latest survey dives into the field through regime & architecture dimensions, as well as input & output perspectives!💡Grateful to collaborate with amazing researchers on this exciting work—check it out!👇

Brendan Jowett (@jowettbrendan) 's Twitter Profile Photo

🚨 BREAKING: Google just dropped the most practical AI release of 2025. It handles your emails, data, docs, meetings and does it with context. This is AI that actually saves time. Here’s everything they announced (and why it matters for your team):

elvis (@omarsar0) 's Twitter Profile Photo

// A Survey of Frontiers in LLM Reasoning // Nice survey on reasoning LLM with focus on inference scaling, enhancing reasoning, and applications in agentic systems.

Michael Ryan (@michaelryan207) 's Twitter Profile Photo

Check out CAVA 🍾 our new benchmark for end-to-end voice assistants! Large Audio Models are the next frontier for AI assistants, but what is still missing from making these models into seamless voice assistants?  Inspired by discussions with practitioners, we identify six

Will Held (@williambarrheld) 's Twitter Profile Photo

Large Audio Models should be the foundation models for voice assistants, but most benchmarks focus on chat & audio analysis skills. Read about our big team effort to develop a set of benchmarks to cover all the capabilities a model needs to support a great voice assistant!

Ella Minzhi Li (@ellaminzhili) 's Twitter Profile Photo

Check out CAVA🥂a benchmark for evaluating how Large Audio Models perform on practical tasks that matter for real-world voice assistants: talkarena.org/cava

John Yang (@jyangballin) 's Twitter Profile Photo

40% with just 1 try per task: SWE-agent-LM-32B is the new #1 open source model on SWE-bench Verified. We built it by synthesizing a ton of agentic training data from 100+ Python repos. Today we’re open-sourcing the toolkit that made it happen: SWE-smith.

Yutong Zhang (@zhangyt0704) 's Twitter Profile Photo

AI companions aren’t science fiction anymore 🤖💬❤️ Thousands are turning to AI chatbots for emotional connection – finding comfort, sharing secrets, and even falling in love. But as AI companionship grows, the line between real and artificial relationships blurs. 📰 “Can A.I.

Min-Yen Kan (@knmnyn) 's Twitter Profile Photo

📣 R/T! Q❓: Do you agree that ethics ⚖️ in LLMs & #NLProc are impt 4 such impactful tech 🤖? Going to ACL 2025 in Vienna? Want to learn more? Join us 27 Jul for our ✨ Ethics Tutorial✨ & 30 Jul for 🕊️ of a 🪶! #ACL2025NLP w/ Luciana Benotti Guido Ivetta Ella Minzhi Li

Yi Tay (@yitayml) 's Twitter Profile Photo

First official Gold medal at IMO from DeepMind🥇 with Gemini Deep Think. A general purpose text-in text-out model achieving gold medal is something quite unthinkable just about one year ago and here we are! The frontier of AI is incredibly exciting! Happy to have co-led /

Stanford NLP Group (@stanfordnlp) 's Twitter Profile Photo

.Stanford NLP Group papers at ACL 2025 in Vienna next week: • HumT DumT: Measuring and controlling human-like language in LLMs Myra Cheng Sunny Yu Dan Jurafsky • Controllable and Reliable Knowledge-Intensive Task Agents with Declarative GenieWorksheets Harshit Joshi Shicheng Liu

Min-Yen Kan (@knmnyn) 's Twitter Profile Photo

Last call 👋 for participation for our PM tutorial 🔥Navigating Ethical ⚖️ Challenges in NLP: Hands-on strategies for students & researchers🔥 at #aclmeeting 2025 in Vienna! See you 🫵 Sun, 27 Jul! (w/ Luciana Benotti Guido Ivetta Ella Minzhi Li & more!)

Yanzhe Zhang (@stevenyzzhang) 's Twitter Profile Photo

Soon, AI agents will act for us—collaborating, negotiating, and sharing data. But can they truly protect our privacy? We simulate privacy-critical scenarios, using alternating search to evolve attacks and defenses, uncovering severe vulnerabilities and building protections.

Shafiq Joty (@jotyshafiq) 's Twitter Profile Photo

We can now say we have a stable data and multi-turn RL training recipe for building autonomous deep research agents. Thanks to the awesome team!

Zora Wang (@zhiruow) 's Twitter Profile Photo

Agents are joining us at work -- coding, writing, design. But how do they actually work, especially compared to humans? Their workflows tell a different story: They code everything, slow down human flows, and deliver low-quality work fast. Yet when teamed with humans, they shine