LaunchNLP (@launchnlp)'s Twitter Profile
LaunchNLP

@launchnlp

Research work and announcements from a group of researchers who study natural language processing at @UMich. Part of @michigan_AI and @UMichCSE.

ID: 1528114132142247937

https://launch.eecs.umich.edu/ · Joined 21-05-2022 20:43:24

46 Tweets

134 Followers

12 Following

Xinliang (Frederick) Zhang (@frederickxzhang)

Proud of the Michigan NLP group! I have 3 papers at #EMNLP2023 (1 main, 2 Findings): IE for news and biased event understanding (arxiv.org/abs/2310.18827, arxiv.org/abs/2310.18768), plus annotation disagreement (arxiv.org/abs/2305.14663)! Though I won't be in 🇸🇬, I'm always open to chatting virtually!

Maitrix.org (@maitrixorg)

Releasing 🔥LLM Reasoners v1.0🔥

🥇Popular library for advanced LLM reasoning

 - Reasoning-via-Planning (RAP)🎶
 - Chain-of-Thought (CoT)⛓️
 - Tree-of-Thoughts (ToT)🌴
 - Grace decoding💄
 - Beam search🔎

🥇Enhances #Llama3, GPT-4, and LLMs on Hugging Face

llm-reasoners.net

Xin Liu (@xinliu_cs)

LLMs often exhibit poorly calibrated confidence, which undermines users' trust in their outputs. Existing calibration methods target short-form answers and don't address long-form responses 😕 Discover the solution in our #ICLR2024 paper! 📄 arxiv.org/abs/2310.19208 👀

Tech At Bloomberg (@techatbloomberg)

Congratulations to Computer Science and Engineering at Michigan + MichiganAI / LaunchNLP's Shuyang Cao on being one of the 2023-2024 Bloomberg #DataScience Ph.D. Fellows! Learn more about Shuyang’s research focus and our latest cohort of Fellows: bloom.bg/3WXR0Q0 #AI #ML #NLProc

LaunchNLP (@launchnlp)

Don’t miss Frederick’s NAACL work, MOKA, which studies moral value understanding through event-level understanding. If AI meeting human values is your jam, this is a must-read! 🌟 ⚠️ Poster session happening soon! Xinliang (Frederick) Zhang, Lu Wang, MichiganAI, Computer Science and Engineering at Michigan

Computer Science and Engineering at Michigan (@umichcse)

👏 A round of applause to PhD student Inderjeet Jayakumar Nair and Prof. Lu Wang on winning a 🏆 Senior Area Chair (SAC) Award, announced today at #ACL2024! The award went to only 21 of the 1,915 accepted main-conference papers. @ACLmeeting 👉 Check out their paper: aclanthology.org/2024.acl-long.…

NAACL HLT 2025 (@naaclmeeting)

📢 NAACL needs Reviewers & Area Chairs! 📝 If you haven't received an invite for ARR Oct 2024 & want to contribute, sign up by Oct 22nd! ➡️AC form: forms.office.com/r/8j6jXLfASt ➡️Reviewer form: forms.office.com/r/cjPNtL9gPE Please RT 🔁 and help spread the word! 🗣️ #NLProc ACLRollingReview

Farima Fatahi (on job market) (@farimafb)

🌍 How Verifiable Are LM Responses in the Wild? A Three-Way Factuality Benchmark
Meet 𝐅𝐚𝐜𝐭𝐁𝐞𝐧𝐜𝐡: an updatable benchmark for evaluating language models' factuality in real-world scenarios.
🔗 huggingface.co/spaces/launch/…
LaunchNLP, MichiganAI, Computer Science and Engineering at Michigan

Xinliang (Frederick) Zhang (@frederickxzhang)

Heard of the Alaska-Hawaii merger? 🤔 Do LLMs know it's still pending government approval before it can happen? They stumble, but we've got a fix ⚒️!
Dive into my #EMNLP2024 work, 𝐍𝐚𝐫𝐫𝐚𝐭𝐢𝐯𝐞-𝐨𝐟-𝐓𝐡𝐨𝐮𝐠𝐡𝐭: a prompting technique designed to unlock LLMs' temporal reasoning.

Yunxiang Zhang (@yunxiangzhang4)

🚨 New Benchmark Drop!
Can LLMs actually do ML research? Not toy problems, not Kaggle tweaks, but real, unsolved ML conference research competitions?
We built MLRC-BENCH to find out.
Paper: arxiv.org/abs/2504.09702
Leaderboard: huggingface.co/spaces/launch/…
Code: github.com/yunx-z/MLRC-Be…

Ayoung Lee (@o_cube01)

📢 New benchmark out!

We introduce CLASH, a benchmark of 345 💥 high-stakes dilemmas and 3,795 perspectives to evaluate how well LLMs handle complex value reasoning.

GPT-4 and Claude? Not quite there.

📄 arxiv.org/pdf/2504.10823
🤗 huggingface.co/datasets/launc…

Muhammad Khalifa (@mkhalifaaaa)

🚨 Announcing SCALR @ COLM 2025: Call for Papers! 🚨

The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to the Conference on Language Modeling (COLM) in Montreal this October!

This is the first workshop dedicated to this growing research area.

🌐 scalr-workshop.github.io

Jie Ruan (@jieruan75)

🔍 LLMs now give medical diagnoses, legal advice, and even tackle scientific problems.
❓ Your LLM sounds smart. But what if it's just good at faking expertise?
🚀 We built ExpertLongBench to find out.
📉 And the results? They revealed several concerns. 👇
🔗 huggingface.co/spaces/launch/…

Muhammad Khalifa (@mkhalifaaaa)

🚨 The submission deadline for SCALR 2025 (Workshop on Test-Time Scaling & Reasoning Models) at COLM '25 (Conference on Language Modeling) is approaching! 🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non-archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24.

Topics

Kai Zou (@zkjzou)

🔥 Excited to introduce ManyICLBench (ACL 2025) 🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them. 📄 Paper: arxiv.org/abs/2411.07130 #ACL2025