Yunxiang Zhang (@yunxiangzhang4)'s Twitter Profile
Yunxiang Zhang

@yunxiangzhang4

CS PhD student @UMichCSE, BS @PKU1898, #NLP

ID: 1399732727880949766

Website: https://yunx-z.github.io/
Joined: 01-06-2021 14:21:11

34 Tweets

109 Followers

236 Following

Xin Liu (@xinliu_cs)

LLMs often exhibit poorly calibrated confidence, which undermines users' trust in their outputs. Though methods exist for short-form answers, they don't address long-form responses😕 Discover the solution in our #ICLR2024 paper! 📄 arxiv.org/abs/2310.19208 👀

Xinliang (Frederick) Zhang (@frederickxzhang)

Heard of the Alaska-Hawaii merger?🤔Wonder if LLMs know it’s pending government approval before it can happen? They stumble, but we’ve got a fix⚒️! Dive into my #EMNLP2024 work 𝐍𝐚𝐫𝐫𝐚𝐭𝐢𝐯𝐞-𝐨𝐟-𝐓𝐡𝐨𝐮𝐠𝐡𝐭—a special prompting technique to unlock LLMs’ temporal reasoning

Inderjeet Jayakumar Nair (@inderjeetnair)

Hi everyone 👋, I will be presenting our work at #EMNLP2024 on automatically optimizing feedback generation systems for improved implementation performance, on 12th Nov, 14:00 - 15:30 in the Generation and Summarization oral session. See you there!

Rohan Paul (@rohanpaul_ai)

Evaluating LLM research agents on scientific discovery lacks objective measures for assessing proposed methods. This paper introduces MLRC-BENCH, a benchmark using Machine Learning conference competitions to objectively evaluate agent novelty and effectiveness against human

Muhammad Khalifa (@mkhalifaaaa)

🚨Announcing SCALR @ COLM 2025 — Call for Papers!🚨 The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to Conference on Language Modeling in Montreal this October! This is the first workshop dedicated to this growing research area. 🌐 scalr-workshop.github.io

Muhammad Khalifa (@mkhalifaaaa)

🚨 Deadline for SCALR 2025 Workshop: Test‑time Scaling & Reasoning Models at COLM '25 Conference on Language Modeling is approaching!🚨 scalr-workshop.github.io 🧩 Call for short papers (4 pages, non‑archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. Topics

Kai Zou (@zkjzou)

🔥 Excited to introduce ManyICLBench (ACL 2025) 🧐 Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them. 📄 Paper: arxiv.org/abs/2411.07130 #ACL2025