Yunxiang Zhang (@yunxiangzhang4)'s Twitter Profile
Yunxiang Zhang

@yunxiangzhang4

CS PhD student @UMichCSE, BS @PKU1898, #NLP

ID: 1399732727880949766

https://yunx-z.github.io/

Joined 01-06-2021 14:21:11

34 Tweets

109 Followers

236 Following

Xin Liu (@xinliu_cs)

LLMs often exhibit poorly calibrated confidence, which undermines users' trust in their outputs. Though methods exist for short-form answers, they don't address long-form responses 😕 Discover the solution in our #ICLR2024 paper! 📄 arxiv.org/abs/2310.19208 👀

Farima Fatahi (on job market) (@farimafb)

๐ŸŒ How Verifiable Are LM Responses in the Wild? A Three-Way Factuality Benchmark Meet ๐…๐š๐œ๐ญ๐๐ž๐ง๐œ๐ก โ€“ an updatable benchmark for evaluating language models' factuality in real-world scenarios. ๐Ÿ”— huggingface.co/spaces/launch/โ€ฆ LaunchNLP MichiganAI Computer Science and Engineering at Michigan

๐ŸŒ How Verifiable Are LM Responses in the Wild? A Three-Way Factuality Benchmark
Meet ๐…๐š๐œ๐ญ๐๐ž๐ง๐œ๐ก โ€“ an updatable benchmark for evaluating language models' factuality in real-world scenarios.
๐Ÿ”— huggingface.co/spaces/launch/โ€ฆ
<a href="/launchnlp/">LaunchNLP</a> <a href="/michigan_AI/">MichiganAI</a> <a href="/UMichCSE/">Computer Science and Engineering at Michigan</a>
Xinliang (Frederick) Zhang (@frederickxzhang)

Heard of the Alaska-Hawaii merger?๐Ÿค”Wonder if LLMs know itโ€™s pending government approval before it can happen? They stumble, but weโ€™ve got a fixโš’๏ธ! Dive into my #EMNLP2024 work ๐๐š๐ซ๐ซ๐š๐ญ๐ข๐ฏ๐ž-๐จ๐Ÿ-๐“๐ก๐จ๐ฎ๐ ๐ก๐ญโ€”a special prompting technique to unlock LLMsโ€™ temporal reasoning

Heard of the Alaska-Hawaii merger?๐Ÿค”Wonder if LLMs know itโ€™s pending government approval before it can happen? They stumble, but weโ€™ve got a fixโš’๏ธ!
Dive into my #EMNLP2024 work ๐๐š๐ซ๐ซ๐š๐ญ๐ข๐ฏ๐ž-๐จ๐Ÿ-๐“๐ก๐จ๐ฎ๐ ๐ก๐ญโ€”a special prompting technique to unlock LLMsโ€™ temporal reasoning
Inderjeet Jayakumar Nair (@inderjeetnair)

Hi everyone 👋, I will be presenting our work at #EMNLP2024 on automatically optimizing feedback generation systems for improved implementation performance, on 12th Nov, 14:00 - 15:30 in the Generation and Summarization oral session. See you there!

Rohan Paul (@rohanpaul_ai)

Evaluating LLM research agents on scientific discovery lacks objective measures for assessing proposed methods.

This paper introduces MLRC-BENCH, a benchmark using Machine Learning conference competitions to objectively evaluate agent novelty and effectiveness against human …
Muhammad Khalifa (@mkhalifaaaa)

🚨 Announcing SCALR @ COLM 2025: Call for Papers! 🚨

The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to the Conference on Language Modeling in Montreal this October!

This is the first workshop dedicated to this growing research area.

๐ŸŒ scalr-workshop.github.io
Muhammad Khalifa (@mkhalifaaaa)

🚨 Deadline for SCALR 2025 Workshop: Test-time Scaling & Reasoning Models at COLM '25 (Conference on Language Modeling) is approaching! 🚨

scalr-workshop.github.io

🧩 Call for short papers (4 pages, non-archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24.

Topics
Kai Zou (@zkjzou)

🔥 Excited to introduce ManyICLBench (ACL 2025). Do many-shot ICL tasks evaluate LCLMs' ability to retrieve the most similar examples or learn from many examples? We carefully analyzed numerous tasks and categorized them. 📄 Paper: arxiv.org/abs/2411.07130 #ACL2025