
Yunxiang Zhang
@yunxiangzhang4
CS PhD student @UMichCSE, BS @PKU1898, #NLP
ID: 1399732727880949766
https://yunx-z.github.io/ 01-06-2021 14:21:11
34 Tweets
109 Followers
236 Following


How Verifiable Are LM Responses in the Wild? A Three-Way Factuality Benchmark. Meet FactBench, an updatable benchmark for evaluating language models' factuality in real-world scenarios. huggingface.co/spaces/launch/… LaunchNLP MichiganAI Computer Science and Engineering at Michigan


Heard of the Alaska-Hawaii merger? Wonder if LLMs know it's pending government approval before it can happen? They stumble, but we've got a fix! Dive into my #EMNLP2024 work Narrative-of-Thought, a special prompting technique to unlock LLMs' temporal reasoning




🚨Announcing SCALR @ COLM 2025: Call for Papers!🚨 The 1st Workshop on Test-Time Scaling and Reasoning Models (SCALR) is coming to Conference on Language Modeling in Montreal this October! This is the first workshop dedicated to this growing research area. scalr-workshop.github.io


LLMs now give medical diagnoses, legal advice, and even tackle scientific problems. Your LLM sounds smart. But what if it's just good at faking expertise? We built ExpertLongBench to find out. And the results? They revealed several concerns. huggingface.co/spaces/launch/…


🚨 Deadline for SCALR 2025 Workshop: Test-time Scaling & Reasoning Models at COLM '25 Conference on Language Modeling is approaching!🚨 scalr-workshop.github.io 🧩 Call for short papers (4 pages, non-archival) now open on OpenReview! Submit by June 23, 2025; notifications out July 24. Topics

