Zirui Cheng (@zirui_cheng_) 's Twitter Profile
Zirui Cheng

@zirui_cheng_

MS Student at @UofIllinois | Prev Undergrad @Tsinghua_Uni | Machine Learning, Human-Computer Interaction

ID: 1338850987679698944

Link: http://chengzr01.github.io | Joined: 15-12-2020 14:18:55

14 Tweets

104 Followers

523 Following

Zirui Cheng (@zirui_cheng_) 's Twitter Profile Photo

After spending a wonderful day with my best friend studying at ETH Zurich, I am finally heading for #CHI2023! Can’t wait for my first CHI! 😍
Tsinghua CS (@thudcst) 's Twitter Profile Photo

Congrats to our HCI Lab for winning CHI2023 Honorable Mention Award as top-5% papers! It introduces Voice-Accompanying Hand-to-Face Gesture (VAHF) as a parallel channel for smarter voice interaction on wearable devices. Check out more at bit.ly/3NhtUyX. #CHI2023
Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

Can we use LLMs to evaluate open-ended instruction following generations? Introducing LLMBar, a benchmark for evaluating LLM evaluators
🧐LLMBar is manually curated, objective, and adversarial😈
🤯Most LLM evaluators cannot beat random guess!
📜arxiv.org/abs/2310.07641

[1/n]
WikiResearch (@wikiresearch) 's Twitter Profile Photo

"Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia" (for e.g. vandalism-detecting models such as ORES) arxiv.org/html/2402.1414… meta.wikimedia.org/wiki/Research:…

"Wikibench: Community-Driven Data Curation for AI Evaluation on Wikipedia" (for e.g. vandalism-detecting models such as ORES) arxiv.org/html/2402.1414… meta.wikimedia.org/wiki/Research:…
Tzu-Sheng Kuo 郭子生 (@tzushengkuo) 's Twitter Profile Photo

✨New #CHI2024 Paper

How might we empower communities to curate evaluation datasets for AI that impacts them?

We present Wikibench, a system that enables communities to collaboratively curate AI datasets, while navigating ambiguities and disagreements through discussion. (1/9)
Jiaxuan You (@youjiaxuan) 's Twitter Profile Photo

🚀 Excited to announce "How Far Are We From AGI?", the first paper summarizing current and future research directions toward #AGI. We hope it inspires researchers to achieve AGI responsibly as a #community.

📄 Paper: arxiv.org/abs/2405.10313
💻 GitHub (PR🤗): github.com/ulab-uiuc/AGI-…
Jiaxuan You (@youjiaxuan) 's Twitter Profile Photo

We sincerely appreciate the successful organization of the ICLR 2024 AGI Workshop, the most popular workshop at ICLR with 800+ attendees. Keynotes by Yoshua Bengio, Oriol Vinyals, Yejin Choi, Andrew G Wilson, and Song Han are summarized in our paper. 
Web: agiworkshop.github.io
Jiaxuan You (@youjiaxuan) 's Twitter Profile Photo

(1/n)
The human research community is far from perfect. Frustrated with NeurIPS results? Research Town simulates the community as a graph of LLM agents and knowledge. It helps you find ideas, receive reviews, refine proposals, get metareviews - essentially running "NeurIPS" with LLMs.
Lifan Yuan (@lifan__yuan) 's Twitter Profile Photo

Wanna train PRMs, but process labels, annotated manually or automatically, sound too expensive to you😖?
Introducing Implicit PRM🚀 – get free process rewards for your model by training an ORM on cheaper response-level data, with a simple parameterization at no additional cost💰!
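A minimal sketch of the idea the tweet describes, under the assumption (not stated in the tweet itself) that the reward is parameterized as a β-scaled log-likelihood ratio between the trained model and a frozen reference model, so that per-step process rewards fall out as partial sums of token-level ratios. The function name, β value, and toy log-probs below are all illustrative:

```python
def implicit_process_rewards(policy_logprobs, ref_logprobs, beta=1.0):
    """Per-token process rewards from log-probs under the trained model and a
    frozen reference model. The outcome (response-level) reward is the sum over
    all tokens; the process reward at step t is the partial sum up to t."""
    ratios = [beta * (p - r) for p, r in zip(policy_logprobs, ref_logprobs)]
    process, total = [], 0.0
    for delta in ratios:
        total += delta
        process.append(total)  # cumulative reward through this step
    return process

# Toy example: a token where the policy is more confident than the reference
# contributes positively; a much less confident token drags the reward down.
policy = [-0.1, -0.2, -1.5]
ref = [-0.5, -0.2, -0.5]
rewards = implicit_process_rewards(policy, ref, beta=1.0)
```

The point of the sketch is that no step-level labels appear anywhere: only response-level training of the ORM is needed, and the per-step signal is recovered from the parameterization itself.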
Zhiyuan Zeng (@zhiyuanzeng_) 's Twitter Profile Photo

Is a single accuracy number all we can get from model evals?🤔
🚨Does NOT tell where the model fails
🚨Does NOT tell how to improve it
Introducing EvalTree🌳
🔍identifying LM weaknesses in natural language
🚀weaknesses serve as actionable guidance
(paper&demo 🔗in🧵)

[1/n]

Beyza Bozdag @ NAACL’25 (@nbbozdag) 's Twitter Profile Photo

Thrilled to announce our new survey that explores the exciting possibilities and troubling risks of computational persuasion in the era of LLMs 🤖💬
📄Arxiv: arxiv.org/pdf/2505.07775 
💻 GitHub: github.com/beyzabozdag/Pe…
Sagnik Mukherjee (@saagnikkk) 's Twitter Profile Photo

🚨 Paper Alert: “RL Finetunes Small Subnetworks in Large Language Models”

From DeepSeek V3 Base to DeepSeek R1 Zero, a whopping 86% of parameters were NOT updated during RL training 😮😮
And this isn’t a one-off. The pattern holds across RL algorithms and models.
🧵A Deep Dive
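A figure like the 86% in the tweet can be checked mechanically by comparing two checkpoints element-wise. A minimal sketch of that measurement (the dict-of-lists checkpoint format and the exact-equality tolerance here are simplifying assumptions; real checkpoints would be tensors loaded from disk):

```python
def frac_unchanged(params_before, params_after, tol=0.0):
    """Fraction of scalar parameters whose value did not move between two
    checkpoints, given as dicts mapping parameter names to flat float lists."""
    unchanged = total = 0
    for name, before in params_before.items():
        after = params_after[name]
        for b, a in zip(before, after):
            total += 1
            if abs(b - a) <= tol:
                unchanged += 1
    return unchanged / total

# Toy example: 3 of 4 scalars identical across checkpoints -> 0.75
before = {"w": [1.0, 2.0], "b": [0.5, -0.5]}
after = {"w": [1.0, 2.0], "b": [0.5, -0.4]}
```

A nonzero `tol` matters in practice: mixed-precision training can perturb values that are effectively untouched, so "not updated" usually means "unchanged up to numerical noise".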