Yinhong Liu (@yinhongliu2) Tweets

Yinhong Liu

@yinhongliu2

PhD student @CambridgeLTL @Cambridge_Uni. Previous research intern at Siri/AIML @Apple and @MSFTResearch. Interested in #ML, #NLProc and #LLM.

ID: 1451161796107284482

https://williamlyh.github.io/ · Joined 21-10-2021 12:21:53

61 Tweets

237 Followers

182 Following

Yingjia Alisa Wan @ICLR2025

@yingjia_wan

a year ago

💥 Introducing "AutoPSV: Automated Process Supervised Verifier" - accepted at #NeurIPS2024! AutoPSV automatically annotates reasoning steps via confidence tracking, making it efficient and effective even without ground-truth answers. 🔗 arxiv.org/abs/2405.16802 🧵1/5

110 likes · 7 replies · 38 retweets

Han Zhou

@hanzhou032

a year ago

Attending #EMNLP2024 virtually 📺! If you've ever wondered how to PROMPT your LLM-as-a-Judge ⚖️, stay tuned! We will present ZEPO, from "Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments", in Gather Room 147 on Tue. 12, 17:45. See you online 🚀

11 likes · 1 reply · 4 retweets

Caiqi Zhang

@caiqizh

a year ago

🔥 Check out our EMNLP paper with Andreas Vlachos and Zhijiang Guo: 🤔 Do We Need Language-Specific Fact-Checking Models? The Case of Chinese arxiv.org/abs/2401.15498 ‼️ We find domain and cultural biases in Chinese fact-checking that necessitate language-specific tools!

13 likes · 0 replies · 5 retweets

Zhijiang Guo

@zhijiangg

a year ago

Life update: 🎉 I'm excited to share that I will be joining HKUST Guangzhou as an Assistant Professor in Spring 2025! I'm looking for multiple PhDs and interns who are passionate about exploring research questions related to knowledge and reasoning in the context of LLMs. 🤖

186 likes · 24 replies · 23 retweets

Wanru Zhao

@renee42581826

a year ago

I'll be presenting CLUES🔍 at #NeurIPS2024 in person! Catch us at the poster session on: ⏰ Wed, Dec 11, 4:30–7:30 PM PST 📍 East Exhibit Hall A-C #1902 (Add it to your calendar: tinyurl.com/neurips-clues😊)

47 likes · 1 reply · 7 retweets

Zhaochen Su

@suzhaochen0110

a year ago

🚀 Interested in building a reliable PRM? Check out our new paper on PRMBENCH – the first process-level reward benchmark! To facilitate the research, we’ve also released a "PRM-Eval Toolkit" to evaluate various PRMs & tasks! 🤗 #AI #Benchmark #PRM

5 likes · 0 replies · 1 retweet

Chengzu Li

@li_chengzu

a year ago

Forget just thinking in words. 🚀 New Era of Multimodal Reasoning 🚨 🔍 Imagine While Reasoning in Space with MVoT. Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

739 likes · 14 replies · 165 retweets

Ahmad Beirami @ ICLR 2025

@abeirami

a year ago

𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for:
- improving agents
- scaling inference-time compute
- preference alignment
- jailbreaking models

How does 𝐁𝐨𝐍 work, and why is it so strong? Find some answers in the paper we wrote over two Christmas breaks! 🧵

357 likes · 5 replies · 55 retweets

Yinhong Liu

@yinhongliu2

10 months ago

Long-text factuality is a challenging topic and here’s our cheap & effective approach! 🚀🚀🚀

4 likes · 0 replies · 0 retweets

River Yijiang Dong

@river_dong121

10 months ago

🚨New Paper Alert🚨 Many personalization methods optimize performance but ignore real-world impact. We examine their effects on:
✅ Performance
⚖️ Fairness: Can it represent minorities fairly?
⚠️ Unintended Effects: Does it harm safety?
🔄 Adaptability: Can it quickly adapt to new users?

28 likes · 8 replies · 6 retweets

Sicong

@leon_l_s_c

9 months ago

🌟 MMR1 Multimodal Reasoning Project Now Open-Source! We’re thrilled to announce the release of MMR1, an open-source project dedicated to advancing multimodal reasoning research. The first milestone is MMR1-Math, a specialized multimodal model for mathematical tasks, achieving

136 likes · 5 replies · 51 retweets

Yi Xu

@_yixu

9 months ago

🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄

133 likes · 2 replies · 33 retweets

Yi Xu

@_yixu

7 months ago

🚀Let’s Think Only with Images. No language and no verbal thought. 🤔 Let’s think through a sequence of images 💭, like how humans picture steps in their minds 🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.

1.1K likes · 13 replies · 207 retweets