Yinhong Liu (@yinhongliu2)'s Twitter Profile
Yinhong Liu

@yinhongliu2

PhD student @CambridgeLTL @Cambridge_Uni. Previous research intern at Siri/AIML @Apple and @MSFTResearch. Interested in #ML, #NLProc and #LLM.

ID: 1451161796107284482

Link: https://williamlyh.github.io/

Joined: 21-10-2021 12:21:53

61 Tweets

237 Followers

182 Following

Yingjia Alisa Wan @ICLR2025 (@yingjia_wan)

💥 Introducing "AutoPSV: Automated Process Supervised Verifier" - accepted at #NeurIPS2024! AutoPSV automatically annotates reasoning steps via confidence tracking, making it efficient and effective even without ground-truth answers. 🔗 arxiv.org/abs/2405.16802 🧵1/5

Han Zhou (@hanzhou032)

Attending #EMNLP2024 Virtually📺! If you've ever wondered how to PROMPT your LLM-as-a-Judge⚖️, stay tuned! We will present ZEPO in the Gather Room 147 on Tue. 12, 17:45: 1. Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments. See you online🚀

Caiqi Zhang (@caiqizh)

🔥Check our EMNLP paper with Andreas Vlachos and Zhijiang Guo 🤔Do We Need Language-Specific Fact-Checking Models? The Case of Chinese arxiv.org/abs/2401.15498 ‼️ We find the domain and cultural biases in the Chinese fact-checking area that necessitate language-specific tools!

Zhijiang Guo (@zhijiangg)

Life update: 🎉 I'm excited to share that I will be joining HKUST Guangzhou as an Assistant Professor in Spring 2025! I'm looking for multiple PhDs and interns who are passionate about exploring research questions related to knowledge and reasoning in the context of LLMs. 🤖

Wanru Zhao (@renee42581826)

I'll be presenting CLUES🔍 at #NeurIPS2024 in person! Catch us at the poster session on: ⏰ Wed, Dec 11, 4:30–7:30 PM PST 📍 East Exhibit Hall A-C #1902 (Add it to your calendar: tinyurl.com/neurips-clues😊)

Zhaochen Su (@suzhaochen0110)

🚀 Interested in building a reliable PRM? Check out our new paper on PRMBENCH – the first process-level reward benchmark! To facilitate the research, we’ve also released a "PRM-Eval Toolkit" to evaluate various PRMs & tasks! 🤗 #AI #Benchmark #PRM

Chengzu Li (@li_chengzu)

Forget just thinking in words. 🚀 New Era of Multimodal Reasoning🚨 🔍 Imagine While Reasoning in Space with MVoT Multimodal Visualization-of-Thought (MVoT) revolutionizes reasoning by generating visual "thoughts" that transform how AI thinks, reasons, and explains itself.

Ahmad Beirami @ ICLR 2025 (@abeirami)

𝐛𝐞𝐬𝐭-𝐨𝐟-𝐧 is a strong baseline for - improving agents - scaling inference-time compute - preference alignment - jailbreaking models How does 𝐁𝐨𝐧 work? and why is it so strong? Find some answers in the paper we wrote over two Christmas breaks!🧵

River Yijiang Dong (@river_dong121)

🚨New Paper Alert🚨 Many personalization methods optimize performance but ignore real-world impact. We examine its effects on: ✅ Performance ⚖️ Fairness: Can it represent minorities fairly? ⚠️ Unintended Effects: Does it harm safety? 🔄 Adaptability: Quickly adapt to new users?

Sicong (@leon_l_s_c)

🌟 MMR1 Multimodal Reasoning Project Now Open-Source! We’re thrilled to announce the release of MMR1, an open-source project dedicated to advancing multimodal reasoning research. The first milestone is MMR1-Math, a specialized multimodal model for mathematical tasks, achieving

Yi Xu (@_yixu)

🔥Are we ranking LLMs correctly?🔥 Large Language Models (LLMs) are widely used as automatic judges, but what if their rankings are unstable?😯Our latest study finds non-transitivity in LLM-as-a-judge evaluations—where A > B, B > C, but… C > A?! 🔄

Yi Xu (@_yixu)

🚀Let’s Think Only with Images. No language and No verbal thought.🤔 Let’s think through a sequence of images💭, like how humans picture steps in their minds🎨. We propose Visual Planning, a novel reasoning paradigm that enables models to reason purely through images.
