Chengzhi Mao (@chengzhim)'s Twitter Profile
Chengzhi Mao

@chengzhim

Computer Vision, Trustworthy AI

ID: 1037086655520362497

04-09-2018 21:15:01

2 Tweets

164 Followers

276 Following

Chengzhi Mao (@chengzhim)

#ICML 2024 How do large language models (LLMs) reach their decisions? Our latest research project, SelfIE, is the first to use an LLM to explain the same LLM's internals. The interpretation can be used for safety alignment and understanding hallucinations. selfie.cs.columbia.edu

Lihao Sun (@1e0sun)

🚨New #ACL2025 paper! Today’s “safe” language models can look unbiased—but alignment can actually make them more biased implicitly by reducing their sensitivity to race-related associations. 🧵Find out more below!
