Chengzhi Mao (@chengzhim) Twitter Tweets • TwiCopy

Chengzhi Mao

@chengzhim

Computer Vision, Trustworthy AI

ID: 1037086655520362497

04-09-2018 21:15:01

2 Tweets

164 Followers

276 Following

Chengzhi Mao

@chengzhim

2 years ago

#ICML 2024 How do large language models (LLMs) reach their decisions? Our latest research project, SelfIE, is the first to use an LLM to explain the same LLM's internals. The interpretation can be used for safety alignment and understanding hallucinations. selfie.cs.columbia.edu

13 likes

0 replies

0 retweets

Lihao Sun

@1e0sun

6 months ago

🚨New #ACL2025 paper! Today’s “safe” language models can look unbiased—but alignment can actually make them more biased implicitly by reducing their sensitivity to race-related associations. 🧵Find out more below!

11 likes

1 reply

2 retweets