Addison Wu (@addisonwu_)'s Twitter Profile
Addison Wu

@addisonwu_

@princeton '27 | 🇨🇦🇺🇸 | befriending llms and vlms @cocosci_lab | eng/fr

ID: 1729488441648533504

Joined: 28-11-2023 13:12:42

5 Tweets

23 Followers

147 Following

Addison Wu (@addisonwu_):

New preprint!📢 There's been lots of exciting work recently on identifying gaps in and enhancing LLM reasoning. But it's equally important to take a step back and consider when doing so might HURT performance. Check out our paper for a framework!

Rohan Paul (@rohanpaul_ai):

CoT prompting can actually hurt LLM performance in some tasks.

The paper shows LLMs and humans share similar limitations when forced to explain their thinking.

Identifies specific scenarios where asking LLMs to explain reduces their accuracy.

i.e. When thinking out loud makes …
Ryan Liu @ NeurIPS 2024 (@theryanliu):

Chain of thought can hurt LLM performance 🤖
Verbal (over)thinking can hurt human performance 😵‍💫

Are when/why they happen similar?

Come find out at our poster at West-320 ⏰11am tomorrow!

#ICML2025
Ed H. Chi (@edchi):

One of the better posters I saw today at #icml25.

This gets at the root of the problems we were thinking about when we conceived and wrote the CoT paper.
Princeton Laboratory for Artificial Intelligence (@princetonainews):

Shoutout to all the <a href="/Princeton/">Princeton University</a> researchers participating in <a href="/icmlconf/">ICML Conference</a> #ICML2025 

Browse through some of the cutting edge research from AI Lab students, post-docs and faculty being presented this year: pli.princeton.edu/blog/2025/prin…
Addison Wu (@addisonwu_):

Thanks so much for the excellent coverage and for stopping by our poster at ICML! It was a pleasure to share our work with you!

Addison Wu (@addisonwu_):

How come LLM agents can carry out remarkable tasks like coding full-stack apps but still fall for poorly crafted pop-up scams?

We formalize this using the psychological concept of motivational vigilance. Come to our PragLM spotlight talk (11:15 am 520B) and poster (1:30-2:30 pm …
Ryan Liu @ NeurIPS 2024 (@theryanliu):

Everything online exists because someone had a reason to put it there. 🧐

LLMs process internet data, but do they consider why something was said 🤨 in the first place?

NO 🛑 - in real online recommendations, LLMs get < .2 correlation with rationally weighing others' intent 🥴
Rohan Paul (@rohanpaul_ai):

New Anthropic paper tests whether LLMs notice motives behind messages and adjust trust accordingly.

It asks whether they can tell when someone has a hidden motive, like being paid to promote something.

Shows a weakness in how current LLMs judge trust.

In simple test cases, the …
Jiayi Geng (@jiayiigeng):

We use LLMs for everyday tasks—research, writing, coding, decision-making. They remember our conversations, adapt to our needs and preferences. Naturally, we trust them more with repeated use.

But this growing trust might be masking a hidden risk: what if their beliefs are …