Luke VanderHart

@levanderhart

https://t.co/0UJ18H8Juj

18-02-2010 22:07:11

3,9K Tweets

2,2K Followers

518 Following


Based on your current understanding of LLMs, do you believe they are capable of considering saying one thing in response to a prompt, but then deciding to say something else instead?

(The model itself, not a separate censor layer)
