Peter Henderson (@peterhndrsn)'s Twitter Profile
Peter Henderson

@peterhndrsn

Work on topics related to AI, Law, & Society. Assistant Professor @PrincetonCS @PrincetonSPIA and @PrincetonCITP 📚 JD/PhD (Law+AI) @Stanford

ID: 508360150

Link: https://www.peterhenderson.co/
Joined: 29-02-2012 04:22:30

943 Tweets

3.3K Followers

997 Following

Peter Henderson (@peterhndrsn)

Interesting analogy used in Hong Kong case judgement from a few months ago:

[I]nterviewing a witness is not that different from using ChatGPT, DeepSeek or similar AI models – one can only get useful answers if one asks the right questions. Defective prompts would lead to
NIK (@ns123abc)

openai is under a court order to log every output and give it to a court including all deleted chats and sensitive chats logged via API
Peter Henderson (@peterhndrsn)

🚨Reddit sues Anthropic!🚨

This is going to be a really interesting case. Some quick thoughts... 🧵👇

1⃣ Notice, there's no copyright claim. Reddit doesn't really own the copyright to user-uploaded content, so nothing to do here. Reddit also doesn't make *any* federal claims.
Andres Guadamuz (@technollama)

The Getty v Stability UK trial starts today, exciting times! A lot of the trial will rest on whether the training of the Stability models took place in the UK, which was an important part of why the judge almost threw out the case back in late 2023. technollama.co.uk/high-court-rul…

Peter Henderson (@peterhndrsn)

This is a cool idea to get around the shortcomings of tokens. But my hope is that in a few years, we'll actually mostly get rid of input tokens in favor of raw bytes (e.g., byT5) or operate on raw visual input more like humans 🧠. Imo, tokenization still feels really unnatural
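
To make the "raw bytes" idea concrete, here is a minimal sketch of feeding a model UTF-8 byte values instead of learned subword tokens. The helper names `to_byte_ids` and `from_byte_ids` are hypothetical and only illustrate the general approach; they do not reproduce the input pipeline of ByT5 or any other specific model.

```python
def to_byte_ids(text: str) -> list[int]:
    """Encode text as a sequence of UTF-8 byte values (each in 0..255)."""
    return list(text.encode("utf-8"))


def from_byte_ids(ids: list[int]) -> str:
    """Invert to_byte_ids: rebuild the original string from byte values."""
    return bytes(ids).decode("utf-8")


# Illustrative input: ASCII characters take one byte each, "ï" takes two,
# and the emoji takes four, so no vocabulary lookup is ever needed.
sample = "naïve 🧠"
ids = to_byte_ids(sample)
print(len(sample), "characters ->", len(ids), "byte ids")
print(from_byte_ids(ids) == sample)
```

The trade-off this shows is the usual one: the "vocabulary" is fixed at 256 values and nothing is ever out of vocabulary, but sequences get longer than they would with subword tokens.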

Brenden Lake (@lakebrenden)

I'm joining Princeton as an Associate Professor of Computer Science and Psychology this fall! Princeton is ambitiously investing in AI and Natural & Artificial Minds, and I'm excited for my lab to contribute. Recruiting postdocs and Ph.D. students in CS and Psychology - join us!
Wenhao Chai (@wenhaocha1)

We introduce LiveCodeBench Pro.
Models like o3-high, o4-mini, and Gemini 2.5 Pro score 0% on hard competitive programming problems.
Peter Henderson (@peterhndrsn)

Thanks to the new utm_source field that ChatGPT attaches to links, we see a lot of filings in court that clearly show evidence of ChatGPT usage despite not yet being called out or do not contain hallucinated citations.
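
As a rough sketch of how such links could be spotted in a filing, the snippet below parses each URL's query string and checks its `utm_source` parameter. The marker value `"chatgpt"` is an assumption, since the post only says that ChatGPT attaches a `utm_source` field, and the function names and example URLs are hypothetical.

```python
from urllib.parse import parse_qs, urlparse

# Assumption: the utm_source value contains this substring. The post does not
# state the exact value, so adjust it to whatever is observed in practice.
ASSUMED_MARKER = "chatgpt"


def has_chatgpt_utm(url: str, marker: str = ASSUMED_MARKER) -> bool:
    """Return True if any utm_source value in the URL's query contains marker."""
    query = urlparse(url).query
    values = parse_qs(query).get("utm_source", [])
    return any(marker in value.lower() for value in values)


def flag_urls(urls: list[str]) -> list[str]:
    """Keep only the URLs whose utm_source parameter matches the marker."""
    return [url for url in urls if has_chatgpt_utm(url)]


# Hypothetical links, as they might appear among a filing's citations.
links = [
    "https://example.com/opinion?utm_source=chatgpt.com",
    "https://example.com/statute?utm_source=newsletter",
    "https://example.com/plain",
]
print(flag_urls(links))
```

A match only suggests that the link was copied from a ChatGPT session; it says nothing about whether the surrounding text or citations are accurate.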
Peter Henderson (@peterhndrsn)

Super useful to see how contaminated benchmarks are! ⚠️MMLU seems very much deprecated between 9-28% contamination, error rate, and high avg model perf. 📈Slight uptick in contamination of GPQA on more recent CC crawls. Future probably requires live dynamic, slightly