Kaj Bostrom (@alephic2)'s Twitter Profile
Kaj Bostrom

@alephic2

NLP geek with a PhD from @utcompsci, now @ AWS. I like generative modeling but not in an evil way I promise. Also at bsky.app/profile/bostro… He/him

ID: 1069745340444635136

Website: https://bostromk.net
Joined: 04-12-2018 00:08:58

102 Tweets

343 Followers

421 Following

Anna Ivanova (@neuranna)'s Twitter Profile Photo

Three years in the making - our big review/position piece on the capabilities of large language models (LLMs) from the cognitive science perspective. Thread below! 1/ arxiv.org/abs/2301.06627

Ai2 (@allen_ai)'s Twitter Profile Photo

📣Call for papers! The Natural Language Reasoning and Structured Explanations Workshop will be the first of its kind at ACL 2023, and the deadline for paper submissions is April 24. Learn more and submit here: nl-reasoning-workshop.github.io

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

LLMs are used for reasoning tasks in NL but lack explicit planning abilities. In arxiv.org/abs/2307.02472, we see if vector spaces can enable planning by choosing statements to combine to reach a conclusion. Joint w/ Kaj Bostrom, Swarat Chaudhuri & Greg Durrett. NLRSE workshop at #ACL2023NLP
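The planning idea described in the tweet can be sketched in miniature. This is purely illustrative and not the paper's actual method: the `combine` step here just averages two statement embeddings, and real embeddings would come from a trained model rather than toy vectors. The sketch shows the core search move, scoring candidate statement pairs by how close their combined embedding lands to the goal embedding:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def combine(u, v):
    """Toy stand-in for a deductive step: average the two statement embeddings."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def pick_next_step(statements, goal):
    """Choose the pair of statements whose combination is closest to the goal."""
    best_pair, best_score = None, -1.0
    for i in range(len(statements)):
        for j in range(i + 1, len(statements)):
            score = cosine(combine(statements[i], statements[j]), goal)
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score
```

Iterating this selection (combining the chosen pair, adding the result back to the statement pool, and repeating) gives a greedy planner over the embedding space; the actual paper should be consulted for how combination and scoring are really done.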

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

While demand for generative model training soars 📈, I think a new field is coalescing that’s focused on trying to make sense of generative models _once they’re already trained_: characterizing their behaviors, differences, and underlying mechanisms…so we wrote a paper about it!

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

GPT-4 can write murder mysteries that it can’t solve. 🕵️ We use GPT-4 to build a dataset, MuSR, to test the limits of LLMs’ textual reasoning abilities (commonsense, ToM, & more). 📃 arxiv.org/abs/2310.16049 🌐 zayne-sprague.github.io/MuSR/ w/ Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett

samim (@samim)'s Twitter Profile Photo

After extensive training with various music generation neural networks and dedicating countless hours to prompting them, it's become even more evident to me that relying solely on text prompts as an interface for music creation significantly limits the creative process.

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

Super excited to bring ChatGPT Murder Mysteries to #ICLR2024 from our dataset MuSR as a spotlight presentation! A big shout-out goes to my coauthors Xi Ye, Kaj Bostrom, Swarat Chaudhuri, and Greg Durrett. See you all there 😀

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

🍓 still has a way to go for solving murder mysteries. We ran o1 on our dataset MuSR (ICLR ’24). It doesn’t beat Claude-3.5 Sonnet with CoT. MuSR requires a lot of commonsense reasoning and less math/logic (where 🍓 shines). MuSR is still a challenge! More to come soon 😎

Kaj Bostrom (@alephic2)'s Twitter Profile Photo

Definitely updated my mental model of CoT based on these results - give it a read, the paper delivers right off the bat and then keeps following up with more!

Csordás Róbert (@robert_csordas)'s Twitter Profile Photo

For inputs involving many steps, the operands for each step remain important until an identical depth. This indicates that the model is *not* breaking down the computation, solving subproblems, and composing their results together. 2/6
