Kaj Bostrom (@alephic2)'s Twitter Profile
Kaj Bostrom

@alephic2

NLP geek with a PhD from @utcompsci, now @ AWS. I like generative modeling but not in an evil way I promise. Also at bsky.app/profile/bostro… He/him

ID: 1069745340444635136

Website: https://bostromk.net
Joined: 04-12-2018 00:08:58

102 Tweets

343 Followers

421 Following

Anna Ivanova (@neuranna)'s Twitter Profile Photo

Three years in the making - our big review/position piece on the capabilities of large language models (LLMs) from the cognitive science perspective. Thread below! 1/ arxiv.org/abs/2301.06627

Ai2 (@allen_ai)'s Twitter Profile Photo

📣Call for papers! The Natural Language Reasoning and Structured Explanations Workshop will be the first of its kind at ACL 2023, and the deadline for paper submissions is April 24. Learn more and submit here: nl-reasoning-workshop.github.io

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

LLMs are used for reasoning tasks in NL but lack explicit planning abilities. In arxiv.org/abs/2307.02472, we see if vector spaces can enable planning by choosing statements to combine to reach a conclusion. Joint w/ Kaj Bostrom, Swarat Chaudhuri & Greg Durrett. NLRSE workshop at #ACL2023NLP
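The planning idea described in the tweet can be sketched in miniature. This is purely illustrative and not the paper's actual method: the `combine` step here just averages two statement embeddings, and real embeddings would come from a trained model rather than toy vectors. The sketch shows the core search move, scoring candidate statement pairs by how close their combined embedding lands to the goal embedding:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def combine(u, v):
    """Toy stand-in for a deductive step: average the two statement embeddings."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def pick_next_step(statements, goal):
    """Choose the pair of statements whose combination is closest to the goal."""
    best_pair, best_score = None, -1.0
    for i in range(len(statements)):
        for j in range(i + 1, len(statements)):
            score = cosine(combine(statements[i], statements[j]), goal)
            if score > best_score:
                best_pair, best_score = (i, j), score
    return best_pair, best_score
```

Iterating this selection (combining the chosen pair, adding the result back to the statement pool, and repeating) gives a greedy planner over the embedding space; the actual paper should be consulted for how combination and scoring are really done.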

Ari Holtzman (@universeinanegg)'s Twitter Profile Photo

While demand for generative model training soars 📈, I think a new field is coalescing that’s focused on trying to make sense of generative models _once they’re already trained_: characterizing their behaviors, differences, and underlying mechanisms…so we wrote a paper about it!

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

GPT-4 can write murder mysteries that it can’t solve. 🕵️ We use GPT-4 to build a dataset, MuSR, to test the limits of LLMs’ textual reasoning abilities (commonsense, ToM, & more). 📃 arxiv.org/abs/2310.16049 🌐 zayne-sprague.github.io/MuSR/ w/ Xi Ye, Kaj Bostrom, Swarat Chaudhuri, Greg Durrett

samim (@samim)'s Twitter Profile Photo

After extensive training with various music generation neural networks and dedicating countless hours to prompting them, it's become even more evident to me that relying solely on text prompts as an interface for music creation significantly limits the creative process.

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

Super excited to bring ChatGPT Murder Mysteries to #ICLR2024 from our dataset MuSR as a spotlight presentation! A big shout-out goes to my coauthors Xi Ye, Kaj Bostrom, Swarat Chaudhuri, and Greg Durrett. See you all there 😀

Zayne Sprague (@zaynesprague)'s Twitter Profile Photo

🍓 still has a way to go for solving murder mysteries. We ran o1 on our dataset MuSR (ICLR ’24). It doesn’t beat Claude-3.5 Sonnet with CoT. MuSR requires a lot of commonsense reasoning and less math/logic (where 🍓 shines). MuSR is still a challenge! More to come soon 😎

Kaj Bostrom (@alephic2)'s Twitter Profile Photo

Definitely updated my mental model of CoT based on these results - give it a read, the paper delivers right off the bat and then keeps following up with more!

Csordás Róbert (@robert_csordas)'s Twitter Profile Photo

For inputs involving many steps, the operands for each step remain important until an identical depth. This indicates that the model is *not* breaking down the computation, solving subproblems, and composing their results together. 2/6
