Piotr Piękos (@piotrpiekosai)'s Twitter Profile
Piotr Piękos

@piotrpiekosai

PhD student with @SchmidhuberAI at @KAUST. Interested in systematic generalization and reasoning.

ID: 561139601

Website: https://piotrpiekos.github.io
Joined: 23-04-2012 13:39:19

25 Tweets

49 Followers

144 Following

Bernhard Schölkopf (@bschoelkopf)'s Twitter Profile Photo

Dear Students, please don't get discouraged... we need you. Quoting another Geoff: "The future depends on some graduate student who is deeply suspicious of everything I have said" (8/8)

Łukasz Kuciński (@lukekucinski)'s Twitter Profile Photo

In complex problems, states present varying degrees of difficulty, from easy ones with obvious action sequences to hard ones that require reasoning. Can planning methods adapt accordingly? That's exactly what AdaSubS does! youtu.be/7GZbPB1Gu0E 🧵👇

Csordás Róbert (@robert_csordas)'s Twitter Profile Photo

Come visit our poster "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" on Thursday at 11 am in East Exhibit Hall A-C at #NeurIPS2024. With Piotr Piękos, Kazuki Irie and Jürgen Schmidhuber.

Piotr Piękos (@piotrpiekosai)'s Twitter Profile Photo

What if instead of a couple of dense attention heads, we use lots of sparse heads, each learning to select its own set of tokens to process?

Introducing Mixture of Sparse Attention (MoSA)
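
Below is a minimal, illustrative PyTorch sketch of the idea as described in the tweet, not the authors' implementation: each head has a learned router that scores every token and keeps only its top-k (expert-choice routing), and dense attention then runs only among those selected tokens. The names (SparseAttentionHead, router, k) are hypothetical, and details such as causal masking are omitted.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SparseAttentionHead(nn.Module):
    """One sparse head: expert-choice routing picks k tokens, then dense
    attention runs only among those k tokens (illustrative sketch only)."""

    def __init__(self, d_model: int, d_head: int, k: int):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, 1)   # learned token-selection score
        self.q_proj = nn.Linear(d_model, d_head)
        self.k_proj = nn.Linear(d_model, d_head)
        self.v_proj = nn.Linear(d_model, d_head)
        self.o_proj = nn.Linear(d_head, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, D = x.shape
        k = min(self.k, T)

        # Expert-choice routing: the head itself chooses its top-k tokens.
        scores = self.router(x).squeeze(-1)            # (B, T)
        gate, idx = torch.topk(scores, k, dim=-1)      # both (B, k)
        gate = torch.sigmoid(gate).unsqueeze(-1)       # keeps routing differentiable
        idx_exp = idx.unsqueeze(-1).expand(B, k, D)
        sel = torch.gather(x, 1, idx_exp)              # (B, k, d_model)

        # Dense attention, but only among the k selected tokens
        # (causal masking omitted for brevity).
        q, key, v = self.q_proj(sel), self.k_proj(sel), self.v_proj(sel)
        att = F.softmax(q @ key.transpose(-2, -1) / key.size(-1) ** 0.5, dim=-1)
        head_out = self.o_proj(att @ v) * gate         # (B, k, d_model)

        # Scatter the head's output back to the positions it selected.
        out = torch.zeros_like(x)
        out.scatter_(1, idx_exp, head_out)
        return out

# Usage: many small sparse heads, outputs summed into the residual stream.
x = torch.randn(2, 128, 64)
heads = nn.ModuleList([SparseAttentionHead(64, 16, k=16) for _ in range(8)])
y = sum(h(x) for h in heads)                           # (2, 128, 64)
```

Since each head attends over only k tokens, per-head cost scales with k rather than with the full sequence length, which is what makes using many such heads affordable.
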
fly51fly (@fly51fly)'s Twitter Profile Photo


[LG] Mixture of Sparse Attention: Content-Based Learnable Sparse Attention via Expert-Choice Routing
P Piękos, R Csordás, J Schmidhuber [KAUST & Stanford University] (2025)
arxiv.org/abs/2505.00315