We'll be presenting the Forgetting Transformer during the poster session at 3 pm on April 25th at #ICLR2025 (board number 282). Come and chat with us!
• Poster info: iclr.cc/virtual/2025/p…
• Paper: arxiv.org/abs/2503.02130
• Code: github.com/zhixuan-lin/fo…
#COLM2025 We introduce Adaptive Computation Pruning (ACP) for the Forgetting Transformer (FoX): a provably safe pruning method that significantly speeds up our Forgetting Attention kernel, especially for long-context pretraining. Our simple Triton kernel with ACP is 1.7x to 2.4x faster.
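For context, here is a minimal, non-fused PyTorch sketch of forgetting attention with a pruning check, assuming the FoX formulation where each attention logit receives a decay bias equal to the sum of log forget gates between the key and query positions. This is an illustrative sketch only: the function name `forgetting_attention`, the `log_fgate` argument, and the `prune_threshold` value are placeholders, and the real speedups come from the fused Triton kernel skipping whole key/value blocks rather than masking individual entries.

```python
import torch

def forgetting_attention(q, k, v, log_fgate, prune_threshold=-20.0):
    # Conceptual sketch (not the actual fused kernel).
    # q, k, v: (T, d); log_fgate: (T,), holding log f_t for each position.
    T, d = q.shape
    # Cumulative log forget gates; the decay bias for query i and key j (j <= i)
    # is D[i, j] = sum_{l=j+1..i} log f_l = c[i] - c[j].
    c = torch.cumsum(log_fgate, dim=0)
    D = c.unsqueeze(1) - c.unsqueeze(0)                  # (T, T)
    causal = torch.ones(T, T).tril().bool()
    scores = q @ k.T / d ** 0.5 + D
    # Pruning check: entries whose decay bias falls below the threshold
    # contribute negligibly to the softmax and are dropped. In the fused
    # kernel, whole blocks are skipped before any QK^T work is done,
    # which is where the wall-clock savings come from.
    keep = causal & (D > prune_threshold)
    scores = scores.masked_fill(~keep, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example usage with random inputs (shapes chosen arbitrarily):
T, d = 128, 64
q, k, v = torch.randn(3, T, d).unbind(0)
log_fgate = torch.nn.functional.logsigmoid(torch.randn(T))  # log f_t <= 0
out = forgetting_attention(q, k, v, log_fgate)              # (T, d)
```

Because log f_t is always non-positive, the decay bias only becomes more negative as keys get farther from the query, so dropping entries below a sufficiently negative threshold changes the output by a bounded (negligible) amount, which is the sense in which the pruning is safe.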
Going beyond dormancy and into gradient activity for identifying neuron activity: check out our work led by Jason Liu, Zihao Wu, and Johan Obando-Ceron 👍🏽.
And if you'll be in San Diego for #NeurIPS2025, come by our poster to chat!