🫡 *in monk mode* (@eric_ruleman)'s Twitter Profile
🫡 *in monk mode*

@eric_ruleman

ai & acrobatics

ID: 744967929045803008

Link: http://acrofestivals.org · Joined: 20-06-2016 18:59:30

13.13K Tweets

1.1K Followers

4.4K Following

🫡 *in monk mode* (@eric_ruleman):

Training-time compute can be amortized across all future inference calls, but inference-time compute is only valuable for one user. Inference-time compute will therefore dramatically increase compute demands.
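A back-of-the-envelope sketch of that amortization argument. All numbers and names below are made-up placeholders, not estimates for any real model:

```python
# Rough illustration of the amortization asymmetry.
# TRAIN_FLOPS, INFER_FLOPS_PER_QUERY, and REASONING_MULTIPLIER are
# hypothetical placeholders, not measurements of any real system.

TRAIN_FLOPS = 1e24            # one-time training cost, paid once
INFER_FLOPS_PER_QUERY = 1e12  # baseline per-query inference cost
REASONING_MULTIPLIER = 50     # extra "thinking" tokens spent at inference time

def cost_per_query(num_queries: int, reasoning: bool = False) -> float:
    """Total compute attributable to a single query."""
    amortized_training = TRAIN_FLOPS / num_queries   # shrinks as usage grows
    inference = INFER_FLOPS_PER_QUERY * (REASONING_MULTIPLIER if reasoning else 1)
    return amortized_training + inference             # inference part never amortizes

for n in (1e6, 1e9, 1e12):
    base = cost_per_query(int(n))
    reasoned = cost_per_query(int(n), reasoning=True)
    print(f"{n:.0e} queries: baseline {base:.2e} FLOPs/query, "
          f"with inference-time reasoning {reasoned:.2e} FLOPs/query")
```

As the query count grows, the training term vanishes from the per-query cost while the inference-time reasoning term stays fixed, which is the point of the tweet.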

🫡 *in monk mode* (@eric_ruleman):

Playing with the new Qwen QVQ "visual reasoning" model on Hugging Face. It comes back blank for Tank Man, Mao Zedong, Xi Jinping, and Jack Ma. It will identify Yao Ming, though! huggingface.co/spaces/Qwen/QV…

🫡 *in monk mode* (@eric_ruleman):

DeepSeek greatly decreased training costs by reducing attention from O(n^2) to O(n) via Latent Attention. Instead of computing every pairwise attention score, they project the tokens into a latent space of length L, giving an O(n*L) forward pass.
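A minimal toy sketch of the idea as described above: each of the n tokens attends over L latent summaries of the sequence instead of over all n tokens, so the score matrix is n x L rather than n x n. This is a Linformer/Perceiver-style illustration of the O(n*L) claim, not DeepSeek's actual Multi-head Latent Attention implementation; the shapes, projection, and names are assumptions made for the example.

```python
import numpy as np

# Toy latent-attention sketch (illustrative only).
# Full self-attention compares every token with every token: O(n^2 * d).
# Here each token attends over L latent vectors, so cost is O(n * L * d).

rng = np.random.default_rng(0)
n, d, L = 1024, 64, 16           # sequence length, model dim, number of latents

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

tokens = rng.normal(size=(n, d))        # token representations
W_latent = rng.normal(size=(n, L)) / n  # hypothetical learned projection to L latents

# Compress the sequence into L latent vectors (L x d instead of n x d).
latents = W_latent.T @ tokens

Q = tokens          # queries: one per token
K = V = latents     # keys/values: one per latent

scores = Q @ K.T / np.sqrt(d)   # n x L score matrix, not n x n
out = softmax(scores) @ V       # n x d output

print(out.shape)     # (1024, 64)
print(scores.shape)  # (1024, 16) -> O(n*L) scores instead of O(n^2)
```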