Lukas Schmit (@_lukasschmit_)'s Twitter Profile
Lukas Schmit

@_lukasschmit_

Founder/CTO @deeptuneai | Forbes 30u30 | ML & Ableton enthusiast. Release house/techno as Lukas Schmit. Release Dubstep/Future bass as Nokturne

ID: 788176793542860802

Link: https://www.deeptune.com/
Joined: 18-10-2016 00:36:06

82 Tweets

262 Followers

844 Following

xlr8harder (@xlr8harder)

pip error: numpy must be installed in a superposition of different versions in order to satisfy all mutually contradictory version requirements

Mark Goldstein (@marikgoldstein)

diffusion models are just a ploy by CS PhDs to get their departments to pay for them to learn stochastic calculus so that they can get finance jobs

tenderizzation (@tenderizzation)

llama uses rmsnorm, r1 uses rmsnorm, grok uses rmsnorm, o1/o3 use [redacted] so why doesn't pytorch have a native (fused) rmsnorm in the year of our lord 2025? not for lack of user requests e.g., github.com/pytorch/pytorc… 1/n

Yushun Zhang (@ericzhang0410)

New paper alert! We report that the Hessian of NNs has a very special structure: 1. it appears to be a "block-diagonal-block-circulant" matrix at initialization; 2. then it quickly evolves into a "near-block-diagonal" matrix along training. We then theoretically reveal two
