L (@llllvvuu) 's Twitter Profile

@llllvvuu

ID: 3281109110

Joined: 16-07-2015 01:13:01

10.1K Tweets

5.5K Followers

480 Following

L (@llllvvuu) 's Twitter Profile Photo

interesting that anything with O(1) state is called “RNN”. i would’ve thought the more meaningful property would have been ω(1) depth, i.e. the success of transformers comes from layer k depending only on layer <k outputs, thus achieving O(1) depth
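A minimal sketch of the distinction being drawn (my own illustration, not from the tweet): an RNN keeps O(1) state but must run T sequential steps, so its computational depth grows with sequence length, while a transformer layer computes all positions in parallel, so depth is just the layer count, independent of T.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 8, 4                      # sequence length, hidden size
x = rng.normal(size=(T, d))      # input sequence

# RNN: O(1) state, but a chain of T sequential steps --
# step t cannot start before step t-1 finishes, so depth is ω(1) in T.
W = rng.normal(size=(d, d)) * 0.1
h = np.zeros(d)                  # the entire O(1) state
for t in range(T):
    h = np.tanh(x[t] + W @ h)

# Transformer-style causal attention: every position attends to all
# earlier positions in one parallel step, so depth is the number of
# layers -- O(1) in sequence length.
def causal_attention(x):
    scores = x @ x.T / np.sqrt(d)
    mask = np.tril(np.ones((T, T), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x                 # all T outputs computed at once

y = causal_attention(x)
print(h.shape, y.shape)          # (4,) (8, 4)
```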

L (@llllvvuu) 's Twitter Profile Photo

True, but let’s innovate further. Imagine if you had not only a vocabulary but also syntax and semantics to express the behavior of your application. Who’s building this?

L (@llllvvuu) 's Twitter Profile Photo

Saw a similar thing get shouted out in the SLIME docs. Could this be the kind of thing that brings back value-function research in OSS? Since if you train directly on live user feedback (vs. re-running against a reward model), you don’t have groups

L (@llllvvuu) 's Twitter Profile Photo

I find the SLO-matched framing that SemiAnalysis does quite pointless. “Chip 2 can achieve better SLOs than chip 1.” “It is infinitely cheaper to meet the new SLOs on chip 2 than on chip 1.” What did we learn?

L (@llllvvuu) 's Twitter Profile Photo

All frontier API providers offer arbitrary prefix match, so I’d guess none are using linear attention? I wonder if there are any tells of SWA, e.g. ABCD hits but AB misses the window
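A small sketch of the tell being hypothesized (my own illustration; the probe itself is hypothetical): under causal sliding-window attention, once the context grows past the window, the earliest tokens of a cached prefix fall outside every live attention span, which is what would make a short old prefix behave differently from a long fresh one.

```python
import numpy as np

def swa_mask(T, window):
    """Causal sliding-window mask: position i attends only to
    positions j with i - window < j <= i."""
    i = np.arange(T)[:, None]
    j = np.arange(T)[None, :]
    return (j <= i) & (j > i - window)

# Illustration: with window=4, position 6 no longer "sees"
# positions 0-2, so the earliest tokens of a long prefix have
# fallen out of the window.
m = swa_mask(8, 4)
print(m[6])  # [False False False  True  True  True  True False]
```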

L (@llllvvuu) 's Twitter Profile Photo

It goes like this: you have this nice system for serving LLMs efficiently across multi-turn, long-context, and shared-prompt scenarios. Then researchers invent some new BS to pump benchmarks by 0.1%. Then they start talking about dynamic-this, encoder-that, and then you cry.

Yifan Zhang (@yifan_zhang_) 's Twitter Profile Photo

After 18 months of hard work by Tomas and Zhen, we cooked it! 🚀 Thanks to all the friends who gave constructive feedback! Deep Learning 2.0: rethinking every fundamental cornerstone of modern foundation models. It's just the beginning. Hyped! 🚀 github.com/FlashSampling/…
