
@codetitanium
ID: 1685205841417560064
29-07-2023 08:32:01
1.1K Tweets
89 Followers
3.3K Following


Adam is similar to many other algorithms, but no simpler variant can effectively replace it in LMs. The community is starting to get the recipe right, but what is the secret sauce? Robert M. Gower 🇺🇦 and I found that it comes down to the beta parameters and variational inference.
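
For context, here is the standard Adam update, which makes the role of the two beta parameters concrete: beta1 sets the decay of the gradient EMA (first moment) and beta2 the decay of the squared-gradient EMA (second moment). This is a minimal numpy sketch of textbook Adam, not code from the work mentioned above.

```python
import numpy as np

def adam_step(p, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # m: EMA of the gradient (first moment), decay controlled by beta1.
    # v: EMA of the squared gradient (second moment), decay controlled by beta2.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)   # bias correction for zero-initialized m
    v_hat = v / (1 - beta2**t)   # bias correction for zero-initialized v
    p = p - lr * m_hat / (np.sqrt(v_hat) + eps)
    return p, m, v
```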



Vahab Mirrokni Meisam Razaviyayn Even with a powerful surprise metric and enhanced memory capacity, the memory needs to be properly updated and optimized. In fact, a bad update rule can leave the memory stuck in local optima, so it never properly memorizes the context. While almost all models are based …
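
To make that failure mode concrete, here is a rough sketch of the kind of gradient-based memory update the thread seems to describe: the "surprise" is the gradient of an associative recall loss, and a momentum term keeps the update from stalling on the current token alone. The linear memory, the loss, and all names here are illustrative assumptions, not the model's actual rule.

```python
import numpy as np

def memory_update(W, k, v, S, lr=0.1, momentum=0.9):
    # W: a linear memory mapping keys to values (illustrative choice).
    # Surprise = gradient of the recall loss ||W k - v||^2 w.r.t. W.
    err = W @ k - v
    surprise = np.outer(err, k)
    # Momentum accumulates past surprise, so the update direction is not
    # dictated by the current token alone; a plain one-step rule is the
    # kind of update that can get stuck in a poor local optimum.
    S = momentum * S - lr * surprise
    W = W + S
    return W, S
```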



Finally some really exciting architecture work that focuses on actual training-speed efficiency and performance. Really feels like this is the path toward continual learning for LLMs. Congrats! (and obv Songlin Yang is on it bruh)



kexue.fm/archives/11006 introduces the idea of using matrices and their msign to perform general operations on the singular values, including singular value clipping, step functions, and arbitrary polynomials (not just odd polynomials). leloy! You Jiacheng rohan anil
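
A minimal numpy sketch of the trick as I understand it from the linked post: since msign(M)^T M = V diag(sigma) V^T is symmetric, applying any scalar function f to it as a matrix function (here via an eigendecomposition) and left-multiplying by msign(M) yields U diag(f(sigma)) V^T, so f need not be an odd polynomial. msign is computed with an explicit SVD for clarity; the post presumably uses iterative approximations instead.

```python
import numpy as np

def msign(M):
    # Polar factor: for M = U @ diag(s) @ Vt, msign(M) = U @ Vt.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt

def apply_to_singular_values(M, f):
    # msign(M)^T M = V diag(s) V^T is symmetric PSD, so a matrix
    # function f of it is V diag(f(s)) V^T; left-multiplying by
    # msign(M) = U V^T gives U diag(f(s)) V^T.
    A = msign(M)
    S = A.T @ M                      # eigenvalues of S = singular values of M
    w, Q = np.linalg.eigh(S)
    return A @ (Q * f(w)) @ Q.T     # Q diag(f(w)) Q^T, then U f(S) V^T

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))

# Singular value clipping: replace each sigma with min(sigma, 1).
clipped = apply_to_singular_values(M, lambda s: np.minimum(s, 1.0))

# Check against clipping the SVD directly.
U, s, Vt = np.linalg.svd(M)
ref = U @ np.diag(np.minimum(s, 1.0)) @ Vt
print(np.allclose(clipped, ref))  # True
```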
