Evan Walters (@evaninwords) 's Twitter Profile
Evan Walters

@evaninwords

ML/RL enthusiast, second-order optimization, plasticity, environmentalist, JAX is easy. @LeonardoAi_ / @canva prev 🖍 @craiyonAI

ID: 752271803494653952

Link: https://github.com/evanatyourservice · Joined: 10-07-2016 22:42:29

2.2K Tweets

504 Followers

529 Following

xjdr (@_xjdr) 's Twitter Profile Photo

i am an optimizer guy now. i talk about optimizers and their great importance and how you too should be an optimizer guy / gal.
before and after
Bo Wang (@bowang87) 's Twitter Profile Photo

Tiny Models, Massive Capacity, Zero Labels — this is the future of health AI!!

Thrilled to share that our paper, EVA-X: a foundation model for general chest X-ray analysis with self-supervised learning, is now published in npj Journals!

In collaboration with Xinggang Wang’s
Saining Xie (@sainingxie) 's Twitter Profile Photo

most people didn’t know this: we had been using TPUs at *Facebook* as far back as 2020. Kaiming led the initial development of the TF and JAX codebase, and research projects like MAE, MoCo v3, ConvNeXt v2 and DiT were developed *entirely* on TPUs. because we were the only

Vlad Tenev (@vladtenev) 's Twitter Profile Photo

We are on the cusp of a profound change in the field of mathematics. Vibe proving is here. Aristotle from Harmonic just proved Erdos Problem #124 in Lean, all by itself. This problem has been open for nearly 30 years since conjectured in the paper “Complete sequences

Evan Walters (@evaninwords) 's Twitter Profile Photo

Same. It's easy to test at least; for example, run Newton-Schulz iterations on a rank-1 matrix and it will spit out a full-rank matrix!
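A minimal sketch of that sanity check, assuming the quintic Newton-Schulz iteration commonly used for Muon-style orthogonalization; the coefficients, matrix size, and the matrix_rank check below are illustrative choices, not from the tweet:

```python
# Sketch only: quintic Newton-Schulz orthogonalization applied to a rank-1 matrix.
# Coefficients follow the commonly used Muon-style iteration; adjust as needed.
import jax
import jax.numpy as jnp

def newton_schulz(G, steps=5):
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (jnp.linalg.norm(G) + 1e-7)  # scale so singular values start <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X

u = jax.random.normal(jax.random.PRNGKey(0), (64, 1))
v = jax.random.normal(jax.random.PRNGKey(1), (1, 64))
G = u @ v  # exactly rank-1 by construction

Y = newton_schulz(G)
# In exact arithmetic the rank would be preserved; in floating point the
# iteration amplifies round-off into spurious singular directions, which is
# the easy red flag the tweet describes.
print(jnp.linalg.matrix_rank(G), jnp.linalg.matrix_rank(Y))
```

The amplification happens because the quintic map pushes every nonzero singular value toward 1, so even tiny round-off components in the null space get inflated quickly over a few steps.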

Anish Giri (@anishgiri) 's Twitter Profile Photo

In between training I wrote down an algorithm of how I analyse an opening, based on the engine output.
ChatGPT wrote a bunch of .py code to make it happen and now I have a new chess software called DeepPrep.

Code might be gross, but it works!🔥
(not ChessMonitor related🤣)
Nico Bohlinger (@nicobohlinger) 's Twitter Profile Photo

I just re-implemented FastTD3 and FastSAC in PyTorch and added a fully jitted JAX version. Feel free to check them out: github.com/nico-bohlinger… They work great; FastSAC in particular gets me performance similar to PPO in my own locomotion envs. Excited for large scale off-policy RL!

tensorqt (@tensorqt) 's Twitter Profile Photo

announcement: I will be founding a new company with giulio and Emanuele RodolĂ . it seems very clear to us that we're on the verge of completely re-imagining many of the institutions humans have consolidated across history. one of these is the way we do, interpret,

Jaskirat Singh (@1jaskiratsingh) 's Twitter Profile Photo

‼️ Representations matter for generation! But turns out our understanding of how representations help generation was wrong all along ‼️

What we thought: (we were wrong)
❌ Bigger vision encoders → better representations → better generation
❌ Better Global Semantics → better

sway (@swaystar123) 's Twitter Profile Photo

Speedrunning ImageNet Diffusion

Abstract:

Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potential synergies from combining multiple
Dimitri von RĂĽtte (@dvruette) 's Twitter Profile Photo

One surprising finding: Our scaling laws show no signs of an irreducible loss, which is in contrast to autoregressive models. Even if we try to fit an irreducible term, the best fit is almost always just zero.

So does this mean that diffusion LMs will overtake AR LMs at very
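For reference, such fits are usually written as a power law plus an irreducible floor (a generic parameterization, not taken from the paper itself):

L(N) \approx E + A \, N^{-\alpha}

where N is model size or compute, A and \alpha are fitted constants, and E is the irreducible-loss term; the observation above is that the fitted E comes out essentially zero.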
Evan Walters (@evaninwords) 's Twitter Profile Photo

Incredibly thorough paper on the various types of diffusion LMs! They train up to 10B param models to create detailed scaling laws, elucidating sometimes surprising differences between AR and diffusion LMs.