Evan Walters (@evaninwords) 's Twitter Profile
Evan Walters

@evaninwords

ML/RL enthusiast, second-order optimization, plasticity, environmentalist, JAX is easy. @LeonardoAi_ / @canva prev 🖍 @craiyonAI

ID: 752271803494653952

Link: https://github.com/evanatyourservice · Joined: 10-07-2016 22:42:29

2.2K Tweets

504 Followers

529 Following

xjdr (@_xjdr) 's Twitter Profile Photo

i am an optimizer guy now. i talk about optimizers and their great importance and how you too should be an optimizer guy / gal.
before and after
Bo Wang (@bowang87) 's Twitter Profile Photo

Tiny Models, Massive Capacity, Zero Labels — this is the future of health AI!!

Thrilled to share that our paper, EVA-X: a foundation model for general chest X-ray analysis with self-supervised learning, is now published in npj Journals!

In collaboration with Xinggang Wang’s
Saining Xie (@sainingxie) 's Twitter Profile Photo

most people didn’t know this: we had been using TPUs at *Facebook* as far back as 2020. Kaiming led the initial development of the TF and JAX codebase, and research projects like MAE, MoCo v3, ConvNeXt v2 and DiT were developed *entirely* on TPUs. because we were the only

Vlad Tenev (@vladtenev) 's Twitter Profile Photo

We are on the cusp of a profound change in the field of mathematics. Vibe proving is here. Aristotle from Harmonic just proved Erdos Problem #124 in Lean, all by itself. This problem has been open for nearly 30 years since conjectured in the paper “Complete sequences

Evan Walters (@evaninwords) 's Twitter Profile Photo

Same. It's easy to test at least; for example, run Newton-Schulz iterations on a rank-1 matrix and it will spit out a full-rank matrix!
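A minimal sketch of that sanity check, assuming the quintic Newton-Schulz iteration commonly used for Muon-style orthogonalization; the coefficients, matrix size, and the matrix_rank check below are illustrative choices, not from the tweet:

```python
# Sketch only: quintic Newton-Schulz orthogonalization applied to a rank-1 matrix.
# Coefficients follow the commonly used Muon-style iteration; adjust as needed.
import jax
import jax.numpy as jnp

def newton_schulz(G, steps=5):
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (jnp.linalg.norm(G) + 1e-7)  # scale so singular values start <= 1
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    return X

u = jax.random.normal(jax.random.PRNGKey(0), (64, 1))
v = jax.random.normal(jax.random.PRNGKey(1), (1, 64))
G = u @ v  # exactly rank-1 by construction

Y = newton_schulz(G)
# In exact arithmetic the rank would be preserved; in floating point the
# iteration amplifies round-off into spurious singular directions, which is
# the easy red flag the tweet describes.
print(jnp.linalg.matrix_rank(G), jnp.linalg.matrix_rank(Y))
```

The amplification happens because the quintic map pushes every nonzero singular value toward 1, so even tiny round-off components in the null space get inflated quickly over a few steps.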

Anish Giri (@anishgiri) 's Twitter Profile Photo

In between training I wrote down an algorithm of how I analyse an opening, based on the engine output.
ChatGPT wrote a bunch of .py code to make it happen and now I have a new chess software called DeepPrep.

Code might be gross, but it works!🔥
(not ChessMonitor related🤣)
Nico Bohlinger (@nicobohlinger) 's Twitter Profile Photo

I just re-implemented FastTD3 and FastSAC in PyTorch and added a fully jitted JAX version. Feel free to check them out: github.com/nico-bohlinger… They work great; FastSAC in particular gets me performance similar to PPO in my own locomotion envs. Excited for large scale off-policy RL!

tensorqt (@tensorqt) 's Twitter Profile Photo

announcement: I will be founding a new company with giulio and Emanuele RodolĂ . it seems very clear to us that we're on the verge of completely re-imagining many of the institutions humans have consolidated across history. one of these is the way we do, interpret,

Jaskirat Singh (@1jaskiratsingh) 's Twitter Profile Photo

‼️ Representations matter for generation! But turns out our understanding of how representations help generation was wrong all along ‼️

What we thought: (we were wrong)
❌ Bigger vision encoders → better representations → better generation
❌ Better Global Semantics → better

sway (@swaystar123) 's Twitter Profile Photo

Speedrunning ImageNet Diffusion

Abstract:

Recent advances have significantly improved the training efficiency of diffusion transformers. However, these techniques have largely been studied in isolation, leaving unexplored the potential synergies from combining multiple
Dimitri von RĂĽtte (@dvruette) 's Twitter Profile Photo

One surprising finding: Our scaling laws show no signs of an irreducible loss, which is in contrast to autoregressive models. Even if we try to fit an irreducible term, the best fit is almost always just zero.

So does this mean that diffusion LMs will overtake AR LMs at very
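For reference, such fits are usually written as a power law plus an irreducible floor (a generic parameterization, not taken from the paper itself):

L(N) \approx E + A \, N^{-\alpha}

where N is model size or compute, A and \alpha are fitted constants, and E is the irreducible-loss term; the observation above is that the fitted E comes out essentially zero.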
Evan Walters (@evaninwords) 's Twitter Profile Photo

Incredibly thorough paper on the various types of diffusion LMs! They train up to 10B param models to create detailed scaling laws, elucidating sometimes surprising differences between AR and diffusion LMs.