Joe Melkonian (@joemelko) 's Twitter Profile
Joe Melkonian

@joemelko

tqdm is all you need

ID: 1740886851282452481

30-12-2023 00:06:06

178 Tweets

314 Followers

368 Following

Joe Melkonian (@joemelko):

big gpu reservation going online soon (monday?). will _finally_ be able to finish the projects I have been working on. some observations on data diversity, batch construction and how these relate to downstream task characteristics. + something something curriculum learning

Jiaqi Ma (@jiaqi_ma_):

Title: GRaSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection Problem: Gradient-based data attribution methods still suffer from computational efficiency issues for very large models due to its requirement for computing the per-sample gradients. Our

Joe Melkonian (@joemelko):

+1 on this being under-supported. My workaround has been disabling train-time shuffling in all forms and constructing the ordering offline, but it would be great if there was something more hardened.
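A minimal sketch of that workaround, assuming the ordering is just a list of example indices computed once ahead of training. The names `build_order` and `iter_batches` are hypothetical, and a real pipeline would hand the stored order to whatever data loader it uses with shuffling switched off.

```python
import random


def build_order(num_examples, seed):
    """Return a fixed permutation of example indices, computed once offline."""
    order = list(range(num_examples))
    # A dedicated Random instance keeps the result independent of global state.
    random.Random(seed).shuffle(order)
    return order


def iter_batches(order, batch_size):
    """Yield batches by walking the stored order front to back, never reshuffling."""
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Hypothetical sizes, for illustration only.
order = build_order(num_examples=10, seed=0)
batches = list(iter_batches(order, batch_size=4))
```

Because the order is an ordinary list, it can be saved to disk, inspected, or replaced with a hand-built curriculum without touching the training loop.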

Joe Melkonian (@joemelko):

TIL: to_sparse_csr() breaks when u call it on a tensor with > INT_MAX elements, i.e. my 99.97% sparse (5m, 10k) map of sequences to cluster-token contributions. good thing u can build it in chunks
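For scale, and reading "5m" as 5 million: 5,000,000 × 10,000 = 5 × 10^10 elements, well past INT_MAX (2,147,483,647). The post does not show how the chunked build was done, so the following is only a rough sketch of one way to do it, in plain Python so it runs without any dependencies. The helper `csr_from_row_chunks` is hypothetical. It walks row blocks one at a time and accumulates the three CSR arrays, which in PyTorch could then presumably be passed to `torch.sparse_csr_tensor`.

```python
def csr_from_row_chunks(row_chunks, num_cols):
    """Build CSR arrays from an iterable of dense row blocks.

    Only one block is held in dense form at a time, so the full dense
    matrix never has to be materialized.
    """
    crow_indices = [0]  # crow_indices[i + 1] = number of nonzeros in rows 0..i
    col_indices = []    # column index of each stored value
    values = []         # the nonzero values, in row-major order
    for chunk in row_chunks:
        for row in chunk:
            if len(row) != num_cols:
                raise ValueError("row width does not match num_cols")
            for col, value in enumerate(row):
                if value != 0:
                    col_indices.append(col)
                    values.append(value)
            crow_indices.append(len(values))
    return crow_indices, col_indices, values


# Tiny illustrative matrix, fed to the builder two rows at a time.
dense = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 2, 0, 4],
    [0, 0, 0, 5],
]
chunks = (dense[i:i + 2] for i in range(0, len(dense), 2))
crow, col, val = csr_from_row_chunks(chunks, num_cols=4)
```

The row-pointer array carries a running nonzero count across chunks, which is the only state that has to survive from one block to the next.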

Joe Melkonian (@joemelko):

while I wait for results, going to share something (very) brief that I worked on while I was without gpus. post will be up tomorrow

Joe Melkonian (@joemelko):

when I was without a home dar was nice enough to give me a couch to crash on... twice. i got to see his obsession for creating something wonderful for the world. he put his whole soul into making NEO beautiful. congrats dar. can't wait for mine :)