Ziji Shi (@shi23steven) 's Twitter Profile
Ziji Shi

@shi23steven

Ph.D. student at @NUScomputing advised by @JialinLiNUS, intern @Google XLA. Ex- @Apple @alibaba_cloud & @sensetime_ai. I build highly efficient systems for ML.

ID: 1109155836981465089

Website: http://zijishi.xyz · Joined: 22-03-2019 18:12:13

33 Tweets

199 Followers

290 Following

Andrew Ng (@andrewyng) 's Twitter Profile Photo

RIP to my friend, colleague, and AI visionary Nils Nilsson. Your work on the A* algorithm has improved countless lives (this is how we find the shortest path from A to B). I will always remember your work, but even more importantly your kindness. ai.stanford.edu/~nilsson/
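For context on the algorithm mentioned above, here is a minimal A* sketch (not Nilsson's original code; the toy graph and zero heuristic are illustrative, and with h ≡ 0 this reduces to Dijkstra's algorithm):

```python
import heapq

def a_star(graph, start, goal, h):
    """A* search: expands nodes by f(n) = g(n) + h(n), where g is the
    cost so far and h is an admissible heuristic estimate to the goal."""
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, cost in graph[node]:
            ng = g + cost
            if ng < best_g.get(nbr, float("inf")):
                best_g[nbr] = ng
                heapq.heappush(frontier, (ng + h(nbr), ng, nbr, path + [nbr]))
    return None

# Toy graph: A->B->D costs 6, A->C->D costs 5.
graph = {
    "A": [("B", 1), ("C", 4)],
    "B": [("D", 5)],
    "C": [("D", 1)],
    "D": [],
}
print(a_star(graph, "A", "D", lambda n: 0))  # (5, ['A', 'C', 'D'])
```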

Ziji Shi (@shi23steven) 's Twitter Profile Photo

Our paper on the model-parallel framework #EasyParallelLibrary (#EPL) was accepted to #ATC’22! Many thanks to the reviewers and co-authors. Stay tuned for exciting updates! GitHub: github.com/alibaba/EasyPa…

Ziji Shi (@shi23steven) 's Twitter Profile Photo

Saddened that, due to a visa denial, I won’t be able to attend the #ATC conference to present our work. That said, I am very thankful to the USENIX Association and Noa Zilberman for helping me with this matter. Hope to see you next year!

Ziji Shi (@shi23steven) 's Twitter Profile Photo

Great explanation! I also had the same question regarding Tensor Cores. Btw, the TPU also uses a systolic array, but some argue that systolic arrays are not ideal due to their lack of flexibility. Maybe the future belongs to RISC-V archs😄
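A cycle-by-cycle sketch of the systolic-array idea mentioned above (a toy output-stationary array, not any real TPU microarchitecture): PE (i, j) accumulates C[i][j], and operands arrive along skewed wavefronts, so A[i][k] and B[k][j] meet at cycle t = i + j + k.

```python
def systolic_matmul(A, B):
    """Simulate an n x n output-stationary systolic array computing C = A @ B.
    A streams in from the left, B from the top, each row/column delayed
    by one cycle, so PE (i, j) multiplies A[i][k] * B[k][j] at cycle i+j+k."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for t in range(3 * n - 2):          # total cycles for the full wavefront
        for i in range(n):
            for j in range(n):
                k = t - i - j           # which operand pair reaches PE (i, j) now
                if 0 <= k < n:
                    C[i][j] += A[i][k] * B[k][j]
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(systolic_matmul(A, B))  # [[19, 22], [43, 50]]
```

The fixed dataflow is exactly the flexibility trade-off in question: the schedule is baked into the wiring, which is great for dense matmul but hard to repurpose.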

cs.LG Papers (@arxiv_cs_lg) 's Twitter Profile Photo

TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic Parallelisation. Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan Wu, Jialin Li, and Wei Lin arxiv.org/abs/2302.00247

Ziji Shi (@shi23steven) 's Twitter Profile Photo

#ChatGPT has been phenomenal, but have you ever wondered how it was trained? In fact, finding the optimal parallel strategy for such an LLM is very challenging, as the candidate space grows exponentially w.r.t. model size. (1/2)
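A back-of-the-envelope illustration of why the candidate space explodes (the figure of 3 sharding choices per layer is an assumption for illustration, not a number from the paper): if each layer independently picks one strategy, the joint space is a Cartesian product.

```python
def candidate_count(num_layers, strategies_per_layer):
    # Each layer independently picks one sharding strategy
    # (e.g. row-split, column-split, or replicated), so the
    # joint strategy space is the Cartesian product over layers.
    return strategies_per_layer ** num_layers

for layers in (12, 24, 48, 96):
    print(layers, candidate_count(layers, 3))
```

Already at 96 layers and 3 choices, exhaustive search is hopeless (3^96 ≈ 10^45 candidates), which is why automatic strategy search matters.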

Ziji Shi (@shi23steven) 's Twitter Profile Photo

We recently uploaded our work with Alibaba on *quickly* and *automatically* finding the optimal #tensorparallel strategy for #LLM. Compared to SoTA approaches, we are ~20-160x faster. Comments are welcome! Arxiv: arxiv.org/abs/2302.00247

Zijiao Chen (@zijiaoc) 's Twitter Profile Photo

🧵🧠 We're witnessing incredible scientific progress in image & text reconstruction from fMRI nowadays. But what about reconstructing video from fMRI? Allow me to introduce our recent preprint: Mind-Video arxiv.org/abs/2305.11675 mind-video.com drive.google.com/drive/folders/…

Ziji Shi (@shi23steven) 's Twitter Profile Photo

(1/2) As large-scale models continue to evolve, the need for associated foundational systems is also growing. We've set up an MLSys discussion group (mlsys-sg.org), planning to host bi-weekly discussions on academic papers or updates on cutting-edge advancements.

Ziji Shi (@shi23steven) 's Twitter Profile Photo

(2/2) We warmly welcome all professionals in the field to join us, engage in enriching conversations, and contribute to our vibrant community! #LLM #Singapore #MLSys #AI #CommunityBuilding 🤝

Ziji Shi (@shi23steven) 's Twitter Profile Photo

I’m attending #ISCA co-located with #FCRC 🎉 We will present two papers at the MLArchSys and ASSYST workshops on #LLM and #GAN at Canary 2. Feel free to drop by and say hi!

Fuzhao Xue (Frio) (@xuefz) 's Twitter Profile Photo

1/ Announcing the development of OpenMoE project! 🚀 Open Mixture-of-Experts Language Models! MoE + UL2 objective + umT5 tokenizer + 50% code data mix. GitHub: github.com/XueFuzhao/Open… Blog: xuefuzhao.notion.site/Aug-2023-OpenM…

HPC Papers (@hpcpapers) 's Twitter Profile Photo

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks by Ziji Shi et al. arxiv.org/abs/2411.03999…

DeepSpeed (@deepspeedai) 's Twitter Profile Photo

Introducing Domino: a novel zero-cost communication tensor parallelism (TP) training engine for both single node and multi-node settings.

- Near-complete communication hiding
- Novel multi-node scalable TP solution 

Blog: github.com/microsoft/Deep…
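A toy sketch of the communication-hiding idea (plain Python threads standing in for CUDA streams/NCCL; `compute` and `communicate` are illustrative stand-ins, not Domino's API): while micro-batch i computes on the main thread, micro-batch i-1's gradient all-reduce runs in the background, so communication latency is overlapped with compute.

```python
import threading
import time

log = []

def compute(i):
    time.sleep(0.005)              # stand-in for a GEMM on micro-batch i
    log.append(("compute", i))

def communicate(i):
    time.sleep(0.005)              # stand-in for the all-reduce of micro-batch i
    log.append(("comm", i))

def train_step(n):
    """Overlap: launch micro-batch i's all-reduce in the background,
    then immediately start computing micro-batch i+1."""
    comm = None
    for i in range(n):
        compute(i)
        if comm:
            comm.join()            # previous all-reduce finished behind the compute
        comm = threading.Thread(target=communicate, args=(i,))
        comm.start()
    comm.join()

train_step(3)
print(log)
```

Each all-reduce can only start after its own micro-batch's compute finishes, but it costs ~zero wall-clock time because the next compute runs concurrently.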