Distributed Deep Learning
# Resources
- https://d2l.ai/chapter_computational-performance/multiple-gpus.html
- https://jhui.github.io/2017/03/07/TensorFlow-GPU/
- https://www.logicalclocks.com/blog/goodbye-horovod-hello-collectiveallreduce
- Twelve ways to fool the masses when reporting performance of deep learning workloads
- Distributed Deep Learning 101: Introduction
# Talks
- #TALK ALCF Datascience frameworks: Tensorflow, PyTorch, Keras, and Horovod
- #TALK Scaling Deep Learning for Scientific Workloads on the #1 Summit Supercomputer
- #TALK Scaling Neural Networks Training - Thorsten Kurth
# Code
See AI/DS and DataEng/Tensorflow, Keras
- #CODE Analytics Zoo
- Distributed TensorFlow, Keras and PyTorch on Apache Spark/Flink & Ray
- https://analytics-zoo.readthedocs.io/en/latest/index.html
- #CODE Horovod (a minimal data-parallel training sketch follows this list)
- #CODE Colossal-AI: A Unified Deep Learning System for Large-Scale Parallel Training
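A minimal data-parallel training sketch with Horovod's Keras API, assuming TensorFlow 2 and `horovod.tensorflow.keras` are installed; the model, dataset and hyperparameters are placeholders, and the job would be launched with something like `horovodrun -np 4 python train.py`:

```python
# Minimal Horovod + tf.keras data-parallel sketch (illustrative; model/data are placeholders).
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()  # one process per GPU, e.g. `horovodrun -np 4 python train.py`

# Pin each process to a single local GPU (if any are present).
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10),
])

# Scale the learning rate by the number of workers and wrap the optimizer
# so gradients are averaged with allreduce after every step.
opt = tf.keras.optimizers.SGD(0.01 * hvd.size())
opt = hvd.DistributedOptimizer(opt)

model.compile(
    optimizer=opt,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0

# Each worker sees the full dataset here; in practice you would shard it by rank.
model.fit(x, y, batch_size=64, epochs=1, callbacks=callbacks,
          verbose=1 if hvd.rank() == 0 else 0)
```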
# References
- #PAPER Evaluation of Deep Learning Frameworks over Different HPC Architectures (Shams 2017)
- #PAPER Deep Learning at 15PF: Supervised and Semi-Supervised Classification for Scientific Data (Kurth 2017)
- #PAPER Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis (Ben-Nun and Hoefler 2018) ^bennun18
- #TALK Hoefler 2018
- #TALK Hoefler 2020
- #TALK Ben-Nun 2020
- #PAPER Mesh-TensorFlow: Deep Learning for Supercomputers (Shazeer 2018) ^f86598
- #TALK https://www.youtube.com/watch?v=HgGyWS40g-g
- #CODE Mesh-TensorFlow
- Goes beyond data-parallel training
- Supports more sophisticated parallel computations, e.g. big models that do not fit on one device (see the conceptual sketch after this entry)
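Mesh-TensorFlow automates this by mapping named tensor dimensions onto a mesh of processors. The hand-rolled TensorFlow sketch below is not the Mesh-TensorFlow API; it only illustrates the underlying idea of model (tensor) parallelism: splitting a layer's weight matrix column-wise across two devices so that neither holds the full layer. Device names and sizes are assumptions, and soft placement lets it fall back to CPU if no GPUs are present.

```python
# Illustrative hand-rolled model (tensor) parallelism in TensorFlow:
# one dense layer whose weight matrix is split column-wise across two devices.
# NOT the Mesh-TensorFlow API, just the idea it automates.
import tensorflow as tf

tf.config.set_soft_device_placement(True)  # fall back to CPU if GPUs are absent

IN, OUT = 4096, 8192  # illustrative sizes for a layer too large for one device

with tf.device("/GPU:0"):
    w0 = tf.Variable(tf.random.normal([IN, OUT // 2]))  # left half of the columns
with tf.device("/GPU:1"):
    w1 = tf.Variable(tf.random.normal([IN, OUT // 2]))  # right half of the columns

@tf.function
def parallel_dense(x):
    # The activation x is replicated; each device computes its slice of the output.
    with tf.device("/GPU:0"):
        y0 = tf.matmul(x, w0)
    with tf.device("/GPU:1"):
        y1 = tf.matmul(x, w1)
    # Concatenating the partial outputs yields the full layer output
    # (an all-gather in a real multi-worker setting).
    return tf.concat([y0, y1], axis=-1)

x = tf.random.normal([32, IN])
print(parallel_dense(x).shape)  # (32, 8192)
```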
- #PAPER GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism (Huang 2019)
- #PAPER A Quantitative Study of Deep Learning Training on Heterogeneous Supercomputers (Han 2019)
- #PAPER Channel and filter parallelism for large-scale CNN training (Dryden 2019)
- #PAPER Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism (Dryden 2019)
- #PAPER Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training (Li 2019)
- #PAPER Scalable Deep Learning on Distributed Infrastructures: Challenges, Techniques and Tools (Mayer 2019)
- #PAPER Performance Analysis of Deep Learning Workloads on Leading-edge Systems (Ren 2019)
- #PAPER TensorFlow on State-of-the-Art HPC Clusters: A Machine Learning Use Case (Ramirez-Gargallo 2019) ^ramirez19
- https://core.ac.uk/download/pdf/196280993.pdf
- Compares the MN4, Power9 and Dibona HPC clusters; only CPUs are evaluated (Power9 GPUs are not)
- #PAPER Exascale Deep Learning for Scientific Inverse Problems (Laanait 2019)
- #PAPER TensorFlow Doing HPC (Chien 2019)
- #PAPER ZeRO: Memory Optimizations Toward Training Trillion Parameter Models (Rajbhandari 2019)
- #CODE DeepSpeed
- DeepSpeed is a deep learning optimization library for PyTorch that makes distributed training easy, efficient, and effective (see the ZeRO sketch below)
- www.deepspeed.ai/
- https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
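A minimal sketch of ZeRO stage-2 training with DeepSpeed, assuming a toy PyTorch model and synthetic data; the job would be launched with the `deepspeed` launcher, and the `config=` keyword of `deepspeed.initialize` is named `config_params=` in older releases:

```python
# Minimal DeepSpeed/ZeRO sketch (illustrative; toy model and synthetic data).
# ZeRO stage 2 partitions optimizer states and gradients across data-parallel ranks.
import torch
import torch.nn.functional as F
import deepspeed

model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 10),
)

ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "zero_optimization": {"stage": 2},  # partition optimizer states + gradients
}

# deepspeed.initialize wraps the model in an engine that handles data-parallel
# communication, ZeRO partitioning and (optionally) mixed precision.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)

for step in range(10):
    x = torch.randn(8, 1024, device=engine.device)
    y = torch.randint(0, 10, (8,), device=engine.device)
    loss = F.cross_entropy(engine(x), y)
    engine.backward(loss)  # DeepSpeed-managed backward pass
    engine.step()          # optimizer step + gradient zeroing
```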
- #PAPER Towards a Scalable and Distributed Infrastructure for Deep Learning Applications (Hasheminezhad 2020)
- Phylanx Deep Learning Framework
- Good comparison with respect to SOTA
- Phylanx provides JetLag, a high-productivity, debuggable, Python-based interactive interface
- Tests are run only on CPUs; does it support GPUs?
- #PAPER Distributed Training of Deep Learning Models: A Taxonomic Perspective (Langer 2020)
- #PAPER Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training (Bian 2021)
- #PAPER Pathways: Asynchronous Distributed Dataflow for ML (Barham 2022)
- #PAPER Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling? (Tay 2022)