Kun Yuan


BlueFog: A Decentralized Framework for Optimization and Deep Learning

Decentralized optimization algorithms are low-communication-overhead alternatives to traditional distributed algorithms that rely on a central node to compute global averages. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. BlueFog is the first Python library for straightforward, high-performance implementations of diverse decentralized algorithms.
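The idea behind decentralized averaging can be illustrated with a toy simulation (not BlueFog's API; all names and the ring topology here are illustrative): instead of sending values to a central server, each node repeatedly averages only with its neighbors, and all nodes still converge to the global mean.

```python
# Toy gossip-averaging simulation on a ring of 4 nodes.
# Each node holds a local value and repeatedly averages with its two
# ring neighbors; no central server is involved, yet every node's
# value converges to the global mean.

def neighbor_average_step(values):
    n = len(values)
    # Each node mixes its own value with its left and right neighbors.
    return [(values[i] + values[(i - 1) % n] + values[(i + 1) % n]) / 3
            for i in range(n)]

values = [1.0, 2.0, 3.0, 4.0]   # local values; the global mean is 2.5
for _ in range(50):             # a few gossip rounds
    values = neighbor_average_step(values)

print(values)  # every entry is close to 2.5
```

Because each mixing step is doubly stochastic, the global mean is preserved while local disagreements shrink geometrically, which is the mechanism decentralized algorithms exploit to avoid a central bottleneck.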

Performance

The charts below show the performance of BlueFog on the ResNet50 benchmark. Each machine has 8 V100 GPUs (64 GB memory) with NVLink enabled, and the inter-machine communication speed is 25 Gbps. This is the same hardware setup you can get on AWS clusters. We test scaling efficiency with a batch size of 64 for a computation-intensive scenario, and a batch size of 32 for a communication-intensive scenario.

[Charts: Benchmark 1 and Benchmark 2 — ResNet50 scaling efficiency of BlueFog vs. Horovod]
In the figures, the black box represents ideal linear scaling. BlueFog achieves over 95% scaling efficiency, while Horovod (a state-of-the-art distributed deep learning training framework built by the Uber AI team) reaches around 66% scaling efficiency with batch size 64 on 128 GPUs. For the communication-intensive scenario with batch size 32, the scaling efficiency gap between BlueFog and Horovod becomes even larger. For more details on the BlueFog benchmark, check out the performance page.
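Scaling efficiency here is the standard ratio of measured multi-GPU throughput to ideal linear scaling from one GPU. A small sketch of the computation (the throughput numbers below are made up purely for illustration, not taken from the benchmark):

```python
def scaling_efficiency(throughput_n, throughput_1, num_gpus):
    """Ratio of measured multi-GPU throughput to ideal linear scaling."""
    ideal = throughput_1 * num_gpus
    return throughput_n / ideal

# Hypothetical numbers: one GPU processes 300 images/sec, and 128 GPUs
# together process 36,480 images/sec instead of the ideal 38,400.
eff = scaling_efficiency(36480, 300, 128)
print(f"{eff:.0%}")  # 95%
```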

Code

BlueFog is an open-source library hosted on GitHub. Jupyter notebook tutorials are available here.

Papers

The implemented algorithms and system design in BlueFog can be found in the following papers:

Talks

The introduction and usage of BlueFog are reported in the following talks: