Kun Yuan


BlueFog: A Decentralized Framework for Optimization and Deep Learning

Decentralized optimization algorithms are low-communication-overhead alternatives to traditional distributed algorithms that rely on a central node to compute global averages. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. BlueFog is the first Python library for straightforward, high-performance implementations of diverse decentralized algorithms.
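The idea behind decentralized averaging can be illustrated with a toy simulation (not BlueFog's API; all names and the ring topology here are illustrative): instead of sending values to a central server, each node repeatedly averages only with its neighbors, and all nodes still converge to the global mean.

```python
# Toy gossip-averaging simulation on a ring of 4 nodes.
# Each node holds a local value and repeatedly averages with its two
# ring neighbors; no central server is involved, yet every node's
# value converges to the global mean.

def neighbor_average_step(values):
    n = len(values)
    # Each node mixes its own value with its left and right neighbors.
    return [(values[i] + values[(i - 1) % n] + values[(i + 1) % n]) / 3
            for i in range(n)]

values = [1.0, 2.0, 3.0, 4.0]   # local values; the global mean is 2.5
for _ in range(50):             # a few gossip rounds
    values = neighbor_average_step(values)

print(values)  # every entry is close to 2.5
```

Because each mixing step is doubly stochastic, the global mean is preserved while local disagreements shrink geometrically, which is the mechanism decentralized algorithms exploit to avoid a central bottleneck.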

Performance

The charts below show the performance of BlueFog on the ResNet50 benchmark. Each machine has 8 V100 GPUs (64 GB memory) with NVLink enabled, and the inter-machine communication speed is 25 Gbps. This is the same hardware setup you can get on AWS clusters. We test scaling efficiency with a batch size of 64 for a computation-intensive scenario, and a batch size of 32 for a communication-intensive scenario.

[Charts: Benchmark 1 and Benchmark 2 — ResNet50 scaling efficiency of BlueFog vs. Horovod]
In the figures, the black box represents ideal linear scaling. BlueFog achieves over 95% scaling efficiency, while Horovod (a state-of-the-art distributed deep learning training framework built by the Uber AI team) reaches around 66% scaling efficiency with batch size 64 on 128 GPUs. For the communication-intensive scenario with batch size 32, the scaling efficiency gap between BlueFog and Horovod becomes even larger. For more details on the BlueFog benchmark, check out the performance page.
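Scaling efficiency here is the standard ratio of measured multi-GPU throughput to ideal linear scaling from one GPU. A small sketch of the computation (the throughput numbers below are made up purely for illustration, not taken from the benchmark):

```python
def scaling_efficiency(throughput_n, throughput_1, num_gpus):
    """Ratio of measured multi-GPU throughput to ideal linear scaling."""
    ideal = throughput_1 * num_gpus
    return throughput_n / ideal

# Hypothetical numbers: one GPU processes 300 images/sec, and 128 GPUs
# together process 36,480 images/sec instead of the ideal 38,400.
eff = scaling_efficiency(36480, 300, 128)
print(f"{eff:.0%}")  # 95%
```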

Code

BlueFog is an open-source library hosted on GitHub. Jupyter notebook tutorials are available here.

Papers

The implemented algorithms and system design in BlueFog can be found in the following papers:

Talks

The introduction and usage of BlueFog are reported in the following talks: