SparCML: High-performance sparse communication for machine learning

C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, T. Hoefler, in: International Conference for High Performance Computing, Networking, Storage and Analysis, SC, ACM, 2019.


Conference Paper | Published | English

Scopus indexed
Author
Renggli, Cedric; Ashkboos, Saleh (IST Austria); Aghagolzadeh, Mehdi; Alistarh, Dan-Adrian (IST Austria); Hoefler, Torsten
Abstract
Applying machine learning techniques to the quickly growing data in science and industry requires highly scalable algorithms. Large datasets are most commonly processed in a "data-parallel" fashion, distributed across many nodes. Each node's contribution to the overall gradient is summed using a global allreduce. This allreduce is the single communication bottleneck, and thus the scalability bottleneck, for most machine learning workloads. We observe that frequently, many gradient values are (close to) zero, leading to sparse or sparsifiable communications. To exploit this insight, we analyze, design, and implement a set of communication-efficient protocols for sparse input data, in conjunction with efficient machine learning algorithms that can leverage these primitives. Our communication protocols generalize standard collective operations by allowing processes to contribute arbitrary sparse input data vectors. Our generic communication library, SparCML, extends MPI to support additional features, such as non-blocking (asynchronous) operations and low-precision data representations. As such, SparCML and its techniques will form the basis of future highly scalable machine learning frameworks.
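The core idea in the abstract, summing every node's sparse gradient with an allreduce so that only nonzero entries travel, can be illustrated with a small single-process simulation. This is a minimal sketch under stated assumptions, not SparCML's API or its actual protocols: sparse vectors are plain Python dicts mapping index to value, the thresholding rule in `sparsify` (drop entries with magnitude at most `eps`) is a hypothetical stand-in for "(close to) zero", and the names `sparsify`, `sparse_add`, `recursive_doubling_allreduce`, and `to_dense` are illustrative. Recursive doubling is one standard allreduce schedule chosen here for simplicity, and the sketch assumes a power-of-two number of processes.

```python
from typing import Dict, List

# Hypothetical sparse representation: index -> value; absent indices are zero.
SparseVec = Dict[int, float]


def sparsify(dense: List[float], eps: float = 0.0) -> SparseVec:
    """Keep only entries whose magnitude exceeds eps (illustrative threshold rule)."""
    return {i: v for i, v in enumerate(dense) if abs(v) > eps}


def sparse_add(a: SparseVec, b: SparseVec) -> SparseVec:
    """Sum two sparse vectors; the result's support is the union of both supports."""
    out = dict(a)
    for i, v in b.items():
        out[i] = out.get(i, 0.0) + v
    return out


def recursive_doubling_allreduce(inputs: List[SparseVec]) -> List[SparseVec]:
    """Simulate a sparse allreduce over len(inputs) processes in one Python process.

    In each round, process r exchanges its current partial sum with partner
    r XOR step and both add what they received. After log2(P) rounds every
    process holds the full sum. Each "message" would carry only the stored
    (index, value) pairs rather than the full dense vector.
    """
    p = len(inputs)
    assert p > 0 and p & (p - 1) == 0, "this sketch assumes a power-of-two process count"
    state = [dict(v) for v in inputs]
    step = 1
    while step < p:
        # All exchanges in a round are simultaneous: read the old state, build the new one.
        state = [sparse_add(state[r], state[r ^ step]) for r in range(p)]
        step *= 2
    return state


def to_dense(v: SparseVec, n: int) -> List[float]:
    """Expand a sparse vector back to a dense list of length n."""
    return [v.get(i, 0.0) for i in range(n)]


if __name__ == "__main__":
    # Toy per-node gradients for 4 simulated processes.
    grads = [
        [0.0, 0.5, 0.0, 0.0, 0.001, 0.0],
        [0.2, 0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, -0.3, 0.0, 0.0],
        [0.0, 0.5, 0.0, 0.0, 0.0, 0.1],
    ]
    sparse_inputs = [sparsify(g, eps=0.01) for g in grads]
    results = recursive_doubling_allreduce(sparse_inputs)
    for rank, res in enumerate(results):
        print(rank, to_dense(res, len(grads[0])))
```

Every simulated rank ends up with `[0.2, 1.0, 0.0, -0.3, 0.0, 0.1]`. Two design points are worth noting: the thresholding step is lossy (the `0.001` entry never reaches the sum), and in this sketch the number of stored indices never shrinks from round to round, because the support of a sum is the union of the supports, so later rounds send progressively denser messages.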
Publishing Year
2019
Date Published
2019-11-17
Proceedings Title
International Conference for High Performance Computing, Networking, Storage and Analysis, SC
Article Number
a11
Conference
SC: Conference for High Performance Computing, Networking, Storage and Analysis
Conference Location
Denver, CO, United States
Conference Date
2019-11-17 – 2019-11-19

Cite this

Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. SparCML: High-performance sparse communication for machine learning. In: International Conference for High Performance Computing, Networking, Storage and Analysis, SC. ACM; 2019. doi:10.1145/3295500.3356222
Renggli, C., Ashkboos, S., Aghagolzadeh, M., Alistarh, D.-A., & Hoefler, T. (2019). SparCML: High-performance sparse communication for machine learning. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. Denver, CO, United States: ACM. https://doi.org/10.1145/3295500.3356222
Renggli, Cedric, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan-Adrian Alistarh, and Torsten Hoefler. “SparCML: High-Performance Sparse Communication for Machine Learning.” In International Conference for High Performance Computing, Networking, Storage and Analysis, SC. ACM, 2019. https://doi.org/10.1145/3295500.3356222.
C. Renggli, S. Ashkboos, M. Aghagolzadeh, D.-A. Alistarh, and T. Hoefler, “SparCML: High-performance sparse communication for machine learning,” in International Conference for High Performance Computing, Networking, Storage and Analysis, SC, Denver, CO, United States, 2019.
Renggli C, Ashkboos S, Aghagolzadeh M, Alistarh D-A, Hoefler T. 2019. SparCML: High-performance sparse communication for machine learning. International Conference for High Performance Computing, Networking, Storage and Analysis, SC. SC: Conference for High Performance Computing, Networking, Storage and Analysis
Renggli, Cedric, et al. “SparCML: High-Performance Sparse Communication for Machine Learning.” International Conference for High Performance Computing, Networking, Storage and Analysis, SC, a11, ACM, 2019, doi:10.1145/3295500.3356222.
All files available under the following license(s):
Copyright Statement:
This Item is protected by copyright and/or related rights. [...]

Access Level
Open Access

Sources

arXiv 1802.08021
