---
_id: '13053'
abstract:
- lang: eng
text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
quantization, before they can be deployed in practical settings. In this work
we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
step in a principled way, in order to produce models whose local loss behavior
is stable under compression operations such as pruning. Thus, dense models trained
via CrAM should be compressible post-training, in a single step, without significant
accuracy loss. Experimental results on standard benchmarks, such as residual networks
for ImageNet classification and BERT models for language modelling, show that
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
baselines, but which are stable under weight pruning: specifically, we can prune
models in one shot to 70-80% sparsity with almost no accuracy loss, and to 90%
with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
methods. Additionally, CrAM can produce sparse models which perform well for transfer
learning, and it also works for semi-structured 2:4 pruning patterns supported
by GPU hardware. The code for reproducing the results is available at this https URL.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
under the European Union’s Horizon 2020 research and innovation programme (grant
agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
acknowledge the support from the Scientific Service Units (SSU) of ISTA through
resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Adrian
full_name: Vladu, Adrian
last_name: Vladu
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. In: 11th International Conference on Learning Representations .'
apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.).
CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning
Representations . Kigali, Rwanda .'
chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert,
and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International
Conference on Learning Representations , n.d.'
ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM:
A Compression-Aware Minimizer,” in 11th International Conference on Learning
Representations , Kigali, Rwanda .'
ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. 11th International Conference on Learning Representations . ICLR: International
Conference on Learning Representations.'
mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th
International Conference on Learning Representations .'
short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
Conference on Learning Representations , n.d.
conference:
end_date: 2023-05-05
location: 'Kigali, Rwanda '
name: 'ICLR: International Conference on Learning Representations'
start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
arxiv:
- '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations '
publication_status: accepted
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '13074'
abstract:
- lang: eng
text: "Deep learning has become an integral part of a large number of important
applications, and many of the recent breakthroughs have been enabled by the ability
to train very large models, capable of capturing complex patterns and relationships
from the data. At the same time, the massive sizes of modern deep learning models
have made their deployment to smaller devices more challenging; this is particularly
important, as in many applications the users rely on accurate deep learning predictions,
but they only have access to devices with limited memory and compute power. One
solution to this problem is to prune neural networks, by setting as many of their
parameters as possible to zero, to obtain accurate sparse models with lower memory
footprint. Despite the great research progress in obtaining sparse models that
preserve accuracy while satisfying memory and computational constraints, there
are still many challenges associated with efficiently training sparse models,
as well as understanding their generalization properties.\r\n\r\nThe focus of
this thesis is to investigate how the training process of sparse models can be
made more efficient, and to understand the differences between sparse and dense
models in terms of how well they can generalize to changes in the data distribution.
We first study a method for co-training sparse and dense models, at a lower cost
compared to regular training. With our method we can obtain very accurate sparse
networks, and dense models that can recover the baseline accuracy. Furthermore,
we are able to more easily analyze the differences, at the prediction level, between
the sparse-dense model pairs. Next, we investigate the generalization properties
of sparse neural networks in more detail, by studying how well different sparse
models trained on a larger task can adapt to smaller, more specialized tasks,
in a transfer learning scenario. Our analysis across multiple pruning methods
and sparsity levels reveals that sparse models provide features that can transfer
as well as, or better than, those of the dense baseline. However, the choice of the pruning
method plays an important role, and can influence the results when the features
are fixed (linear finetuning), or when they are allowed to adapt to the new task
(full finetuning). Using sparse models with fixed masks for finetuning on new
tasks has an important practical advantage, as it enables training neural networks
on smaller devices. However, one drawback of current pruning methods is that the
entire training cycle has to be repeated to obtain the initial sparse model, for
every sparsity target; in consequence, the entire training process is costly and
also multiple models need to be stored. In the last part of the thesis we propose
a method that can train accurate dense models that are compressible in a single
step, to multiple sparsity levels, without additional finetuning. Our method results
in sparse models that can be competitive with existing pruning methods, and which
can also successfully generalize to new tasks."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
citation:
ama: Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:10.15479/at:ista:13074
apa: Peste, E.-A. (2023). Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:13074
chicago: Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural
Networks.” Institute of Science and Technology Austria, 2023. https://doi.org/10.15479/at:ista:13074.
ieee: E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute
of Science and Technology Austria, 2023.
ista: Peste E-A. 2023. Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria.
mla: Peste, Elena-Alexandra. Efficiency and Generalization of Sparse Neural Networks.
Institute of Science and Technology Austria, 2023, doi:10.15479/at:ista:13074.
short: E.-A. Peste, Efficiency and Generalization of Sparse Neural Networks, Institute
of Science and Technology Austria, 2023.
date_created: 2023-05-23T17:07:53Z
date_published: 2023-05-23T00:00:00Z
date_updated: 2023-08-04T10:33:27Z
day: '23'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
doi: 10.15479/at:ista:13074
ec_funded: 1
file:
- access_level: open_access
checksum: 6b3354968403cb9d48cc5a83611fb571
content_type: application/pdf
creator: epeste
date_created: 2023-05-24T16:11:16Z
date_updated: 2023-05-24T16:11:16Z
file_id: '13087'
file_name: PhD_Thesis_Alexandra_Peste_final.pdf
file_size: 2152072
relation: main_file
success: 1
- access_level: closed
checksum: 8d0df94bbcf4db72c991f22503b3fd60
content_type: application/zip
creator: epeste
date_created: 2023-05-24T16:12:59Z
date_updated: 2023-05-24T16:12:59Z
file_id: '13088'
file_name: PhD_Thesis_APeste.zip
file_size: 1658293
relation: source_file
file_date_updated: 2023-05-24T16:12:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: '147'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '665385'
name: International IST Doctoral Program
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
issn:
- 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
record:
- id: '11458'
relation: part_of_dissertation
status: public
- id: '13053'
relation: part_of_dissertation
status: public
- id: '12299'
relation: part_of_dissertation
status: public
status: public
supervisor:
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
title: Efficiency and generalization of sparse neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2023'
...
---
_id: '14771'
abstract:
- lang: eng
text: Pruning—that is, setting a significant subset of the parameters of a neural
network to zero—is one of the most popular methods of model compression. Yet,
several recent works have raised the issue that pruning may induce or exacerbate
bias in the output of the compressed model. Despite existing evidence for this
phenomenon, the relationship between neural network pruning and induced bias is
not well-understood. In this work, we systematically investigate and characterize
this phenomenon in Convolutional Neural Networks for computer vision. First, we
show that it is in fact possible to obtain highly sparse models, e.g. with less
than 10% remaining weights, which neither decrease in accuracy nor substantially
increase in bias when compared to dense models. At the same time, we also find
that, at higher sparsities, pruned models exhibit higher uncertainty in their
outputs, as well as increased correlations, which we directly link to increased
bias. We propose easy-to-use criteria which, based only on the uncompressed model,
establish whether bias will increase with pruning, and identify the samples most
susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias.
acknowledgement: The authors would like to sincerely thank Sara Hooker for her feedback
during the development of this work. EI was supported in part by the FWF DK VGSCO,
grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via
Starting Grant 805223 ScaleML.
article_processing_charge: No
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Iofinova EB, Peste E-A, Alistarh D-A. Bias in pruned vision models: In-depth
analysis and countermeasures. In: 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. IEEE; 2023:24364-24373. doi:10.1109/cvpr52729.2023.02334'
apa: 'Iofinova, E. B., Peste, E.-A., & Alistarh, D.-A. (2023). Bias in pruned
vision models: In-depth analysis and countermeasures. In 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition (pp. 24364–24373). Vancouver, BC,
Canada: IEEE. https://doi.org/10.1109/cvpr52729.2023.02334'
chicago: 'Iofinova, Eugenia B, Elena-Alexandra Peste, and Dan-Adrian Alistarh. “Bias
in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In 2023 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 24364–73. IEEE, 2023.
https://doi.org/10.1109/cvpr52729.2023.02334.'
ieee: 'E. B. Iofinova, E.-A. Peste, and D.-A. Alistarh, “Bias in pruned vision models:
In-depth analysis and countermeasures,” in 2023 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Vancouver, BC, Canada, 2023, pp. 24364–24373.'
ista: 'Iofinova EB, Peste E-A, Alistarh D-A. 2023. Bias in pruned vision models:
In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. CVPR: Conference on Computer Vision and Pattern Recognition,
24364–24373.'
mla: 'Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis
and Countermeasures.” 2023 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, IEEE, 2023, pp. 24364–73, doi:10.1109/cvpr52729.2023.02334.'
short: E.B. Iofinova, E.-A. Peste, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.
conference:
end_date: 2023-06-24
location: Vancouver, BC, Canada
name: 'CVPR: Conference on Computer Vision and Pattern Recognition'
start_date: 2023-06-17
date_created: 2024-01-10T08:42:40Z
date_published: 2023-08-22T00:00:00Z
date_updated: 2024-01-10T08:59:26Z
day: '22'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52729.2023.02334
ec_funded: 1
external_id:
arxiv:
- '2304.12622'
isi:
- '001062531308068'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2304.12622
month: '08'
oa: 1
oa_version: Preprint
page: 24364-24373
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: ' W1260-N35'
name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
eisbn:
- '9798350301298'
eissn:
- 2575-7075
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/IST-DASLab/pruned-vision-model-bias
status: public
title: 'Bias in pruned vision models: In-depth analysis and countermeasures'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '12299'
abstract:
- lang: eng
text: 'Transfer learning is a classic paradigm by which models pretrained on large
“upstream” datasets are adapted to yield good results on “downstream” specialized
datasets. Generally, more accurate models on the “upstream” dataset tend to provide
better transfer accuracy “downstream”. In this work, we perform an in-depth investigation
of this phenomenon in the context of convolutional neural networks (CNNs) trained
on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying
their connections. We consider transfer using unstructured pruned models obtained
by applying several state-of-the-art pruning methods, including magnitude-based,
second-order, regrowth, lottery-ticket, and regularization approaches, in the
context of twelve standard transfer tasks. In a nutshell, our study shows that
sparse models can match or even outperform the transfer performance of dense models,
even at high sparsities, and, while doing so, can lead to significant inference
and even training speedups. At the same time, we observe and analyze significant
differences in the behaviour of different pruning methods. The code is available
at: https://github.com/IST-DASLab/sparse-imagenet-transfer.'
acknowledgement: The authors would like to sincerely thank Christoph Lampert and Nir
Shavit for fruitful discussions during the development of this work, and Eldar Kurtic
for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement
number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting
Grant 805223 ScaleML.
article_processing_charge: No
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Mark
full_name: Kurtz, Mark
last_name: Kurtz
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. How well do sparse ImageNet
models transfer? In: 2022 IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Institute of Electrical and Electronics Engineers; 2022:12256-12266.
doi:10.1109/cvpr52688.2022.01195'
apa: 'Iofinova, E. B., Peste, E.-A., Kurtz, M., & Alistarh, D.-A. (2022). How
well do sparse ImageNet models transfer? In 2022 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (pp. 12256–12266). New Orleans, LA, United
States: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/cvpr52688.2022.01195'
chicago: Iofinova, Eugenia B, Elena-Alexandra Peste, Mark Kurtz, and Dan-Adrian
Alistarh. “How Well Do Sparse ImageNet Models Transfer?” In 2022 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, 12256–66. Institute of Electrical
and Electronics Engineers, 2022. https://doi.org/10.1109/cvpr52688.2022.01195.
ieee: E. B. Iofinova, E.-A. Peste, M. Kurtz, and D.-A. Alistarh, “How well do sparse
ImageNet models transfer?,” in 2022 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, New Orleans, LA, United States, 2022, pp. 12256–12266.
ista: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet
models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
CVPR: Computer Vision and Pattern Recognition, 12256–12266.'
mla: Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?”
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute
of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:10.1109/cvpr52688.2022.01195.
short: E.B. Iofinova, E.-A. Peste, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Institute of Electrical
and Electronics Engineers, 2022, pp. 12256–12266.
conference:
end_date: 2022-06-24
location: New Orleans, LA, United States
name: 'CVPR: Computer Vision and Pattern Recognition'
start_date: 2022-06-18
date_created: 2023-01-16T10:06:00Z
date_published: 2022-09-27T00:00:00Z
date_updated: 2023-08-04T10:33:28Z
day: '27'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52688.2022.01195
ec_funded: 1
external_id:
arxiv:
- '2111.13445'
isi:
- '000870759105034'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2111.13445
month: '09'
oa: 1
oa_version: Preprint
page: 12256-12266
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: ' W1260-N35'
name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
eissn:
- 2575-7075
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
scopus_import: '1'
status: public
title: How well do sparse ImageNet models transfer?
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
year: '2022'
...
---
_id: '10180'
abstract:
- lang: eng
text: The growing energy and performance costs of deep learning have driven the
community to reduce the size of neural networks by selectively pruning components.
Similarly to their biological counterparts, sparse networks generalize just as
well, sometimes even better than, the original dense networks. Sparsity promises
to reduce the memory footprint of regular networks to fit mobile devices, as well
as shorten training time for ever growing networks. In this paper, we survey prior
work on sparsity in deep learning and provide an extensive tutorial of sparsification
for both inference and training. We describe approaches to remove and add elements
of neural networks, different training strategies to achieve model sparsity, and
mechanisms to exploit sparsity in practice. Our work distills ideas from more
than 300 research papers and provides guidance to practitioners who wish to utilize
sparsity today, as well as to researchers whose goal is to push the frontier forward.
We include the necessary background on mathematical methods in sparsification,
describe phenomena such as early structure adaptation, the intricate relations
between sparsity and the training process, and show techniques for achieving acceleration
on real hardware. We also define a metric of pruned parameter efficiency that
could serve as a baseline for comparison of different sparse networks. We close
by speculating on how sparsity can improve future workloads and outline major
open problems in the field.
acknowledgement: "We thank Doug Burger, Steve Scott, Marco Heddes, and the respective
teams at Microsoft for inspiring discussions on the topic. We thank Angelika Steger
for uplifting debates about the connections to biological brains, Sidak Pal Singh
for his support regarding experimental results, and Utku Evci as well as Xin Wang
for comments on previous versions of this work. Special thanks go to Bernhard
Schölkopf, our JMLR editor Samy Bengio, and the three anonymous reviewers who provided
excellent comprehensive, pointed, and deep review comments that improved the quality
of our manuscript significantly."
article_processing_charge: No
article_type: original
author:
- first_name: Torsten
full_name: Hoefler, Torsten
last_name: Hoefler
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
- first_name: Tal
full_name: Ben-Nun, Tal
last_name: Ben-Nun
- first_name: Nikoli
full_name: Dryden, Nikoli
last_name: Dryden
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
citation:
ama: 'Hoefler T, Alistarh D-A, Ben-Nun T, Dryden N, Peste E-A. Sparsity in deep
learning: Pruning and growth for efficient inference and training in neural networks.
Journal of Machine Learning Research. 2021;22(241):1-124.'
apa: 'Hoefler, T., Alistarh, D.-A., Ben-Nun, T., Dryden, N., & Peste, E.-A.
(2021). Sparsity in deep learning: Pruning and growth for efficient inference
and training in neural networks. Journal of Machine Learning Research.
Journal of Machine Learning Research.'
chicago: 'Hoefler, Torsten, Dan-Adrian Alistarh, Tal Ben-Nun, Nikoli Dryden, and
Elena-Alexandra Peste. “Sparsity in Deep Learning: Pruning and Growth for Efficient
Inference and Training in Neural Networks.” Journal of Machine Learning Research.
Journal of Machine Learning Research, 2021.'
ieee: 'T. Hoefler, D.-A. Alistarh, T. Ben-Nun, N. Dryden, and E.-A. Peste, “Sparsity
in deep learning: Pruning and growth for efficient inference and training in neural
networks,” Journal of Machine Learning Research, vol. 22, no. 241. Journal
of Machine Learning Research, pp. 1–124, 2021.'
ista: 'Hoefler T, Alistarh D-A, Ben-Nun T, Dryden N, Peste E-A. 2021. Sparsity in
deep learning: Pruning and growth for efficient inference and training in neural
networks. Journal of Machine Learning Research. 22(241), 1–124.'
mla: 'Hoefler, Torsten, et al. “Sparsity in Deep Learning: Pruning and Growth for
Efficient Inference and Training in Neural Networks.” Journal of Machine Learning
Research, vol. 22, no. 241, Journal of Machine Learning Research, 2021, pp.
1–124.'
short: T. Hoefler, D.-A. Alistarh, T. Ben-Nun, N. Dryden, E.-A. Peste, Journal of
Machine Learning Research 22 (2021) 1–124.
date_created: 2021-10-24T22:01:34Z
date_published: 2021-09-01T00:00:00Z
date_updated: 2022-05-13T09:36:08Z
day: '01'
ddc:
- '000'
department:
- _id: DaAl
external_id:
arxiv:
- '2102.00554'
file:
- access_level: open_access
checksum: 3389d9d01fc58f8fb4c1a53e14a8abbf
content_type: application/pdf
creator: cziletti
date_created: 2021-10-27T15:34:18Z
date_updated: 2021-10-27T15:34:18Z
file_id: '10192'
file_name: 2021_JMachLearnRes_Hoefler.pdf
file_size: 3527521
relation: main_file
success: 1
file_date_updated: 2021-10-27T15:34:18Z
has_accepted_license: '1'
intvolume: ' 22'
issue: '241'
language:
- iso: eng
license: https://creativecommons.org/licenses/by/4.0/
main_file_link:
- open_access: '1'
url: https://www.jmlr.org/papers/v22/21-0366.html
month: '09'
oa: 1
oa_version: Published Version
page: 1-124
publication: Journal of Machine Learning Research
publication_identifier:
eissn:
- 1533-7928
issn:
- 1532-4435
publication_status: published
publisher: Journal of Machine Learning Research
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Sparsity in deep learning: Pruning and growth for efficient inference and
training in neural networks'
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 22
year: '2021'
...
---
_id: '11458'
abstract:
- lang: eng
text: 'The increasing computational requirements of deep neural networks (DNNs)
have led to significant interest in obtaining DNN models that are sparse, yet
accurate. Recent work has investigated the even harder case of sparse training,
where the DNN weights are, for as much as possible, already sparse to reduce computational
costs during training. Existing sparse training methods are often empirical and
can have lower accuracy relative to the dense baseline. In this paper, we present
a general approach called Alternating Compressed/DeCompressed (AC/DC) training
of DNNs, demonstrate convergence for a variant of the algorithm, and show that
AC/DC outperforms existing sparse training methods in accuracy at similar computational
budgets; at high sparsity levels, AC/DC even outperforms existing methods that
rely on accurate pre-trained dense models. An important property of AC/DC is that
it allows co-training of dense and sparse models, yielding accurate sparse–dense
model pairs at the end of the training process. This is useful in practice, where
compressed variants may be desirable for deployment in resource-constrained settings
without re-doing the entire training flow, and also provides us with insights
into the accuracy gap between dense and compressed models. The code is available
at: https://github.com/IST-DASLab/ACDC.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: This project has received funding from the European Research Council
(ERC) under the European Union’s Horizon 2020 research and innovation programme
(grant agreement No 805223 ScaleML), and a CNRS PEPS grant. This research was supported
by the Scientific Service Units (SSU) of IST Austria through resources provided
by Scientific Computing (SciComp). We would also like to thank Christoph Lampert
for his feedback on an earlier version of this work, as well as for providing hardware
for the Transformer-XL experiments.
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Adrian
full_name: Vladu, Adrian
last_name: Vladu
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Peste E-A, Iofinova EB, Vladu A, Alistarh D-A. AC/DC: Alternating Compressed/DeCompressed
training of deep neural networks. In: 35th Conference on Neural Information
Processing Systems. Vol 34. Curran Associates; 2021:8557-8570.'
apa: 'Peste, E.-A., Iofinova, E. B., Vladu, A., & Alistarh, D.-A. (2021). AC/DC:
Alternating Compressed/DeCompressed training of deep neural networks. In 35th
Conference on Neural Information Processing Systems (Vol. 34, pp. 8557–8570).
Virtual, Online: Curran Associates.'
chicago: 'Peste, Elena-Alexandra, Eugenia B Iofinova, Adrian Vladu, and Dan-Adrian
Alistarh. “AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural
Networks.” In 35th Conference on Neural Information Processing Systems,
34:8557–70. Curran Associates, 2021.'
ieee: 'E.-A. Peste, E. B. Iofinova, A. Vladu, and D.-A. Alistarh, “AC/DC: Alternating
Compressed/DeCompressed training of deep neural networks,” in 35th Conference
on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34,
pp. 8557–8570.'
ista: 'Peste E-A, Iofinova EB, Vladu A, Alistarh D-A. 2021. AC/DC: Alternating Compressed/DeCompressed
training of deep neural networks. 35th Conference on Neural Information Processing
Systems. NeurIPS: Neural Information Processing Systems vol. 34, 8557–8570.'
mla: 'Peste, Elena-Alexandra, et al. “AC/DC: Alternating Compressed/DeCompressed
Training of Deep Neural Networks.” 35th Conference on Neural Information Processing
Systems, vol. 34, Curran Associates, 2021, pp. 8557–70.'
short: E.-A. Peste, E.B. Iofinova, A. Vladu, D.-A. Alistarh, in:, 35th Conference
on Neural Information Processing Systems, Curran Associates, 2021, pp. 8557–8570.
conference:
end_date: 2021-12-14
location: Virtual, Online
name: 'NeurIPS: Neural Information Processing Systems'
start_date: 2021-12-06
date_created: 2022-06-20T12:11:53Z
date_published: 2021-12-06T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
day: '6'
department:
- _id: GradSch
- _id: DaAl
ec_funded: 1
external_id:
arxiv:
- '2106.12379'
intvolume: ' 34'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://proceedings.neurips.cc/paper/2021/file/48000647b315f6f00f913caa757a70b3-Paper.pdf
month: '12'
oa: 1
oa_version: Published Version
page: 8557-8570
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 35th Conference on Neural Information Processing Systems
publication_identifier:
isbn:
- '9781713845393'
issn:
- 1049-5258
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
scopus_import: '1'
status: public
title: 'AC/DC: Alternating Compressed/DeCompressed training of deep neural networks'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 34
year: '2021'
...