---
_id: '15011'
abstract:
- lang: eng
text: Pruning large language models (LLMs) from the BERT family has emerged as a
standard compression benchmark, and several pruning methods have been proposed
for this task. The recent “Sparsity May Cry” (SMC) benchmark called into question
the validity of all existing methods, exhibiting a more complex setup where many
known pruning methods appear to fail. We revisit the question of accurate BERT-pruning
during fine-tuning on downstream datasets, and propose a set of general guidelines
for successful pruning, even on the challenging SMC benchmark. First, we perform
a cost-vs-benefits analysis of pruning model components, such as the embeddings
and the classification head; second, we provide a simple-yet-general way of scaling
training, sparsification and learning rate schedules relative to the desired target
sparsity; finally, we investigate the importance of proper parametrization for
Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art
results, both on classic BERT-pruning benchmarks and on the SMC benchmark,
showing that even classic gradual magnitude pruning (GMP) can yield competitive
results, with the right approach.
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Torsten
full_name: Hoefler, Torsten
last_name: Hoefler
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering
accuracy on the “Sparsity May Cry” benchmark. In: Proceedings of Machine Learning
Research. Vol 234. ML Research Press; 2024:542-553.'
apa: 'Kurtic, E., Hoefler, T., & Alistarh, D.-A. (2024). How to prune your language
model: Recovering accuracy on the “Sparsity May Cry” benchmark. In Proceedings
of Machine Learning Research (Vol. 234, pp. 542–553). Hong Kong, China: ML
Research Press.'
chicago: 'Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune
Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.”
In Proceedings of Machine Learning Research, 234:542–53. ML Research Press,
2024.'
ieee: 'E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model:
Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in Proceedings of
Machine Learning Research, Hongkong, China, 2024, vol. 234, pp. 542–553.'
ista: 'Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model:
Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine
Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234,
542–553.'
mla: 'Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy
on the ‘Sparsity May Cry’ Benchmark.” Proceedings of Machine Learning Research,
vol. 234, ML Research Press, 2024, pp. 542–53.'
short: E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning
Research, ML Research Press, 2024, pp. 542–553.
conference:
end_date: 2024-01-06
location: Hong Kong, China
name: 'CPAL: Conference on Parsimony and Learning'
start_date: 2024-01-03
date_created: 2024-02-18T23:01:03Z
date_published: 2024-01-08T00:00:00Z
date_updated: 2024-02-26T10:30:52Z
day: '08'
department:
- _id: DaAl
external_id:
arxiv:
- '2312.13547'
intvolume: ' 234'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://proceedings.mlr.press/v234/kurtic24a
month: '01'
oa: 1
oa_version: Preprint
page: 542-553
publication: Proceedings of Machine Learning Research
publication_identifier:
eissn:
- 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How to prune your language model: Recovering accuracy on the "Sparsity May
Cry" benchmark'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 234
year: '2024'
...
---
_id: '13053'
abstract:
- lang: eng
text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
quantization, before they can be deployed in practical settings. In this work
we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
step in a principled way, in order to produce models whose local loss behavior
is stable under compression operations such as pruning. Thus, dense models trained
via CrAM should be compressible post-training, in a single step, without significant
accuracy loss. Experimental results on standard benchmarks, such as residual networks
for ImageNet classification and BERT models for language modelling, show that
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
baselines, but which are stable under weight pruning: specifically, we can prune
models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%
with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
methods. Additionally, CrAM can produce sparse models which perform well for transfer
learning, and it also works for semi-structured 2:4 pruning patterns supported
by GPU hardware. The code for reproducing the results is available at this https
URL.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant
agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
acknowledge the support from the Scientific Service Units (SSU) of ISTA through
resources provided by Scientific Computing (SciComp)-"
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Adrian
full_name: Vladu, Adrian
last_name: Vladu
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. In: 11th International Conference on Learning Representations.'
apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.).
CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning
Representations. Kigali, Rwanda.'
chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert,
and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International
Conference on Learning Representations, n.d.'
ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM:
A Compression-Aware Minimizer,” in 11th International Conference on Learning
Representations, Kigali, Rwanda.'
ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. 11th International Conference on Learning Representations. ICLR: International
Conference on Learning Representations.'
mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th
International Conference on Learning Representations.'
short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
Conference on Learning Representations, n.d.
conference:
end_date: 2023-05-05
location: Kigali, Rwanda
name: 'ICLR: International Conference on Learning Representations'
start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
arxiv:
- '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 11th International Conference on Learning Representations
publication_status: accepted
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14460'
abstract:
- lang: eng
text: We provide an efficient implementation of the backpropagation algorithm, specialized
to the case where the weights of the neural network being trained are sparse.
Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and
common layer types (e.g., convolutional or linear). We provide a fast vectorized
implementation on commodity CPUs, and show that it can yield speedups in end-to-end
runtime experiments, both in transfer learning using already-sparsified networks,
and in training sparse networks from scratch. Thus, our results provide the first
support for sparse training on commodity hardware.
acknowledgement: 'We would like to thank Elias Frantar for his valuable assistance
and support at the outset of this project, and the anonymous ICML and SNN reviewers
for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant
agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant
805223 ScaleML.'
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Mahdi
full_name: Nikdan, Mahdi
id: 66374281-f394-11eb-9cf6-869147deecc0
last_name: Nikdan
- first_name: Tommaso
full_name: Pegolotti, Tommaso
last_name: Pegolotti
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient
sparse backpropagation for faster training of neural networks at the edge. In:
Proceedings of the 40th International Conference on Machine Learning. Vol
202. ML Research Press; 2023:26215-26227.'
apa: 'Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., & Alistarh, D.-A.
(2023). SparseProp: Efficient sparse backpropagation for faster training of neural
networks at the edge. In Proceedings of the 40th International Conference on
Machine Learning (Vol. 202, pp. 26215–26227). Honolulu, HI, United
States: ML Research Press.'
chicago: 'Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and
Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster
Training of Neural Networks at the Edge.” In Proceedings of the 40th International
Conference on Machine Learning, 202:26215–27. ML Research Press, 2023.'
ieee: 'M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp:
Efficient sparse backpropagation for faster training of neural networks at the
edge,” in Proceedings of the 40th International Conference on Machine Learning,
Honolulu, HI, United States, 2023, vol. 202, pp. 26215–26227.'
ista: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp:
Efficient sparse backpropagation for faster training of neural networks at the
edge. Proceedings of the 40th International Conference on Machine Learning. ICML:
International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.'
mla: 'Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster
Training of Neural Networks at the Edge.” Proceedings of the 40th International
Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 26215–27.'
short: M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings
of the 40th International Conference on Machine Learning, ML Research Press, 2023,
pp. 26215–26227.
conference:
end_date: 2023-07-29
location: Honolulu, HI, United States
name: 'ICML: International Conference on Machine Learning'
start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:33:51Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
arxiv:
- '2302.04852'
intvolume: ' 202'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2302.04852
month: '07'
oa: 1
oa_version: Preprint
page: 26215-26227
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
eissn:
- 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseProp: Efficient sparse backpropagation for faster training of neural
networks at the edge'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '11463'
abstract:
- lang: eng
text: "Efficiently approximating local curvature information of the loss function
is a key tool for optimization and compression of deep neural networks. Yet, most
existing methods to approximate second-order information have high computational\r\nor
storage costs, which limits their practicality. In this work, we investigate matrix-free,
linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs)
for the case when the Hessian can be approximated as a sum of rank-one matrices,
as in the classic approximation of the Hessian by the empirical Fisher matrix.
We propose two new algorithms: the first is tailored towards network compression
and can compute the IHVP for dimension d, if the Hessian is given as a sum of
m rank-one matrices, using O(dm2) precomputation, O(dm) cost for computing the
IHVP, and query cost O(m) for any single element of the inverse Hessian. The second
algorithm targets an optimization setting, where we wish to compute the product
between the inverse Hessian, estimated over a sliding window of optimization steps,
and a given gradient direction, as required for preconditioned SGD. We give an
algorithm with cost O(dm + m2) for computing the IHVP and O(dm + m3) for adding
or removing any gradient from the sliding window. These\r\ntwo algorithms yield
state-of-the-art results for network pruning and optimization with lower computational
overhead relative to existing second-order methods. Implementations are available
at [9] and [17]."
acknowledgement: We gratefully acknowledge funding from the European Research Council (ERC)
under the European Union’s Horizon 2020 research and innovation programme (grant
agreement No 805223 ScaleML), as well as computational support from Amazon Web Services
(AWS) EC2.
article_processing_charge: No
author:
- first_name: Elias
full_name: Frantar, Elias
id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
last_name: Frantar
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations
of second-order information. In: 35th Conference on Neural Information Processing
Systems. Vol 34. Curran Associates; 2021:14873-14886.'
apa: 'Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free
approximations of second-order information. In 35th Conference on Neural Information
Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Curran Associates.'
chicago: 'Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient
Matrix-Free Approximations of Second-Order Information.” In 35th Conference
on Neural Information Processing Systems, 34:14873–86. Curran Associates,
2021.'
ieee: 'E. Frantar, E. Kurtic, and D.-A. Alistarh, “M-FAC: Efficient matrix-free
approximations of second-order information,” in 35th Conference on Neural Information
Processing Systems, Virtual, Online, 2021, vol. 34, pp. 14873–14886.'
ista: 'Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations
of second-order information. 35th Conference on Neural Information Processing
Systems. NeurIPS: Neural Information Processing Systems vol. 34, 14873–14886.'
mla: 'Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order
Information.” 35th Conference on Neural Information Processing Systems,
vol. 34, Curran Associates, 2021, pp. 14873–86.'
short: E. Frantar, E. Kurtic, D.-A. Alistarh, in:, 35th Conference on Neural Information
Processing Systems, Curran Associates, 2021, pp. 14873–14886.
conference:
end_date: 2021-12-14
location: Virtual, Online
name: 'NeurIPS: Neural Information Processing Systems'
start_date: 2021-12-06
date_created: 2022-06-26T22:01:35Z
date_published: 2021-12-06T00:00:00Z
date_updated: 2022-06-27T07:05:12Z
day: '06'
department:
- _id: DaAl
ec_funded: 1
external_id:
arxiv:
- '2010.08222'
intvolume: ' 34'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://proceedings.neurips.cc/paper/2021/file/7cfd5df443b4eb0d69886a583b33de4c-Paper.pdf
month: '12'
oa: 1
oa_version: Published Version
page: 14873-14886
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 35th Conference on Neural Information Processing Systems
publication_identifier:
isbn:
- '9781713845393'
issn:
- 1049-5258
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'M-FAC: Efficient matrix-free approximations of second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 34
year: '2021'
...