How to prune your language model: Recovering accuracy on the "Sparsity May Cry" benchmark

Kurtic, Eldar; Hoefler, Torsten; Alistarh, Dan-Adrian

Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.

Download (ext.)

https://proceedings.mlr.press/v234/kurtic24a

Conference Paper | Published | English

Scopus indexed

Author

Kurtic, Eldar (ISTA); Hoefler, Torsten; Alistarh, Dan-Adrian (ISTA)

Department

Alistarh Group

Series Title

PMLR

Abstract

Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.
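The abstract names gradual magnitude pruning (GMP) and a sparsification schedule scaled to the target sparsity, but gives no details. The following is a minimal sketch of the general idea only, not the authors' recipe. It assumes a cubic sparsity schedule, which is a common choice and is not stated in this record. The function names `cubic_sparsity` and `magnitude_prune` and the toy weight values are hypothetical and chosen for illustration.

```python
def cubic_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Sparsity target at `step`, rising from initial to final along a cubic curve.

    Assumption: a cubic schedule; the record does not say which schedule is used.
    """
    if total_steps <= 0:
        return final_sparsity
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** 3


def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    num_to_prune = int(round(sparsity * len(weights)))
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:num_to_prune]:
        pruned[i] = 0.0
    return pruned


# Toy weight vector standing in for one layer's parameters (illustrative values).
weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.15, -0.02, 0.6]

# Gradually raise sparsity toward the 80% target over four pruning events.
for step in range(0, 5):
    target = cubic_sparsity(step, total_steps=4, final_sparsity=0.8)
    weights = magnitude_prune(weights, target)
    # In actual training, fine-tuning steps would run here between pruning events.
```

In this sketch the target sparsity starts at zero and reaches the final value at the last pruning event, so most of the pruning happens early, while the model still has many fine-tuning steps left to recover.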

Publishing Year

2024

Date Published

2024-01-08

Proceedings Title

Proceedings of Machine Learning Research

Volume

234

Page

542-553

Conference

CPAL: Conference on Parsimony and Learning

Conference Location

Hong Kong, China

Conference Date

2024-01-03 – 2024-01-06

eISSN

2640-3498

IST-REx-ID

15011

Cite this

Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: Proceedings of Machine Learning Research. Vol 234. ML Research Press; 2024:542-553.

Kurtic, E., Hoefler, T., & Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In Proceedings of Machine Learning Research (Vol. 234, pp. 542–553). Hong Kong, China: ML Research Press.

Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In Proceedings of Machine Learning Research, 234:542–53. ML Research Press, 2024.

E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in Proceedings of Machine Learning Research, Hong Kong, China, 2024, vol. 234, pp. 542–553.

Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” Proceedings of Machine Learning Research, vol. 234, ML Research Press, 2024, pp. 542–53.

All files available under the following license(s):

Copyright Statement:

This Item is protected by copyright and/or related rights. [...]

Link(s) to Main File(s)

URL

https://proceedings.mlr.press/v234/kurtic24a

Access Level

Open Access

Sources

arXiv 2312.13547
