---
_id: '15011'
abstract:
- lang: eng
  text: Pruning large language models (LLMs) from the BERT family has emerged as a standard compression benchmark, and several pruning methods have been proposed for this task. The recent “Sparsity May Cry” (SMC) benchmark put into question the validity of all existing methods, exhibiting a more complex setup where many known pruning methods appear to fail. We revisit the question of accurate BERT-pruning during fine-tuning on downstream datasets, and propose a set of general guidelines for successful pruning, even on the challenging SMC benchmark. First, we perform a cost-vs-benefits analysis of pruning model components, such as the embeddings and the classification head; second, we provide a simple-yet-general way of scaling training, sparsification and learning rate schedules relative to the desired target sparsity; finally, we investigate the importance of proper parametrization for Knowledge Distillation in the context of LLMs. Our simple insights lead to state-of-the-art results, both on classic BERT-pruning benchmarks, as well as on the SMC benchmark, showing that even classic gradual magnitude pruning (GMP) can yield competitive results, with the right approach.
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Torsten
  full_name: Hoefler, Torsten
  last_name: Hoefler
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Kurtic E, Hoefler T, Alistarh D-A. How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In: Proceedings of Machine Learning Research. Vol 234. ML Research Press; 2024:542-553.'
  apa: 'Kurtic, E., Hoefler, T., & Alistarh, D.-A. (2024). How to prune your language model: Recovering accuracy on the “Sparsity May Cry” benchmark. In Proceedings of Machine Learning Research (Vol. 234, pp. 542–553). Hong Kong, China: ML Research Press.'
  chicago: 'Kurtic, Eldar, Torsten Hoefler, and Dan-Adrian Alistarh. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” In Proceedings of Machine Learning Research, 234:542–53. ML Research Press, 2024.'
  ieee: 'E. Kurtic, T. Hoefler, and D.-A. Alistarh, “How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark,” in Proceedings of Machine Learning Research, Hong Kong, China, 2024, vol. 234, pp. 542–553.'
  ista: 'Kurtic E, Hoefler T, Alistarh D-A. 2024. How to prune your language model: Recovering accuracy on the ‘Sparsity May Cry’ benchmark. Proceedings of Machine Learning Research. CPAL: Conference on Parsimony and Learning, PMLR, vol. 234, 542–553.'
  mla: 'Kurtic, Eldar, et al. “How to Prune Your Language Model: Recovering Accuracy on the ‘Sparsity May Cry’ Benchmark.” Proceedings of Machine Learning Research, vol. 234, ML Research Press, 2024, pp. 542–53.'
  short: E. Kurtic, T. Hoefler, D.-A. Alistarh, in:, Proceedings of Machine Learning Research, ML Research Press, 2024, pp. 542–553.
conference:
  end_date: 2024-01-06
  location: Hong Kong, China
  name: 'CPAL: Conference on Parsimony and Learning'
  start_date: 2024-01-03
date_created: 2024-02-18T23:01:03Z
date_published: 2024-01-08T00:00:00Z
date_updated: 2024-02-26T10:30:52Z
day: '08'
department:
- _id: DaAl
external_id:
  arxiv:
  - '2312.13547'
intvolume: ' 234'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.mlr.press/v234/kurtic24a
month: '01'
oa: 1
oa_version: Preprint
page: 542-553
publication: Proceedings of Machine Learning Research
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'How to prune your language model: Recovering accuracy on the "Sparsity May Cry" benchmark'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 234
year: '2024'
...
---
_id: '13053'
abstract:
- lang: eng
  text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable (∼1%) accuracy loss, which is competitive with gradual compression methods. Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. The code for reproducing the results is available at this https URL.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further acknowledge the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
author:
- first_name: Elena-Alexandra
  full_name: Peste, Elena-Alexandra
  id: 32D78294-F248-11E8-B48F-1D18A9856A87
  last_name: Peste
- first_name: Adrian
  full_name: Vladu, Adrian
  last_name: Vladu
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Christoph
  full_name: Lampert, Christoph
  id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
  last_name: Lampert
  orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. In: 11th International Conference on Learning Representations.'
  apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.). CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning Representations. Kigali, Rwanda.'
  chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert, and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International Conference on Learning Representations, n.d.'
  ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM: A Compression-Aware Minimizer,” in 11th International Conference on Learning Representations, Kigali, Rwanda.'
  ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. 11th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.'
  mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th International Conference on Learning Representations.'
  short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International Conference on Learning Representations, n.d.
conference:
  end_date: 2023-05-05
  location: 'Kigali, Rwanda'
  name: 'ICLR: International Conference on Learning Representations'
  start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
  arxiv:
  - '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations'
publication_status: accepted
quality_controlled: '1'
related_material:
  record:
  - id: '13074'
    relation: dissertation_contains
    status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14460'
abstract:
- lang: eng
  text: We provide an efficient implementation of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.
acknowledgement: 'We would like to thank Elias Frantar for his valuable assistance and support at the outset of this project, and the anonymous ICML and SNN reviewers for very constructive feedback. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. DA acknowledges generous ERC support, via Starting Grant 805223 ScaleML.'
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Mahdi
  full_name: Nikdan, Mahdi
  id: 66374281-f394-11eb-9cf6-869147deecc0
  last_name: Nikdan
- first_name: Tommaso
  full_name: Pegolotti, Tommaso
  last_name: Pegolotti
- first_name: Eugenia B
  full_name: Iofinova, Eugenia B
  id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
  last_name: Iofinova
  orcid: 0000-0002-7778-3221
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In: Proceedings of the 40th International Conference on Machine Learning. Vol 202. ML Research Press; 2023:26215-26227.'
  apa: 'Nikdan, M., Pegolotti, T., Iofinova, E. B., Kurtic, E., & Alistarh, D.-A. (2023). SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. In Proceedings of the 40th International Conference on Machine Learning (Vol. 202, pp. 26215–26227). Honolulu, HI, United States: ML Research Press.'
  chicago: 'Nikdan, Mahdi, Tommaso Pegolotti, Eugenia B Iofinova, Eldar Kurtic, and Dan-Adrian Alistarh. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” In Proceedings of the 40th International Conference on Machine Learning, 202:26215–27. ML Research Press, 2023.'
  ieee: 'M. Nikdan, T. Pegolotti, E. B. Iofinova, E. Kurtic, and D.-A. Alistarh, “SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge,” in Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, United States, 2023, vol. 202, pp. 26215–26227.'
  ista: 'Nikdan M, Pegolotti T, Iofinova EB, Kurtic E, Alistarh D-A. 2023. SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge. Proceedings of the 40th International Conference on Machine Learning. ICML: International Conference on Machine Learning, PMLR, vol. 202, 26215–26227.'
  mla: 'Nikdan, Mahdi, et al. “SparseProp: Efficient Sparse Backpropagation for Faster Training of Neural Networks at the Edge.” Proceedings of the 40th International Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 26215–27.'
  short: M. Nikdan, T. Pegolotti, E.B. Iofinova, E. Kurtic, D.-A. Alistarh, in:, Proceedings of the 40th International Conference on Machine Learning, ML Research Press, 2023, pp. 26215–26227.
conference:
  end_date: 2023-07-29
  location: Honolulu, HI, United States
  name: 'ICML: International Conference on Machine Learning'
  start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T09:33:51Z
day: '30'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2302.04852'
intvolume: ' 202'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://doi.org/10.48550/arXiv.2302.04852
month: '07'
oa: 1
oa_version: Preprint
page: 26215-26227
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
  eissn:
  - 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '11463'
abstract:
- lang: eng
  text: "Efficiently approximating local curvature information of the loss function is a key tool for optimization and compression of deep neural networks. Yet, most existing methods to approximate second-order information have high computational or storage costs, which limits their practicality. In this work, we investigate matrix-free, linear-time approaches for estimating Inverse-Hessian Vector Products (IHVPs) for the case when the Hessian can be approximated as a sum of rank-one matrices, as in the classic approximation of the Hessian by the empirical Fisher matrix. We propose two new algorithms: the first is tailored towards network compression and can compute the IHVP for dimension d, if the Hessian is given as a sum of m rank-one matrices, using O(dm^2) precomputation, O(dm) cost for computing the IHVP, and query cost O(m) for any single element of the inverse Hessian. The second algorithm targets an optimization setting, where we wish to compute the product between the inverse Hessian, estimated over a sliding window of optimization steps, and a given gradient direction, as required for preconditioned SGD. We give an algorithm with cost O(dm + m^2) for computing the IHVP and O(dm + m^3) for adding or removing any gradient from the sliding window. These two algorithms yield state-of-the-art results for network pruning and optimization with lower computational overhead relative to existing second-order methods. Implementations are available at [9] and [17]."
acknowledgement: We gratefully acknowledge funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML), as well as computational support from Amazon Web Services (AWS) EC2.
article_processing_charge: No
author:
- first_name: Elias
  full_name: Frantar, Elias
  id: 09a8f98d-ec99-11ea-ae11-c063a7b7fe5f
  last_name: Frantar
- first_name: Eldar
  full_name: Kurtic, Eldar
  id: 47beb3a5-07b5-11eb-9b87-b108ec578218
  last_name: Kurtic
- first_name: Dan-Adrian
  full_name: Alistarh, Dan-Adrian
  id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
  last_name: Alistarh
  orcid: 0000-0003-3650-940X
citation:
  ama: 'Frantar E, Kurtic E, Alistarh D-A. M-FAC: Efficient matrix-free approximations of second-order information. In: 35th Conference on Neural Information Processing Systems. Vol 34. Curran Associates; 2021:14873-14886.'
  apa: 'Frantar, E., Kurtic, E., & Alistarh, D.-A. (2021). M-FAC: Efficient matrix-free approximations of second-order information. In 35th Conference on Neural Information Processing Systems (Vol. 34, pp. 14873–14886). Virtual, Online: Curran Associates.'
  chicago: 'Frantar, Elias, Eldar Kurtic, and Dan-Adrian Alistarh. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” In 35th Conference on Neural Information Processing Systems, 34:14873–86. Curran Associates, 2021.'
  ieee: 'E. Frantar, E. Kurtic, and D.-A. Alistarh, “M-FAC: Efficient matrix-free approximations of second-order information,” in 35th Conference on Neural Information Processing Systems, Virtual, Online, 2021, vol. 34, pp. 14873–14886.'
  ista: 'Frantar E, Kurtic E, Alistarh D-A. 2021. M-FAC: Efficient matrix-free approximations of second-order information. 35th Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems vol. 34, 14873–14886.'
  mla: 'Frantar, Elias, et al. “M-FAC: Efficient Matrix-Free Approximations of Second-Order Information.” 35th Conference on Neural Information Processing Systems, vol. 34, Curran Associates, 2021, pp. 14873–86.'
  short: E. Frantar, E. Kurtic, D.-A. Alistarh, in:, 35th Conference on Neural Information Processing Systems, Curran Associates, 2021, pp. 14873–14886.
conference:
  end_date: 2021-12-14
  location: Virtual, Online
  name: 'NeurIPS: Neural Information Processing Systems'
  start_date: 2021-12-06
date_created: 2022-06-26T22:01:35Z
date_published: 2021-12-06T00:00:00Z
date_updated: 2022-06-27T07:05:12Z
day: '06'
department:
- _id: DaAl
ec_funded: 1
external_id:
  arxiv:
  - '2010.08222'
intvolume: ' 34'
language:
- iso: eng
main_file_link:
- open_access: '1'
  url: https://proceedings.neurips.cc/paper/2021/file/7cfd5df443b4eb0d69886a583b33de4c-Paper.pdf
month: '12'
oa: 1
oa_version: Published Version
page: 14873-14886
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
  call_identifier: H2020
  grant_number: '805223'
  name: Elastic Coordination for Scalable Machine Learning
publication: 35th Conference on Neural Information Processing Systems
publication_identifier:
  isbn:
  - '9781713845393'
  issn:
  - 1049-5258
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'M-FAC: Efficient matrix-free approximations of second-order information'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 34
year: '2021'
...