---
_id: '13053'
abstract:
- lang: eng
text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
quantization, before they can be deployed in practical settings. In this work
we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
step in a principled way, in order to produce models whose local loss behavior
is stable under compression operations such as pruning. Thus, dense models trained
via CrAM should be compressible post-training, in a single step, without significant
accuracy loss. Experimental results on standard benchmarks, such as residual networks
for ImageNet classification and BERT models for language modelling, show that
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
baselines, but which are stable under weight pruning: specifically, we can prune
models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%
with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
methods. Additionally, CrAM can produce sparse models which perform well for transfer
learning, and it also works for semi-structured 2:4 pruning patterns supported
by GPU hardware. The code for reproducing the results is available at this https
URL.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant
agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
acknowledge the support from the Scientific Service Units (SSU) of ISTA through
resources provided by Scientific Computing (SciComp)-"
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Adrian
full_name: Vladu, Adrian
last_name: Vladu
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. In: 11th International Conference on Learning Representations.'
apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.).
CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning
Representations. Kigali, Rwanda.'
chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert,
and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International
Conference on Learning Representations, n.d.'
ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM:
A Compression-Aware Minimizer,” in 11th International Conference on Learning
Representations, Kigali, Rwanda.'
ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. 11th International Conference on Learning Representations. ICLR: International
Conference on Learning Representations.'
mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th
International Conference on Learning Representations.'
short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
Conference on Learning Representations, n.d.
conference:
end_date: 2023-05-05
location: 'Kigali, Rwanda'
name: 'ICLR: International Conference on Learning Representations'
start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
arxiv:
- '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations'
publication_status: accepted
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '13074'
abstract:
- lang: eng
text: "Deep learning has become an integral part of a large number of important
applications, and many of the recent breakthroughs have been enabled by the ability
to train very large models, capable of capturing complex patterns and relationships
from the data. At the same time, the massive sizes of modern deep learning models
have made their deployment to smaller devices more challenging; this is particularly
important, as in many applications the users rely on accurate deep learning predictions,
but they only have access to devices with limited memory and compute power. One
solution to this problem is to prune neural networks, by setting as many of their
parameters as possible to zero, to obtain accurate sparse models with a lower memory
footprint. Despite the great research progress in obtaining sparse models that
preserve accuracy, while satisfying memory and computational constraints, there
are still many challenges associated with efficiently training sparse models,
as well as understanding their generalization properties.\r\n\r\nThe focus of
this thesis is to investigate how the training process of sparse models can be
made more efficient, and to understand the differences between sparse and dense
models in terms of how well they can generalize to changes in the data distribution.
We first study a method for co-training sparse and dense models, at a lower cost
compared to regular training. With our method we can obtain very accurate sparse
networks, and dense models that can recover the baseline accuracy. Furthermore,
we are able to more easily analyze the differences, at the prediction level, between
the sparse-dense model pairs. Next, we investigate the generalization properties
of sparse neural networks in more detail, by studying how well different sparse
models trained on a larger task can adapt to smaller, more specialized tasks,
in a transfer learning scenario. Our analysis across multiple pruning methods
and sparsity levels reveals that sparse models provide features that can transfer
similarly to or better than the dense baseline. However, the choice of the pruning
method plays an important role, and can influence the results when the features
are fixed (linear finetuning), or when they are allowed to adapt to the new task
(full finetuning). Using sparse models with fixed masks for finetuning on new
tasks has an important practical advantage, as it enables training neural networks
on smaller devices. However, one drawback of current pruning methods is that the
entire training cycle has to be repeated to obtain the initial sparse model for
every sparsity target; as a consequence, the training process is costly and
multiple models need to be stored. In the last part of the thesis we propose
a method that can train accurate dense models that are compressible in a single
step, to multiple sparsity levels, without additional finetuning. Our method results
in sparse models that can be competitive with existing pruning methods, and which
can also successfully generalize to new tasks."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
citation:
ama: Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:10.15479/at:ista:13074
apa: Peste, E.-A. (2023). Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:13074
chicago: Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural
Networks.” Institute of Science and Technology Austria, 2023. https://doi.org/10.15479/at:ista:13074.
ieee: E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute
of Science and Technology Austria, 2023.
ista: Peste E-A. 2023. Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria.
mla: Peste, Elena-Alexandra. Efficiency and Generalization of Sparse Neural Networks.
Institute of Science and Technology Austria, 2023, doi:10.15479/at:ista:13074.
short: E.-A. Peste, Efficiency and Generalization of Sparse Neural Networks, Institute
of Science and Technology Austria, 2023.
date_created: 2023-05-23T17:07:53Z
date_published: 2023-05-23T00:00:00Z
date_updated: 2023-08-04T10:33:27Z
day: '23'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
doi: 10.15479/at:ista:13074
ec_funded: 1
file:
- access_level: open_access
checksum: 6b3354968403cb9d48cc5a83611fb571
content_type: application/pdf
creator: epeste
date_created: 2023-05-24T16:11:16Z
date_updated: 2023-05-24T16:11:16Z
file_id: '13087'
file_name: PhD_Thesis_Alexandra_Peste_final.pdf
file_size: 2152072
relation: main_file
success: 1
- access_level: closed
checksum: 8d0df94bbcf4db72c991f22503b3fd60
content_type: application/zip
creator: epeste
date_created: 2023-05-24T16:12:59Z
date_updated: 2023-05-24T16:12:59Z
file_id: '13088'
file_name: PhD_Thesis_APeste.zip
file_size: 1658293
relation: source_file
file_date_updated: 2023-05-24T16:12:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: '147'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '665385'
name: International IST Doctoral Program
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
issn:
- 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
record:
- id: '11458'
relation: part_of_dissertation
status: public
- id: '13053'
relation: part_of_dissertation
status: public
- id: '12299'
relation: part_of_dissertation
status: public
status: public
supervisor:
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
title: Efficiency and generalization of sparse neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2023'
...
---
_id: '14320'
abstract:
- lang: eng
text: The development of two-dimensional materials has resulted in a diverse range
of novel, high-quality compounds with increasing complexity. A key requirement
for a comprehensive quantitative theory is the accurate determination of these
materials' band structure parameters. However, this task is challenging due to
the intricate band structures and the indirect nature of experimental probes.
In this work, we introduce a general framework to derive band structure parameters
from experimental data using deep neural networks. We apply our method to the
penetration field capacitance measurement of trilayer graphene, an effective probe
of its density of states. First, we demonstrate that a trained deep network gives
accurate predictions for the penetration field capacitance as a function of tight-binding
parameters. Next, we use the fast and accurate predictions from the trained network
to automatically determine tight-binding parameters directly from experimental
data, with the extracted parameters in good agreement with values in the literature.
We conclude by discussing potential applications of our method to other materials
and experimental techniques beyond penetration field capacitance.
acknowledgement: A.F.Y. acknowledges primary support from the Department of Energy
under award DE-SC0020043, and additional support from the Gordon and Betty Moore
Foundation under award GBMF9471 for group operations.
article_number: '125411'
article_processing_charge: No
article_type: original
author:
- first_name: Paul M
full_name: Henderson, Paul M
id: 13C09E74-18D9-11E9-8878-32CFE5697425
last_name: Henderson
orcid: 0000-0002-5198-7445
- first_name: Areg
full_name: Ghazaryan, Areg
id: 4AF46FD6-F248-11E8-B48F-1D18A9856A87
last_name: Ghazaryan
orcid: 0000-0001-9666-3543
- first_name: Alexander A.
full_name: Zibrov, Alexander A.
last_name: Zibrov
- first_name: Andrea F.
full_name: Young, Andrea F.
last_name: Young
- first_name: Maksym
full_name: Serbyn, Maksym
id: 47809E7E-F248-11E8-B48F-1D18A9856A87
last_name: Serbyn
orcid: 0000-0002-2399-5827
citation:
ama: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. Deep learning extraction
of band structure parameters from density of states: A case study on trilayer
graphene. Physical Review B. 2023;108(12). doi:10.1103/physrevb.108.125411'
apa: 'Henderson, P. M., Ghazaryan, A., Zibrov, A. A., Young, A. F., & Serbyn,
M. (2023). Deep learning extraction of band structure parameters from density
of states: A case study on trilayer graphene. Physical Review B. American
Physical Society. https://doi.org/10.1103/physrevb.108.125411'
chicago: 'Henderson, Paul M, Areg Ghazaryan, Alexander A. Zibrov, Andrea F. Young,
and Maksym Serbyn. “Deep Learning Extraction of Band Structure Parameters from
Density of States: A Case Study on Trilayer Graphene.” Physical Review B.
American Physical Society, 2023. https://doi.org/10.1103/physrevb.108.125411.'
ieee: 'P. M. Henderson, A. Ghazaryan, A. A. Zibrov, A. F. Young, and M. Serbyn,
“Deep learning extraction of band structure parameters from density of states:
A case study on trilayer graphene,” Physical Review B, vol. 108, no. 12.
American Physical Society, 2023.'
ista: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. 2023. Deep learning
extraction of band structure parameters from density of states: A case study on
trilayer graphene. Physical Review B. 108(12), 125411.'
mla: 'Henderson, Paul M., et al. “Deep Learning Extraction of Band Structure Parameters
from Density of States: A Case Study on Trilayer Graphene.” Physical Review
B, vol. 108, no. 12, 125411, American Physical Society, 2023, doi:10.1103/physrevb.108.125411.'
short: P.M. Henderson, A. Ghazaryan, A.A. Zibrov, A.F. Young, M. Serbyn, Physical
Review B 108 (2023).
date_created: 2023-09-12T07:12:12Z
date_published: 2023-09-15T00:00:00Z
date_updated: 2023-09-20T09:38:24Z
day: '15'
department:
- _id: MaSe
- _id: ChLa
- _id: MiLe
doi: 10.1103/physrevb.108.125411
external_id:
arxiv:
- '2210.06310'
intvolume: ' 108'
issue: '12'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2210.06310
month: '09'
oa: 1
oa_version: Preprint
publication: Physical Review B
publication_identifier:
eissn:
- 2469-9969
issn:
- 2469-9950
publication_status: published
publisher: American Physical Society
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Deep learning extraction of band structure parameters from density of states:
A case study on trilayer graphene'
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 108
year: '2023'
...
---
_id: '14410'
abstract:
- lang: eng
text: This paper focuses on the implementation details of the baseline methods and
of LIMES [5], a recent lightweight conditional model extrapolation algorithm for
streaming data under class-prior shift. LIMES achieves superior performance over
the baseline methods, especially concerning the minimum-across-day accuracy, which
is important for the users of the system. This work describes the key measures taken
to facilitate reproducibility and to enhance the credibility of the results.
alternative_title:
- LNCS
article_processing_charge: No
author:
- first_name: Paulina
full_name: Tomaszewska, Paulina
last_name: Tomaszewska
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Tomaszewska P, Lampert C. On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift. In: International
Workshop on Reproducible Research in Pattern Recognition. Vol 14068. Springer
Nature; 2023:67-73. doi:10.1007/978-3-031-40773-4_6'
apa: 'Tomaszewska, P., & Lampert, C. (2023). On the implementation of baselines
and lightweight conditional model extrapolation (LIMES) under class-prior shift.
In International Workshop on Reproducible Research in Pattern Recognition
(Vol. 14068, pp. 67–73). Montreal, Canada: Springer Nature. https://doi.org/10.1007/978-3-031-40773-4_6'
chicago: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines
and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.”
In International Workshop on Reproducible Research in Pattern Recognition,
14068:67–73. Springer Nature, 2023. https://doi.org/10.1007/978-3-031-40773-4_6.
ieee: P. Tomaszewska and C. Lampert, “On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift,” in International
Workshop on Reproducible Research in Pattern Recognition, Montreal, Canada,
2023, vol. 14068, pp. 67–73.
ista: 'Tomaszewska P, Lampert C. 2023. On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift. International
Workshop on Reproducible Research in Pattern Recognition. RRPR: Reproducible Research
in Pattern Recognition, LNCS, vol. 14068, 67–73.'
mla: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines
and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.”
International Workshop on Reproducible Research in Pattern Recognition,
vol. 14068, Springer Nature, 2023, pp. 67–73, doi:10.1007/978-3-031-40773-4_6.
short: P. Tomaszewska, C. Lampert, in:, International Workshop on Reproducible Research
in Pattern Recognition, Springer Nature, 2023, pp. 67–73.
conference:
end_date: 2022-08-21
location: Montreal, Canada
name: 'RRPR: Reproducible Research in Pattern Recognition'
start_date: 2022-08-21
date_created: 2023-10-08T22:01:18Z
date_published: 2023-08-20T00:00:00Z
date_updated: 2023-10-09T06:48:02Z
day: '20'
department:
- _id: ChLa
doi: 10.1007/978-3-031-40773-4_6
intvolume: ' 14068'
language:
- iso: eng
month: '08'
oa_version: None
page: 67-73
publication: International Workshop on Reproducible Research in Pattern Recognition
publication_identifier:
eissn:
- 1611-3349
isbn:
- '9783031407727'
issn:
- 0302-9743
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
scopus_import: '1'
status: public
title: On the implementation of baselines and lightweight conditional model extrapolation
(LIMES) under class-prior shift
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 14068
year: '2023'
...
---
_id: '14446'
abstract:
- lang: eng
text: Recent work has paid close attention to the first principle of Granger causality,
according to which cause precedes effect. In this context, the question may arise
whether the detected direction of causality also reverses after the time reversal
of unidirectionally coupled data. Recently, it has been shown that for unidirectionally
causally connected autoregressive (AR) processes X → Y, after time reversal of
data, the opposite causal direction Y → X is indeed detected, although typically
as part of the bidirectional X ↔ Y link. As we argue here, the answer is different
when the measured data are not from AR processes but from linked deterministic
systems. When the goal is the usual forward data analysis, cross-mapping-like
approaches correctly detect X → Y, while Granger causality-like approaches, which
should not be used for deterministic time series, detect causal independence X
↛ Y. The results of backward causal analysis depend on the predictability of the
reversed data. Unlike AR processes, observables from deterministic dynamical systems,
even complex nonlinear ones, can be predicted well forward, while backward predictions
can be difficult (notably when the time reversal of a function leads to one-to-many
relations). To address this problem, we propose an approach based on models that
provide multiple candidate predictions for the target, combined with a loss function
that considers only the best candidate. The resulting good forward and backward
predictability supports the view that unidirectionally causally linked deterministic
dynamical systems X → Y can be expected to detect the same link both before and
after time reversal.
acknowledgement: The work was supported by the Scientific Grant Agency of the Ministry
of Education of the Slovak Republic and the Slovak Academy of Sciences, projects
APVV-21-0216, VEGA2-0096-21 and VEGA 2-0023-22.
article_processing_charge: Yes
article_type: original
author:
- first_name: Jozef
full_name: Jakubík, Jozef
last_name: Jakubík
- first_name: Phuong
full_name: Bui Thi Mai, Phuong
id: 3EC6EE64-F248-11E8-B48F-1D18A9856A87
last_name: Bui Thi Mai
- first_name: Martina
full_name: Chvosteková, Martina
last_name: Chvosteková
- first_name: Anna
full_name: Krakovská, Anna
last_name: Krakovská
citation:
ama: Jakubík J, Phuong M, Chvosteková M, Krakovská A. Against the flow of time with
multi-output models. Measurement Science Review. 2023;23(4):175-183. doi:10.2478/msr-2023-0023
apa: Jakubík, J., Phuong, M., Chvosteková, M., & Krakovská, A. (2023). Against
the flow of time with multi-output models. Measurement Science Review.
Sciendo. https://doi.org/10.2478/msr-2023-0023
chicago: Jakubík, Jozef, Mary Phuong, Martina Chvosteková, and Anna Krakovská. “Against
the Flow of Time with Multi-Output Models.” Measurement Science Review.
Sciendo, 2023. https://doi.org/10.2478/msr-2023-0023.
ieee: J. Jakubík, M. Phuong, M. Chvosteková, and A. Krakovská, “Against the flow
of time with multi-output models,” Measurement Science Review, vol. 23,
no. 4. Sciendo, pp. 175–183, 2023.
ista: Jakubík J, Phuong M, Chvosteková M, Krakovská A. 2023. Against the flow of
time with multi-output models. Measurement Science Review. 23(4), 175–183.
mla: Jakubík, Jozef, et al. “Against the Flow of Time with Multi-Output Models.”
Measurement Science Review, vol. 23, no. 4, Sciendo, 2023, pp. 175–83,
doi:10.2478/msr-2023-0023.
short: J. Jakubík, M. Phuong, M. Chvosteková, A. Krakovská, Measurement Science
Review 23 (2023) 175–183.
date_created: 2023-10-22T22:01:15Z
date_published: 2023-08-01T00:00:00Z
date_updated: 2023-10-31T12:12:47Z
day: '01'
ddc:
- '510'
department:
- _id: ChLa
doi: 10.2478/msr-2023-0023
file:
- access_level: open_access
checksum: b069cc10fa6a7c96b2bc9f728165f9e6
content_type: application/pdf
creator: dernst
date_created: 2023-10-31T12:07:23Z
date_updated: 2023-10-31T12:07:23Z
file_id: '14476'
file_name: 2023_MeasurementScienceRev_Jakubik.pdf
file_size: 2639783
relation: main_file
success: 1
file_date_updated: 2023-10-31T12:07:23Z
has_accepted_license: '1'
intvolume: ' 23'
issue: '4'
language:
- iso: eng
license: https://creativecommons.org/licenses/by-nc-nd/4.0/
month: '08'
oa: 1
oa_version: Published Version
page: 175-183
publication: Measurement Science Review
publication_identifier:
eissn:
- 1335-8871
publication_status: published
publisher: Sciendo
quality_controlled: '1'
scopus_import: '1'
status: public
title: Against the flow of time with multi-output models
tmp:
image: /images/cc_by_nc_nd.png
legal_code_url: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
name: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0)
short: CC BY-NC-ND (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 23
year: '2023'
...
---
_id: '14771'
abstract:
- lang: eng
text: Pruning—that is, setting a significant subset of the parameters of a neural
network to zero—is one of the most popular methods of model compression. Yet,
several recent works have raised the issue that pruning may induce or exacerbate
bias in the output of the compressed model. Despite existing evidence for this
phenomenon, the relationship between neural network pruning and induced bias is
not well-understood. In this work, we systematically investigate and characterize
this phenomenon in Convolutional Neural Networks for computer vision. First, we
show that it is in fact possible to obtain highly-sparse models, e.g. with less
than 10% remaining weights, which neither decrease in accuracy nor substantially
increase in bias when compared to dense models. At the same time, we also find
that, at higher sparsities, pruned models exhibit higher uncertainty in their
outputs, as well as increased correlations, which we directly link to increased
bias. We propose easy-to-use criteria which, based only on the uncompressed model,
establish whether bias will increase with pruning, and identify the samples most
susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias.
acknowledgement: The authors would like to sincerely thank Sara Hooker for her feedback
during the development of this work. EI was supported in part by the FWF DK VGSCO,
grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via
Starting Grant 805223 ScaleML.
article_processing_charge: No
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Iofinova EB, Peste E-A, Alistarh D-A. Bias in pruned vision models: In-depth
analysis and countermeasures. In: 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. IEEE; 2023:24364-24373. doi:10.1109/cvpr52729.2023.02334'
apa: 'Iofinova, E. B., Peste, E.-A., & Alistarh, D.-A. (2023). Bias in pruned
vision models: In-depth analysis and countermeasures. In 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition (pp. 24364–24373). Vancouver, BC,
Canada: IEEE. https://doi.org/10.1109/cvpr52729.2023.02334'
chicago: 'Iofinova, Eugenia B, Elena-Alexandra Peste, and Dan-Adrian Alistarh. “Bias
in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In 2023 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 24364–73. IEEE, 2023.
https://doi.org/10.1109/cvpr52729.2023.02334.'
ieee: 'E. B. Iofinova, E.-A. Peste, and D.-A. Alistarh, “Bias in pruned vision models:
In-depth analysis and countermeasures,” in 2023 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Vancouver, BC, Canada, 2023, pp. 24364–24373.'
ista: 'Iofinova EB, Peste E-A, Alistarh D-A. 2023. Bias in pruned vision models:
In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. CVPR: Conference on Computer Vision and Pattern Recognition,
24364–24373.'
mla: 'Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis
and Countermeasures.” 2023 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, IEEE, 2023, pp. 24364–73, doi:10.1109/cvpr52729.2023.02334.'
short: E.B. Iofinova, E.-A. Peste, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.
conference:
end_date: 2023-06-24
location: Vancouver, BC, Canada
name: 'CVPR: Conference on Computer Vision and Pattern Recognition'
start_date: 2023-06-17
date_created: 2024-01-10T08:42:40Z
date_published: 2023-08-22T00:00:00Z
date_updated: 2024-01-10T08:59:26Z
day: '22'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52729.2023.02334
ec_funded: 1
external_id:
arxiv:
- '2304.12622'
isi:
- '001062531308068'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2304.12622
month: '08'
oa: 1
oa_version: Preprint
page: 24364-24373
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: 'W1260-N35'
name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
eisbn:
- '9798350301298'
eissn:
- 2575-7075
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/IST-DASLab/pruned-vision-model-bias
status: public
title: 'Bias in pruned vision models: In-depth analysis and countermeasures'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14921'
abstract:
- lang: eng
text: Neural collapse (NC) refers to the surprising structure of the last layer
of deep neural networks in the terminal phase of gradient descent training. Recently,
an increasing amount of experimental evidence has pointed to the propagation of
NC to earlier layers of neural networks. However, while the NC in the last layer
is well studied theoretically, much less is known about its multi-layered counterpart,
deep neural collapse (DNC). In particular, existing work focuses either on linear
layers or only on the last two layers at the price of an extra assumption. Our
paper fills this gap by generalizing the established analytical framework for
NC, the unconstrained features model, to multiple non-linear layers. Our key
technical contribution is to show that, in a deep unconstrained features model,
the unique global optimum for binary classification exhibits all the properties
typical of DNC. This explains the existing experimental evidence of DNC. We also
empirically show that (i) by optimizing deep unconstrained features models via
gradient descent, the resulting solution agrees well with our theory, and (ii)
trained networks recover the unconstrained features suitable for the occurrence
of DNC, thus supporting the validity of this modeling principle.
acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The
authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for
valuable feedback on the manuscript.
alternative_title:
- NeurIPS
article_processing_charge: No
author:
- first_name: Peter
full_name: Súkeník, Peter
id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
last_name: Súkeník
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. In: 37th Annual Conference on Neural
Information Processing Systems.'
apa: Súkeník, P., Mondelli, M., & Lampert, C. (n.d.). Deep neural collapse is
provably optimal for the deep unconstrained features model. In 37th Annual
Conference on Neural Information Processing Systems. New Orleans, LA, United
States.
chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse
Is Provably Optimal for the Deep Unconstrained Features Model.” In 37th Annual
Conference on Neural Information Processing Systems, n.d.
ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably
optimal for the deep unconstrained features model,” in 37th Annual Conference
on Neural Information Processing Systems, New Orleans, LA, United States.
ista: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. 37th Annual Conference on Neural Information
Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS.'
mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep
Unconstrained Features Model.” 37th Annual Conference on Neural Information
Processing Systems.
short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural
Information Processing Systems, n.d.
conference:
end_date: 2023-12-16
location: New Orleans, LA, United States
name: 'NeurIPS: Neural Information Processing Systems'
start_date: 2023-12-10
date_created: 2024-02-02T11:17:41Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2024-02-06T07:53:26Z
day: '15'
department:
- _id: MaMo
- _id: ChLa
external_id:
arxiv:
- '2305.13165'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: 'https://doi.org/10.48550/arXiv.2305.13165'
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 37th Annual Conference on Neural Information Processing Systems
publication_status: inpress
quality_controlled: '1'
status: public
title: Deep neural collapse is provably optimal for the deep unconstrained features
model
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '15039'
abstract:
- lang: eng
text: 'A crucial property for achieving secure, trustworthy and interpretable deep
learning systems is their robustness: small changes to a system''s inputs should
not result in large changes to its outputs. Mathematically, this means one strives
for networks with a small Lipschitz constant. Several recent works have focused
on how to construct such Lipschitz networks, typically by imposing constraints
on the weight matrices. In this work, we study an orthogonal aspect, namely the
role of the activation function. We show that commonly used activation functions,
such as MaxMin, as well as all piece-wise linear ones with two segments, unnecessarily
restrict the class of representable functions, even in the simplest one-dimensional
setting. We furthermore introduce the new N-activation function that is provably
more expressive than currently popular activation functions. We provide code at
this https URL.'
article_number: '2311.06103'
article_processing_charge: No
author:
- first_name: Bernd
full_name: Prach, Bernd
id: 2D561D42-C427-11E9-89B4-9C1AE6697425
last_name: Prach
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations.
arXiv. doi:10.48550/ARXIV.2311.06103
apa: Prach, B., & Lampert, C. (n.d.). 1-Lipschitz neural networks are more expressive
with N-activations. arXiv. https://doi.org/10.48550/ARXIV.2311.06103
chicago: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More
Expressive with N-Activations.” ArXiv, n.d. https://doi.org/10.48550/ARXIV.2311.06103.
ieee: B. Prach and C. Lampert, “1-Lipschitz neural networks are more expressive
with N-activations,” arXiv.
ista: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations.
arXiv, 2311.06103.
mla: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More
Expressive with N-Activations.” ArXiv, 2311.06103, doi:10.48550/ARXIV.2311.06103.
short: B. Prach, C. Lampert, ArXiv (n.d.).
date_created: 2024-02-28T17:59:32Z
date_published: 2023-11-10T00:00:00Z
date_updated: 2024-03-04T07:02:39Z
day: '10'
department:
- _id: GradSch
- _id: ChLa
doi: 10.48550/ARXIV.2311.06103
external_id:
arxiv:
- '2311.06103'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2311.06103
month: '11'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: 1-Lipschitz neural networks are more expressive with N-activations
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '12660'
abstract:
- lang: eng
text: 'We present Cross-Client Label Propagation (XCLP), a new method for transductive
federated learning. XCLP estimates a data graph jointly from the data of multiple
clients and computes labels for the unlabeled data by propagating label information
across the graph. To avoid clients having to share their data with anyone, XCLP
employs two cryptographically secure protocols: secure Hamming distance computation
and secure summation. We demonstrate two distinct applications of XCLP within
federated learning. In the first, we use it in a one-shot way to predict labels
for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled
training data in a federated semi-supervised setting. Experiments on both real
federated and standard benchmark datasets show that in both applications XCLP
achieves higher classification accuracy than alternative approaches.'
article_number: '2210.06434'
article_processing_charge: No
author:
- first_name: Jonathan A
full_name: Scott, Jonathan A
id: e499926b-f6e0-11ea-865d-9c63db0031e8
last_name: Scott
- first_name: Michelle X
full_name: Yeo, Michelle X
id: 2D82B818-F248-11E8-B48F-1D18A9856A87
last_name: Yeo
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive
federated learning. arXiv. doi:10.48550/arXiv.2210.06434
apa: Scott, J. A., Yeo, M. X., & Lampert, C. (n.d.). Cross-client Label Propagation
for transductive federated learning. arXiv. https://doi.org/10.48550/arXiv.2210.06434
chicago: Scott, Jonathan A, Michelle X Yeo, and Christoph Lampert. “Cross-Client
Label Propagation for Transductive Federated Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2210.06434.
ieee: J. A. Scott, M. X. Yeo, and C. Lampert, “Cross-client Label Propagation for
transductive federated learning,” arXiv.
ista: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive
federated learning. arXiv, 2210.06434.
mla: Scott, Jonathan A., et al. “Cross-Client Label Propagation for Transductive
Federated Learning.” ArXiv, 2210.06434, doi:10.48550/arXiv.2210.06434.
short: J.A. Scott, M.X. Yeo, C. Lampert, ArXiv (n.d.).
date_created: 2023-02-20T08:21:50Z
date_published: 2022-10-12T00:00:00Z
date_updated: 2023-02-21T08:20:18Z
day: '12'
ddc:
- '004'
department:
- _id: ChLa
doi: 10.48550/arXiv.2210.06434
external_id:
arxiv:
- '2210.06434'
file:
- access_level: open_access
checksum: 7ab20543fd4393f14fb857ce2e4f03c6
content_type: application/pdf
creator: chl
date_created: 2023-02-20T08:21:35Z
date_updated: 2023-02-20T08:21:35Z
file_id: '12661'
file_name: 2210.06434.pdf
file_size: 291893
relation: main_file
success: 1
file_date_updated: 2023-02-20T08:21:35Z
has_accepted_license: '1'
language:
- iso: eng
license: https://creativecommons.org/licenses/by/4.0/
month: '10'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Cross-client Label Propagation for transductive federated learning
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
---
_id: '12662'
abstract:
- lang: eng
text: 'Modern machine learning tasks often require considering not just one but
multiple objectives. For example, besides the prediction quality, this could be
the efficiency, robustness or fairness of the learned models, or any of their
combinations. Multi-objective learning offers a natural framework for handling
such problems without having to commit to early trade-offs. Surprisingly, statistical
learning theory so far offers almost no insight into the generalization properties
of multi-objective learning. In this work, we make first steps to fill this gap:
we establish foundational generalization bounds for the multi-objective setting
as well as generalization and excess bounds for learning with scalarizations.
We also provide the first theoretical analysis of the relation between the Pareto-optimal
sets of the true objectives and the Pareto-optimal sets of their empirical approximations
from training data. In particular, we show a surprising asymmetry: all Pareto-optimal
solutions can be approximated by empirically Pareto-optimal ones, but not vice
versa.'
article_number: '2208.13499'
article_processing_charge: No
author:
- first_name: Peter
full_name: Súkeník, Peter
id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
last_name: Súkeník
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Súkeník P, Lampert C. Generalization in Multi-objective machine learning. arXiv.
doi:10.48550/arXiv.2208.13499
apa: Súkeník, P., & Lampert, C. (n.d.). Generalization in Multi-objective machine
learning. arXiv. https://doi.org/10.48550/arXiv.2208.13499
chicago: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective
Machine Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2208.13499.
ieee: P. Súkeník and C. Lampert, “Generalization in Multi-objective machine learning,”
arXiv.
ista: Súkeník P, Lampert C. Generalization in Multi-objective machine learning.
arXiv, 2208.13499.
mla: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective Machine
Learning.” ArXiv, 2208.13499, doi:10.48550/arXiv.2208.13499.
short: P. Súkeník, C. Lampert, ArXiv (n.d.).
date_created: 2023-02-20T08:23:06Z
date_published: 2022-08-29T00:00:00Z
date_updated: 2023-02-21T08:24:55Z
day: '29'
ddc:
- '004'
department:
- _id: ChLa
doi: 10.48550/arXiv.2208.13499
external_id:
arxiv:
- '2208.13499'
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: 'https://doi.org/10.48550/arXiv.2208.13499'
month: '08'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Generalization in Multi-objective machine learning
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...