---
_id: '13053'
abstract:
- lang: eng
text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or
quantization, before they can be deployed in practical settings. In this work
we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization
step in a principled way, in order to produce models whose local loss behavior
is stable under compression operations such as pruning. Thus, dense models trained
via CrAM should be compressible post-training, in a single step, without significant
accuracy loss. Experimental results on standard benchmarks, such as residual networks
for ImageNet classification and BERT models for language modelling, show that
CrAM produces dense models that can be more accurate than the standard SGD/Adam-based
baselines, but which are stable under weight pruning: specifically, we can prune
models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90%
with reasonable (∼1%) accuracy loss, which is competitive with gradual compression
methods. Additionally, CrAM can produce sparse models which perform well for transfer
learning, and it also works for semi-structured 2:4 pruning patterns supported
by GPU hardware. The code for reproducing the results is available at this https URL.'
acknowledged_ssus:
- _id: ScienComp
acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC)
under the European\r\nUnion’s Horizon 2020 research and innovation programme (grant
agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale
de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further
acknowledge the support from the Scientific Service Units (SSU) of ISTA through
resources provided by Scientific Computing (SciComp)."
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Adrian
full_name: Vladu, Adrian
last_name: Vladu
- first_name: Eldar
full_name: Kurtic, Eldar
id: 47beb3a5-07b5-11eb-9b87-b108ec578218
last_name: Kurtic
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. In: 11th International Conference on Learning Representations.'
apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.).
CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning
Representations. Kigali, Rwanda.'
chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert,
and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International
Conference on Learning Representations, n.d.'
ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM:
A Compression-Aware Minimizer,” in 11th International Conference on Learning
Representations, Kigali, Rwanda.'
ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware
Minimizer. 11th International Conference on Learning Representations. ICLR: International
Conference on Learning Representations.'
mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th
International Conference on Learning Representations.'
short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International
Conference on Learning Representations, n.d.
conference:
end_date: 2023-05-05
location: 'Kigali, Rwanda'
name: 'ICLR: International Conference on Learning Representations'
start_date: 2023-05-01
date_created: 2023-05-23T11:36:18Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-06-01T12:54:45Z
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
ec_funded: 1
external_id:
arxiv:
- '2207.14200'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/pdf?id=_eTZBs-yedr
month: '05'
oa: 1
oa_version: Preprint
project:
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: '11th International Conference on Learning Representations'
publication_status: accepted
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
status: public
title: 'CrAM: A Compression-Aware Minimizer'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
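For orientation on the CrAM record above: the abstract only names the idea, so here is a minimal sketch of what a compression-aware update could look like, assuming top-k magnitude pruning as the compression operator and a SAM-style intermediate perturbation. The helper names (`topk_prune`, `cram_step`) and the exact step sequence are illustrative assumptions, not the paper's verbatim algorithm.

```python
import numpy as np

def topk_prune(w, k):
    # assumed compression operator: keep the k largest-magnitude
    # entries of w and zero out the rest
    mask = np.zeros_like(w)
    mask[np.argsort(np.abs(w))[-k:]] = 1.0
    return w * mask

def cram_step(w, grad_fn, lr=0.1, rho=0.05, k=2):
    # hypothetical compression-aware step: perturb the weights, compress
    # the perturbed iterate, and apply the gradient taken at the compressed
    # point back to the dense weights, so the dense model stays stable
    # under one-shot pruning
    w_pert = w + rho * grad_fn(w)
    return w - lr * grad_fn(topk_prune(w_pert, k))

# toy quadratic loss L(w) = 0.5 * ||w - target||^2 with analytic gradient
target = np.array([1.0, -2.0, 0.1, 0.05])
grad_fn = lambda w: w - target

w = np.zeros_like(target)
for _ in range(200):
    w = cram_step(w, grad_fn, k=2)
print(topk_prune(w, 2))  # the one-shot pruned model the procedure targets
```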
---
_id: '13074'
abstract:
- lang: eng
text: "Deep learning has become an integral part of a large number of important
applications, and many of the recent breakthroughs have been enabled by the ability
to train very large models, capable of capturing complex patterns and relationships
from the data. At the same time, the massive sizes of modern deep learning models
have made their deployment to smaller devices more challenging; this is particularly
important, as in many applications the users rely on accurate deep learning predictions,
but they only have access to devices with limited memory and compute power. One
solution to this problem is to prune neural networks, by setting as many of their
parameters as possible to zero, to obtain accurate sparse models with lower memory
footprint. Despite the great research progress in obtaining sparse models that
preserve accuracy, while satisfying memory and computational constraints, there
are still many challenges associated with efficiently training sparse models,
as well as understanding their generalization properties.\r\n\r\nThe focus of
this thesis is to investigate how the training process of sparse models can be
made more efficient, and to understand the differences between sparse and dense
models in terms of how well they can generalize to changes in the data distribution.
We first study a method for co-training sparse and dense models, at a lower cost
compared to regular training. With our method we can obtain very accurate sparse
networks, and dense models that can recover the baseline accuracy. Furthermore,
we are able to more easily analyze the differences, at prediction level, between
the sparse-dense model pairs. Next, we investigate the generalization properties
of sparse neural networks in more detail, by studying how well different sparse
models trained on a larger task can adapt to smaller, more specialized tasks,
in a transfer learning scenario. Our analysis across multiple pruning methods
and sparsity levels reveals that sparse models provide features that can transfer
similarly to or better than the dense baseline. However, the choice of the pruning
method plays an important role, and can influence the results when the features
are fixed (linear finetuning), or when they are allowed to adapt to the new task
(full finetuning). Using sparse models with fixed masks for finetuning on new
tasks has an important practical advantage, as it enables training neural networks
on smaller devices. However, one drawback of current pruning methods is that the
entire training cycle has to be repeated to obtain the initial sparse model, for
every sparsity target; as a consequence, the entire training process is costly and
multiple models need to be stored. In the last part of the thesis we propose
a method that can train accurate dense models that are compressible in a single
step, to multiple sparsity levels, without additional finetuning. Our method results
in sparse models that can be competitive with existing pruning methods, and which
can also successfully generalize to new tasks."
acknowledged_ssus:
- _id: ScienComp
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
citation:
ama: Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:10.15479/at:ista:13074
apa: Peste, E.-A. (2023). Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:13074
chicago: Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural
Networks.” Institute of Science and Technology Austria, 2023. https://doi.org/10.15479/at:ista:13074.
ieee: E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute
of Science and Technology Austria, 2023.
ista: Peste E-A. 2023. Efficiency and generalization of sparse neural networks.
Institute of Science and Technology Austria.
mla: Peste, Elena-Alexandra. Efficiency and Generalization of Sparse Neural Networks.
Institute of Science and Technology Austria, 2023, doi:10.15479/at:ista:13074.
short: E.-A. Peste, Efficiency and Generalization of Sparse Neural Networks, Institute
of Science and Technology Austria, 2023.
date_created: 2023-05-23T17:07:53Z
date_published: 2023-05-23T00:00:00Z
date_updated: 2023-08-04T10:33:27Z
day: '23'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: DaAl
- _id: ChLa
doi: 10.15479/at:ista:13074
ec_funded: 1
file:
- access_level: open_access
checksum: 6b3354968403cb9d48cc5a83611fb571
content_type: application/pdf
creator: epeste
date_created: 2023-05-24T16:11:16Z
date_updated: 2023-05-24T16:11:16Z
file_id: '13087'
file_name: PhD_Thesis_Alexandra_Peste_final.pdf
file_size: 2152072
relation: main_file
success: 1
- access_level: closed
checksum: 8d0df94bbcf4db72c991f22503b3fd60
content_type: application/zip
creator: epeste
date_created: 2023-05-24T16:12:59Z
date_updated: 2023-05-24T16:12:59Z
file_id: '13088'
file_name: PhD_Thesis_APeste.zip
file_size: 1658293
relation: source_file
file_date_updated: 2023-05-24T16:12:59Z
has_accepted_license: '1'
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: '147'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '665385'
name: International IST Doctoral Program
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication_identifier:
issn:
- 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
record:
- id: '11458'
relation: part_of_dissertation
status: public
- id: '13053'
relation: part_of_dissertation
status: public
- id: '12299'
relation: part_of_dissertation
status: public
status: public
supervisor:
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
title: Efficiency and generalization of sparse neural networks
type: dissertation
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
year: '2023'
...
---
_id: '14320'
abstract:
- lang: eng
text: The development of two-dimensional materials has resulted in a diverse range
of novel, high-quality compounds with increasing complexity. A key requirement
for a comprehensive quantitative theory is the accurate determination of these
materials' band structure parameters. However, this task is challenging due to
the intricate band structures and the indirect nature of experimental probes.
In this work, we introduce a general framework to derive band structure parameters
from experimental data using deep neural networks. We applied our method to the
penetration field capacitance measurement of trilayer graphene, an effective probe
of its density of states. First, we demonstrate that a trained deep network gives
accurate predictions for the penetration field capacitance as a function of tight-binding
parameters. Next, we use the fast and accurate predictions from the trained network
to automatically determine tight-binding parameters directly from experimental
data, with the extracted parameters in good agreement with values in the literature.
We conclude by discussing potential applications of our method to other materials
and experimental techniques beyond penetration field capacitance.
acknowledgement: A.F.Y. acknowledges primary support from the Department of Energy
under award DE-SC0020043, and additional support from the Gordon and Betty Moore
Foundation under award GBMF9471 for group operations.
article_number: '125411'
article_processing_charge: No
article_type: original
author:
- first_name: Paul M
full_name: Henderson, Paul M
id: 13C09E74-18D9-11E9-8878-32CFE5697425
last_name: Henderson
orcid: 0000-0002-5198-7445
- first_name: Areg
full_name: Ghazaryan, Areg
id: 4AF46FD6-F248-11E8-B48F-1D18A9856A87
last_name: Ghazaryan
orcid: 0000-0001-9666-3543
- first_name: Alexander A.
full_name: Zibrov, Alexander A.
last_name: Zibrov
- first_name: Andrea F.
full_name: Young, Andrea F.
last_name: Young
- first_name: Maksym
full_name: Serbyn, Maksym
id: 47809E7E-F248-11E8-B48F-1D18A9856A87
last_name: Serbyn
orcid: 0000-0002-2399-5827
citation:
ama: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. Deep learning extraction
of band structure parameters from density of states: A case study on trilayer
graphene. Physical Review B. 2023;108(12). doi:10.1103/physrevb.108.125411'
apa: 'Henderson, P. M., Ghazaryan, A., Zibrov, A. A., Young, A. F., & Serbyn,
M. (2023). Deep learning extraction of band structure parameters from density
of states: A case study on trilayer graphene. Physical Review B. American
Physical Society. https://doi.org/10.1103/physrevb.108.125411'
chicago: 'Henderson, Paul M, Areg Ghazaryan, Alexander A. Zibrov, Andrea F. Young,
and Maksym Serbyn. “Deep Learning Extraction of Band Structure Parameters from
Density of States: A Case Study on Trilayer Graphene.” Physical Review B.
American Physical Society, 2023. https://doi.org/10.1103/physrevb.108.125411.'
ieee: 'P. M. Henderson, A. Ghazaryan, A. A. Zibrov, A. F. Young, and M. Serbyn,
“Deep learning extraction of band structure parameters from density of states:
A case study on trilayer graphene,” Physical Review B, vol. 108, no. 12.
American Physical Society, 2023.'
ista: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. 2023. Deep learning
extraction of band structure parameters from density of states: A case study on
trilayer graphene. Physical Review B. 108(12), 125411.'
mla: 'Henderson, Paul M., et al. “Deep Learning Extraction of Band Structure Parameters
from Density of States: A Case Study on Trilayer Graphene.” Physical Review
B, vol. 108, no. 12, 125411, American Physical Society, 2023, doi:10.1103/physrevb.108.125411.'
short: P.M. Henderson, A. Ghazaryan, A.A. Zibrov, A.F. Young, M. Serbyn, Physical
Review B 108 (2023).
date_created: 2023-09-12T07:12:12Z
date_published: 2023-09-15T00:00:00Z
date_updated: 2023-09-20T09:38:24Z
day: '15'
department:
- _id: MaSe
- _id: ChLa
- _id: MiLe
doi: 10.1103/physrevb.108.125411
external_id:
arxiv:
- '2210.06310'
intvolume: ' 108'
issue: '12'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2210.06310
month: '09'
oa: 1
oa_version: Preprint
publication: Physical Review B
publication_identifier:
eissn:
- 2469-9969
issn:
- 2469-9950
publication_status: published
publisher: American Physical Society
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Deep learning extraction of band structure parameters from density of states:
A case study on trilayer graphene'
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 108
year: '2023'
...
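The two-stage workflow in record 14320 (train a fast neural surrogate of the forward physics, then fit parameters by optimizing through it) can be illustrated with a toy sketch. Everything below is a stand-in: `forward_model` replaces the real penetration field capacitance calculation for trilayer graphene, and the two-parameter MLP surrogate is far smaller than anything used in the paper.

```python
import torch
import torch.nn as nn

def forward_model(p):
    # stand-in for the physics simulation mapping parameters to a curve
    x = torch.linspace(-1, 1, 32)
    return p[..., :1] * torch.sin(3 * x) + p[..., 1:] * x ** 2

# stage 1: train a surrogate network on simulated (parameters, curve) pairs
surrogate = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 32))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-2)
for _ in range(500):
    p = torch.rand(128, 2) * 2 - 1
    loss = ((surrogate(p) - forward_model(p)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# stage 2: recover parameters from a "measured" curve by optimizing the
# input of the frozen surrogate; this is the inversion step the abstract describes
true_p = torch.tensor([0.7, -0.3])
measured = forward_model(true_p)
p_hat = torch.zeros(2, requires_grad=True)
opt2 = torch.optim.Adam([p_hat], lr=5e-2)
for _ in range(300):
    loss = ((surrogate(p_hat) - measured) ** 2).mean()
    opt2.zero_grad()
    loss.backward()
    opt2.step()
print(p_hat.detach())  # approximately recovers true_p
```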
---
_id: '14410'
abstract:
- lang: eng
text: This paper focuses on the implementation details of the baseline methods and
a recent lightweight conditional model extrapolation algorithm LIMES [5] for streaming
data under class-prior shift. LIMES achieves superior performance over the baseline
methods, especially concerning the minimum-across-day accuracy, which is important
for the users of the system. In this work, the key measures to facilitate reproducibility
and enhance the credibility of the results are described.
alternative_title:
- LNCS
article_processing_charge: No
author:
- first_name: Paulina
full_name: Tomaszewska, Paulina
last_name: Tomaszewska
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Tomaszewska P, Lampert C. On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift. In: International
Workshop on Reproducible Research in Pattern Recognition. Vol 14068. Springer
Nature; 2023:67-73. doi:10.1007/978-3-031-40773-4_6'
apa: 'Tomaszewska, P., & Lampert, C. (2023). On the implementation of baselines
and lightweight conditional model extrapolation (LIMES) under class-prior shift.
In International Workshop on Reproducible Research in Pattern Recognition
(Vol. 14068, pp. 67–73). Montreal, Canada: Springer Nature. https://doi.org/10.1007/978-3-031-40773-4_6'
chicago: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines
and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.”
In International Workshop on Reproducible Research in Pattern Recognition,
14068:67–73. Springer Nature, 2023. https://doi.org/10.1007/978-3-031-40773-4_6.
ieee: P. Tomaszewska and C. Lampert, “On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift,” in International
Workshop on Reproducible Research in Pattern Recognition, Montreal, Canada,
2023, vol. 14068, pp. 67–73.
ista: 'Tomaszewska P, Lampert C. 2023. On the implementation of baselines and lightweight
conditional model extrapolation (LIMES) under class-prior shift. International
Workshop on Reproducible Research in Pattern Recognition. RRPR: Reproducible Research
in Pattern Recognition, LNCS, vol. 14068, 67–73.'
mla: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines
and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.”
International Workshop on Reproducible Research in Pattern Recognition,
vol. 14068, Springer Nature, 2023, pp. 67–73, doi:10.1007/978-3-031-40773-4_6.
short: P. Tomaszewska, C. Lampert, in:, International Workshop on Reproducible Research
in Pattern Recognition, Springer Nature, 2023, pp. 67–73.
conference:
end_date: 2022-08-21
location: Montreal, Canada
name: 'RRPR: Reproducible Research in Pattern Recognition'
start_date: 2022-08-21
date_created: 2023-10-08T22:01:18Z
date_published: 2023-08-20T00:00:00Z
date_updated: 2023-10-09T06:48:02Z
day: '20'
department:
- _id: ChLa
doi: 10.1007/978-3-031-40773-4_6
intvolume: ' 14068'
language:
- iso: eng
month: '08'
oa_version: None
page: 67-73
publication: International Workshop on Reproducible Research in Pattern Recognition
publication_identifier:
eissn:
- 1611-3349
isbn:
- '9783031407727'
issn:
- 0302-9743
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
scopus_import: '1'
status: public
title: On the implementation of baselines and lightweight conditional model extrapolation
(LIMES) under class-prior shift
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 14068
year: '2023'
...
---
_id: '14446'
abstract:
- lang: eng
text: Recent work has paid close attention to the first principle of Granger causality,
according to which cause precedes effect. In this context, the question may arise
whether the detected direction of causality also reverses after the time reversal
of unidirectionally coupled data. Recently, it has been shown that for unidirectionally
causally connected autoregressive (AR) processes X → Y, after time reversal of
data, the opposite causal direction Y → X is indeed detected, although typically
as part of the bidirectional X ↔ Y link. As we argue here, the answer is different
when the measured data are not from AR processes but from linked deterministic
systems. When the goal is the usual forward data analysis, cross-mapping-like
approaches correctly detect X → Y, while Granger causality-like approaches, which
should not be used for deterministic time series, detect causal independence X
↛ Y. The results of backward causal analysis depend on the predictability of the
reversed data. Unlike AR processes, observables from deterministic dynamical systems,
even complex nonlinear ones, can be predicted well forward, while backward predictions
can be difficult (notably when the time reversal of a function leads to one-to-many
relations). To address this problem, we propose an approach based on models that
provide multiple candidate predictions for the target, combined with a loss function
that considers only the best candidate. The resulting good forward and backward
predictability supports the view that unidirectionally causally linked deterministic
dynamical systems X → Y can be expected to detect the same link both before and
after time reversal.
acknowledgement: The work was supported by the Scientific Grant Agency of the Ministry
of Education of the Slovak Republic and the Slovak Academy of Sciences, projects
APVV-21-0216, VEGA2-0096-21 and VEGA 2-0023-22.
article_processing_charge: Yes
article_type: original
author:
- first_name: Jozef
full_name: Jakubík, Jozef
last_name: Jakubík
- first_name: Phuong
full_name: Bui Thi Mai, Phuong
id: 3EC6EE64-F248-11E8-B48F-1D18A9856A87
last_name: Bui Thi Mai
- first_name: Martina
full_name: Chvosteková, Martina
last_name: Chvosteková
- first_name: Anna
full_name: Krakovská, Anna
last_name: Krakovská
citation:
ama: Jakubík J, Phuong M, Chvosteková M, Krakovská A. Against the flow of time with
multi-output models. Measurement Science Review. 2023;23(4):175-183. doi:10.2478/msr-2023-0023
apa: Jakubík, J., Phuong, M., Chvosteková, M., & Krakovská, A. (2023). Against
the flow of time with multi-output models. Measurement Science Review.
Sciendo. https://doi.org/10.2478/msr-2023-0023
chicago: Jakubík, Jozef, Mary Phuong, Martina Chvosteková, and Anna Krakovská. “Against
the Flow of Time with Multi-Output Models.” Measurement Science Review.
Sciendo, 2023. https://doi.org/10.2478/msr-2023-0023.
ieee: J. Jakubík, M. Phuong, M. Chvosteková, and A. Krakovská, “Against the flow
of time with multi-output models,” Measurement Science Review, vol. 23,
no. 4. Sciendo, pp. 175–183, 2023.
ista: Jakubík J, Phuong M, Chvosteková M, Krakovská A. 2023. Against the flow of
time with multi-output models. Measurement Science Review. 23(4), 175–183.
mla: Jakubík, Jozef, et al. “Against the Flow of Time with Multi-Output Models.”
Measurement Science Review, vol. 23, no. 4, Sciendo, 2023, pp. 175–83,
doi:10.2478/msr-2023-0023.
short: J. Jakubík, M. Phuong, M. Chvosteková, A. Krakovská, Measurement Science
Review 23 (2023) 175–183.
date_created: 2023-10-22T22:01:15Z
date_published: 2023-08-01T00:00:00Z
date_updated: 2023-10-31T12:12:47Z
day: '01'
ddc:
- '510'
department:
- _id: ChLa
doi: 10.2478/msr-2023-0023
file:
- access_level: open_access
checksum: b069cc10fa6a7c96b2bc9f728165f9e6
content_type: application/pdf
creator: dernst
date_created: 2023-10-31T12:07:23Z
date_updated: 2023-10-31T12:07:23Z
file_id: '14476'
file_name: 2023_MeasurementScienceRev_Jakubik.pdf
file_size: 2639783
relation: main_file
success: 1
file_date_updated: 2023-10-31T12:07:23Z
has_accepted_license: '1'
intvolume: ' 23'
issue: '4'
language:
- iso: eng
license: https://creativecommons.org/licenses/by-nc-nd/4.0/
month: '08'
oa: 1
oa_version: Published Version
page: 175-183
publication: Measurement Science Review
publication_identifier:
eissn:
- 1335-8871
publication_status: published
publisher: Sciendo
quality_controlled: '1'
scopus_import: '1'
status: public
title: Against the flow of time with multi-output models
tmp:
image: /images/cc_by_nc_nd.png
legal_code_url: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode
name: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
(CC BY-NC-ND 4.0)
short: CC BY-NC-ND (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 23
year: '2023'
...
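The loss sketched at the end of the abstract in record 14446 (several candidate predictions, with only the best one penalized) is concrete enough for a small example. The toy inverse problem below, x = ±sqrt(y), is my own illustration of the one-to-many situation the abstract mentions, not an experiment from the paper; a two-output network can place one candidate on each branch.

```python
import torch
import torch.nn as nn

def best_candidate_loss(candidates, target):
    # candidates: (batch, n_candidates) predictions of a scalar target;
    # penalize only the closest candidate, so one-to-many inverse
    # mappings are not averaged away
    errors = (candidates - target.unsqueeze(1)) ** 2
    return errors.min(dim=1).values.mean()

# data: y = x^2, so the backward map y -> x has two branches
y = torch.rand(256, 1)
sign = torch.where(torch.rand(256, 1) < 0.5, -1.0, 1.0)
x_true = sign * torch.sqrt(y)

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
for _ in range(500):
    loss = best_candidate_loss(net(y), x_true.squeeze(1))
    opt.zero_grad()
    loss.backward()
    opt.step()
```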
---
_id: '14771'
abstract:
- lang: eng
text: Pruning—that is, setting a significant subset of the parameters of a neural
network to zero—is one of the most popular methods of model compression. Yet,
several recent works have raised the issue that pruning may induce or exacerbate
bias in the output of the compressed model. Despite existing evidence for this
phenomenon, the relationship between neural network pruning and induced bias is
not well-understood. In this work, we systematically investigate and characterize
this phenomenon in Convolutional Neural Networks for computer vision. First, we
show that it is in fact possible to obtain highly sparse models, e.g. with less
than 10% remaining weights, which neither decrease in accuracy nor substantially
increase in bias compared to dense models. At the same time, we also find
that, at higher sparsities, pruned models exhibit higher uncertainty in their
outputs, as well as increased correlations, which we directly link to increased
bias. We propose easy-to-use criteria which, based only on the uncompressed model,
establish whether bias will increase with pruning, and identify the samples most
susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias.
acknowledgement: The authors would like to sincerely thank Sara Hooker for her feedback
during the development of this work. EI was supported in part by the FWF DK VGSCO,
grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via
Starting Grant 805223 ScaleML.
article_processing_charge: No
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Iofinova EB, Peste E-A, Alistarh D-A. Bias in pruned vision models: In-depth
analysis and countermeasures. In: 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. IEEE; 2023:24364-24373. doi:10.1109/cvpr52729.2023.02334'
apa: 'Iofinova, E. B., Peste, E.-A., & Alistarh, D.-A. (2023). Bias in pruned
vision models: In-depth analysis and countermeasures. In 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition (pp. 24364–24373). Vancouver, BC,
Canada: IEEE. https://doi.org/10.1109/cvpr52729.2023.02334'
chicago: 'Iofinova, Eugenia B, Elena-Alexandra Peste, and Dan-Adrian Alistarh. “Bias
in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In 2023 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 24364–73. IEEE, 2023.
https://doi.org/10.1109/cvpr52729.2023.02334.'
ieee: 'E. B. Iofinova, E.-A. Peste, and D.-A. Alistarh, “Bias in pruned vision models:
In-depth analysis and countermeasures,” in 2023 IEEE/CVF Conference on Computer
Vision and Pattern Recognition, Vancouver, BC, Canada, 2023, pp. 24364–24373.'
ista: 'Iofinova EB, Peste E-A, Alistarh D-A. 2023. Bias in pruned vision models:
In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision
and Pattern Recognition. CVPR: Conference on Computer Vision and Pattern Recognition,
24364–24373.'
mla: 'Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis
and Countermeasures.” 2023 IEEE/CVF Conference on Computer Vision and Pattern
Recognition, IEEE, 2023, pp. 24364–73, doi:10.1109/cvpr52729.2023.02334.'
short: E.B. Iofinova, E.-A. Peste, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373.
conference:
end_date: 2023-06-24
location: Vancouver, BC, Canada
name: 'CVPR: Conference on Computer Vision and Pattern Recognition'
start_date: 2023-06-17
date_created: 2024-01-10T08:42:40Z
date_published: 2023-08-22T00:00:00Z
date_updated: 2024-01-10T08:59:26Z
day: '22'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52729.2023.02334
ec_funded: 1
external_id:
arxiv:
- '2304.12622'
isi:
- '001062531308068'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2304.12622
month: '08'
oa: 1
oa_version: Preprint
page: 24364-24373
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: ' W1260-N35'
name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
eisbn:
- '9798350301298'
eissn:
- 2575-7075
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/IST-DASLab/pruned-vision-model-bias
status: public
title: 'Bias in pruned vision models: In-depth analysis and countermeasures'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14921'
abstract:
- lang: eng
text: Neural collapse (NC) refers to the surprising structure of the last layer
of deep neural networks in the terminal phase of gradient descent training. Recently,
an increasing amount of experimental evidence has pointed to the propagation of
NC to earlier layers of neural networks. However, while the NC in the last layer
is well studied theoretically, much less is known about its multi-layered counterpart,
deep neural collapse (DNC). In particular, existing work focuses either on linear
layers or only on the last two layers at the price of an extra assumption. Our
paper fills this gap by generalizing the established analytical framework for
NC, the unconstrained features model, to multiple non-linear layers. Our key
technical contribution is to show that, in a deep unconstrained features model,
the unique global optimum for binary classification exhibits all the properties
typical of DNC. This explains the existing experimental evidence of DNC. We also
empirically show that (i) by optimizing deep unconstrained features models via
gradient descent, the resulting solution agrees well with our theory, and (ii)
trained networks recover the unconstrained features suitable for the occurrence
of DNC, thus supporting the validity of this modeling principle.
acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The
authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for
valuable feedback on the manuscript.
alternative_title:
- NeurIPS
article_processing_charge: No
author:
- first_name: Peter
full_name: Súkeník, Peter
id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
last_name: Súkeník
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. In: 37th Annual Conference on Neural
Information Processing Systems.'
apa: Súkeník, P., Mondelli, M., & Lampert, C. (n.d.). Deep neural collapse is
provably optimal for the deep unconstrained features model. In 37th Annual
Conference on Neural Information Processing Systems. New Orleans, LA, United
States.
chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse
Is Provably Optimal for the Deep Unconstrained Features Model.” In 37th Annual
Conference on Neural Information Processing Systems, n.d.
ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably
optimal for the deep unconstrained features model,” in 37th Annual Conference
on Neural Information Processing Systems, New Orleans, LA, United States.
ista: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. 37th Annual Conference on Neural Information
Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS.'
mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep
Unconstrained Features Model.” 37th Annual Conference on Neural Information
Processing Systems.
short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural
Information Processing Systems, n.d.
conference:
end_date: 2023-12-16
location: New Orleans, LA, United States
name: 'NeurIPS: Neural Information Processing Systems'
start_date: 2023-12-10
date_created: 2024-02-02T11:17:41Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2024-02-06T07:53:26Z
day: '15'
department:
- _id: MaMo
- _id: ChLa
external_id:
arxiv:
- '2305.13165'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: 'https://doi.org/10.48550/arXiv.2305.13165'
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 37th Annual Conference on Neural Information Processing Systems
publication_status: inpress
quality_controlled: '1'
status: public
title: Deep neural collapse is provably optimal for the deep unconstrained features
model
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '15039'
abstract:
- lang: eng
text: 'A crucial property for achieving secure, trustworthy and interpretable deep
learning systems is their robustness: small changes to a system''s inputs should
not result in large changes to its outputs. Mathematically, this means one strives
for networks with a small Lipschitz constant. Several recent works have focused
on how to construct such Lipschitz networks, typically by imposing constraints
on the weight matrices. In this work, we study an orthogonal aspect, namely the
role of the activation function. We show that commonly used activation functions,
such as MaxMin, as well as all piece-wise linear ones with two segments, unnecessarily
restrict the class of representable functions, even in the simplest one-dimensional
setting. We furthermore introduce the new N-activation function that is provably
more expressive than currently popular activation functions. We provide code at
this https URL.'
article_number: '2311.06103'
article_processing_charge: No
author:
- first_name: Bernd
full_name: Prach, Bernd
id: 2D561D42-C427-11E9-89B4-9C1AE6697425
last_name: Prach
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations.
arXiv. doi:10.48550/ARXIV.2311.06103
apa: Prach, B., & Lampert, C. (n.d.). 1-Lipschitz neural networks are more expressive
with N-activations. arXiv. https://doi.org/10.48550/ARXIV.2311.06103
chicago: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More
Expressive with N-Activations.” ArXiv, n.d. https://doi.org/10.48550/ARXIV.2311.06103.
ieee: B. Prach and C. Lampert, “1-Lipschitz neural networks are more expressive
with N-activations,” arXiv.
ista: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations.
arXiv, 2311.06103.
mla: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More
Expressive with N-Activations.” ArXiv, 2311.06103, doi:10.48550/ARXIV.2311.06103.
short: B. Prach, C. Lampert, ArXiv (n.d.).
date_created: 2024-02-28T17:59:32Z
date_published: 2023-11-10T00:00:00Z
date_updated: 2024-03-04T07:02:39Z
day: '10'
department:
- _id: GradSch
- _id: ChLa
doi: 10.48550/ARXIV.2311.06103
external_id:
arxiv:
- '2311.06103'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2311.06103
month: '11'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: 1-Lipschitz neural networks are more expressive with N-activations
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
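Record 15039 contrasts the proposed N-activation with MaxMin, the de-facto standard activation in Lipschitz networks. The paper's own N-activation is not reproduced here; the sketch below only shows MaxMin, the 1-Lipschitz baseline that the abstract argues unnecessarily restricts the representable functions.

```python
import torch

def maxmin(x):
    # MaxMin: split the features into two halves, pair them up, and output
    # the element-wise (max, min) of each pair; it is 1-Lipschitz and
    # preserves the norm of its input
    a, b = x.chunk(2, dim=-1)
    return torch.cat([torch.maximum(a, b), torch.minimum(a, b)], dim=-1)

print(maxmin(torch.tensor([1.0, -2.0, 0.5, 3.0])))  # tensor([1.0, 3.0, 0.5, -2.0])
```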
---
_id: '12660'
abstract:
- lang: eng
text: 'We present Cross-Client Label Propagation (XCLP), a new method for transductive
federated learning. XCLP estimates a data graph jointly from the data of multiple
clients and computes labels for the unlabeled data by propagating label information
across the graph. To avoid clients having to share their data with anyone, XCLP
employs two cryptographically secure protocols: secure Hamming distance computation
and secure summation. We demonstrate two distinct applications of XCLP within
federated learning. In the first, we use it in a one-shot way to predict labels
for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled
training data in a federated semi-supervised setting. Experiments on both real
federated and standard benchmark datasets show that in both applications XCLP
achieves higher classification accuracy than alternative approaches.'
article_number: '2210.06434'
article_processing_charge: No
author:
- first_name: Jonathan A
full_name: Scott, Jonathan A
id: e499926b-f6e0-11ea-865d-9c63db0031e8
last_name: Scott
- first_name: Michelle X
full_name: Yeo, Michelle X
id: 2D82B818-F248-11E8-B48F-1D18A9856A87
last_name: Yeo
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive
federated learning. arXiv. doi:10.48550/arXiv.2210.06434
apa: Scott, J. A., Yeo, M. X., & Lampert, C. (n.d.). Cross-client Label Propagation
for transductive federated learning. arXiv. https://doi.org/10.48550/arXiv.2210.06434
chicago: Scott, Jonathan A, Michelle X Yeo, and Christoph Lampert. “Cross-Client
Label Propagation for Transductive Federated Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2210.06434.
ieee: J. A. Scott, M. X. Yeo, and C. Lampert, “Cross-client Label Propagation for
transductive federated learning,” arXiv.
ista: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive
federated learning. arXiv, 2210.06434.
mla: Scott, Jonathan A., et al. “Cross-Client Label Propagation for Transductive
Federated Learning.” ArXiv, 2210.06434, doi:10.48550/arXiv.2210.06434.
short: J.A. Scott, M.X. Yeo, C. Lampert, ArXiv (n.d.).
date_created: 2023-02-20T08:21:50Z
date_published: 2022-10-12T00:00:00Z
date_updated: 2023-02-21T08:20:18Z
day: '12'
ddc:
- '004'
department:
- _id: ChLa
doi: 10.48550/arXiv.2210.06434
external_id:
arxiv:
- '2210.06434'
file:
- access_level: open_access
checksum: 7ab20543fd4393f14fb857ce2e4f03c6
content_type: application/pdf
creator: chl
date_created: 2023-02-20T08:21:35Z
date_updated: 2023-02-20T08:21:35Z
file_id: '12661'
file_name: 2210.06434.pdf
file_size: 291893
relation: main_file
success: 1
file_date_updated: 2023-02-20T08:21:35Z
has_accepted_license: '1'
language:
- iso: eng
month: '10'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Cross-client Label Propagation for transductive federated learning
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
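For context on the XCLP record above: its core computation is standard graph-based label propagation, with cryptographic protocols layered on top so that clients never reveal raw data. The sketch below shows only the plain, non-secure propagation on an already-built joint affinity graph; the secure Hamming distance and secure summation protocols are omitted.

```python
import numpy as np

def label_propagation(W, Y, alpha=0.9, iters=50):
    # W: (n, n) symmetric affinities of the joint cross-client graph
    # Y: (n, c) one-hot rows for labeled points, zero rows for unlabeled
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))          # symmetric normalization
    F = Y.copy()
    for _ in range(iters):
        F = alpha * S @ F + (1 - alpha) * Y  # propagate, keep labeled anchors
    return F.argmax(axis=1)

# tiny example: two clusters, one labeled point in each
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
W = np.exp(-((X - X.T) ** 2))
Y = np.zeros((6, 2))
Y[0, 0] = 1.0
Y[3, 1] = 1.0
print(label_propagation(W, Y))  # -> [0 0 0 1 1 1]
```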
---
_id: '12662'
abstract:
- lang: eng
text: 'Modern machine learning tasks often require considering not just one but
multiple objectives. For example, besides the prediction quality, this could be
the efficiency, robustness or fairness of the learned models, or any of their
combinations. Multi-objective learning offers a natural framework for handling
such problems without having to commit to early trade-offs. Surprisingly, statistical
learning theory so far offers almost no insight into the generalization properties
of multi-objective learning. In this work, we make first steps to fill this gap:
we establish foundational generalization bounds for the multi-objective setting
as well as generalization and excess bounds for learning with scalarizations.
We also provide the first theoretical analysis of the relation between the Pareto-optimal
sets of the true objectives and the Pareto-optimal sets of their empirical approximations
from training data. In particular, we show a surprising asymmetry: all Pareto-optimal
solutions can be approximated by empirically Pareto-optimal ones, but not vice
versa.'
article_number: '2208.13499'
article_processing_charge: No
author:
- first_name: Peter
full_name: Súkeník, Peter
id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
last_name: Súkeník
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: Súkeník P, Lampert C. Generalization in Multi-objective machine learning. arXiv.
doi:10.48550/arXiv.2208.13499
apa: Súkeník, P., & Lampert, C. (n.d.). Generalization in Multi-objective machine
learning. arXiv. https://doi.org/10.48550/arXiv.2208.13499
chicago: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective
Machine Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2208.13499.
ieee: P. Súkeník and C. Lampert, “Generalization in Multi-objective machine learning,”
arXiv.
ista: Súkeník P, Lampert C. Generalization in Multi-objective machine learning.
arXiv, 2208.13499.
mla: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective Machine
Learning.” ArXiv, 2208.13499, doi:10.48550/arXiv.2208.13499.
short: P. Súkeník, C. Lampert, ArXiv (n.d.).
date_created: 2023-02-20T08:23:06Z
date_published: 2022-08-29T00:00:00Z
date_updated: 2023-02-21T08:24:55Z
day: '29'
ddc:
- '004'
department:
- _id: ChLa
doi: 10.48550/arXiv.2208.13499
external_id:
arxiv:
- '2208.13499'
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: 'https://doi.org/10.48550/arXiv.2208.13499'
month: '08'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: submitted
status: public
title: Generalization in Multi-objective machine learning
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
---
_id: '12495'
abstract:
- lang: eng
text: "Fairness-aware learning aims at constructing classifiers that not only make
accurate predictions, but also do not discriminate against specific groups. It
is a fast-growing area of\r\nmachine learning with far-reaching societal impact.
However, existing fair learning methods\r\nare vulnerable to accidental or malicious
artifacts in the training data, which can cause\r\nthem to unknowingly produce
unfair classifiers. In this work we address the problem of\r\nfair learning from
unreliable training data in the robust multisource setting, where the\r\navailable
training data comes from multiple sources, a fraction of which might not be representative
of the true data distribution. We introduce FLEA, a filtering-based algorithm\r\nthat
identifies and suppresses those data sources that would have a negative impact
on\r\nfairness or accuracy if they were used for training. As such, FLEA is not
a replacement of\r\nprior fairness-aware learning methods but rather an augmentation
that makes any of them\r\nrobust against unreliable training data. We show the
effectiveness of our approach by a\r\ndiverse range of experiments on multiple
datasets. Additionally, we prove formally that\r\n–given enough data– FLEA protects
the learner against corruptions as long as the fraction of\r\naffected data sources
is less than half. Our source code and documentation are available at\r\nhttps://github.com/ISTAustria-CVML/FLEA."
acknowledged_ssus:
- _id: ScienComp
acknowledgement: 'The authors would like to thank Bernd Prach, Elias Frantar, Alexandra
Peste, Mahdi Nikdan, and Peter Súkeník for their helpful feedback. This research
was supported by the Scientific Service Units (SSU) of IST Austria through resources
provided by Scientific Computing (SciComp). This publication was made possible by
an ETH AI Center postdoctoral fellowship granted to Nikola Konstantinov. Eugenia
Iofinova was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. '
article_processing_charge: No
article_type: original
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Nikola H
full_name: Konstantinov, Nikola H
id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87
last_name: Konstantinov
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Iofinova EB, Konstantinov NH, Lampert C. FLEA: Provably robust fair multisource
learning from unreliable training data. Transactions on Machine Learning Research.
2022.'
apa: 'Iofinova, E. B., Konstantinov, N. H., & Lampert, C. (2022). FLEA: Provably
robust fair multisource learning from unreliable training data. Transactions
on Machine Learning Research. ML Research Press.'
chicago: 'Iofinova, Eugenia B, Nikola H Konstantinov, and Christoph Lampert. “FLEA:
Provably Robust Fair Multisource Learning from Unreliable Training Data.” Transactions
on Machine Learning Research. ML Research Press, 2022.'
ieee: 'E. B. Iofinova, N. H. Konstantinov, and C. Lampert, “FLEA: Provably robust
fair multisource learning from unreliable training data,” Transactions on Machine
Learning Research. ML Research Press, 2022.'
ista: 'Iofinova EB, Konstantinov NH, Lampert C. 2022. FLEA: Provably robust fair
multisource learning from unreliable training data. Transactions on Machine Learning
Research.'
mla: 'Iofinova, Eugenia B., et al. “FLEA: Provably Robust Fair Multisource Learning
from Unreliable Training Data.” Transactions on Machine Learning Research,
ML Research Press, 2022.'
short: E.B. Iofinova, N.H. Konstantinov, C. Lampert, Transactions on Machine Learning
Research (2022).
date_created: 2023-02-02T20:29:57Z
date_published: 2022-12-22T00:00:00Z
date_updated: 2023-02-23T10:30:54Z
day: '22'
ddc:
- '000'
department:
- _id: ChLa
external_id:
arxiv:
- '2106.11732'
file:
- access_level: open_access
checksum: 97c8a8470759cab597abb973ca137a3b
content_type: application/pdf
creator: dernst
date_created: 2023-02-23T10:30:04Z
date_updated: 2023-02-23T10:30:04Z
file_id: '12673'
file_name: 2022_TMLR_Iofinova.pdf
file_size: 1948063
relation: main_file
success: 1
file_date_updated: 2023-02-23T10:30:04Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/forum?id=XsPopigZXV
month: '12'
oa: 1
oa_version: Published Version
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: ' W1260-N35'
name: Vienna Graduate School on Computational Optimization
publication: Transactions on Machine Learning Research
publication_identifier:
issn:
- 2835-8856
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
link:
- description: source code
relation: software
url: https://github.com/ISTAustria-CVML/FLEA
status: public
title: 'FLEA: Provably robust fair multisource learning from unreliable training data'
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
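FLEA's precise statistics are given in the paper; as a rough sketch of the filtering principle from the abstract (score each source against the others and keep the majority-consistent ones), here is a deliberately simplified version. The pairwise statistic below, a gap between label means, is a placeholder for the discrepancy and disparity estimates the actual algorithm uses.

```python
import numpy as np

def filter_sources(sources, keep_frac=0.8):
    # sources: list of (X, y) datasets from different providers; score each
    # source by its median distance to all others and keep the closest ones.
    # Placeholder distance: difference of label means (not FLEA's statistic).
    means = np.array([y.mean() for _, y in sources])
    dists = np.abs(means[:, None] - means[None, :])
    scores = np.median(dists, axis=1)
    k = max(1, int(round(len(sources) * keep_frac)))
    return [sources[i] for i in np.argsort(scores)[:k]]

rng = np.random.default_rng(0)
clean = [(rng.normal(size=(50, 3)), rng.integers(0, 2, 50)) for _ in range(4)]
poisoned = [(rng.normal(size=(50, 3)), np.ones(50, dtype=int))]  # manipulated labels
kept = filter_sources(clean + poisoned)
print(len(kept))  # 4: the manipulated source is dropped
```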
---
_id: '11839'
abstract:
- lang: eng
text: "It is a highly desirable property for deep networks to be robust against\r\nsmall
input changes. One popular way to achieve this property is by designing\r\nnetworks
with a small Lipschitz constant. In this work, we propose a new\r\ntechnique for
constructing such Lipschitz networks that has a number of\r\ndesirable properties:
it can be applied to any linear network layer\r\n(fully-connected or convolutional),
it provides formal guarantees on the\r\nLipschitz constant, it is easy to implement
and efficient to run, and it can be\r\ncombined with any training objective and
optimization method. In fact, our\r\ntechnique is the first one in the literature
that achieves all of these\r\nproperties simultaneously. Our main contribution
is a rescaling-based weight\r\nmatrix parametrization that guarantees each network
layer to have a Lipschitz\r\nconstant of at most 1 and results in the learned
weight matrices to be close to\r\northogonal. Hence we call such layers almost-orthogonal
Lipschitz (AOL).\r\nExperiments and ablation studies in the context of image classification
with\r\ncertified robust accuracy confirm that AOL layers achieve results that
are on\r\npar with most existing methods. Yet, they are simpler to implement and
more\r\nbroadly applicable, because they do not require computationally expensive\r\nmatrix
orthogonalization or inversion steps as part of the network\r\narchitecture. We
provide code at https://github.com/berndprach/AOL."
alternative_title:
- LNCS
article_processing_charge: No
author:
- first_name: Bernd
full_name: Prach, Bernd
id: 2D561D42-C427-11E9-89B4-9C1AE6697425
last_name: Prach
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Prach B, Lampert C. Almost-orthogonal layers for efficient general-purpose
Lipschitz networks. In: Computer Vision – ECCV 2022. Vol 13681. Springer
Nature; 2022:350-365. doi:10.1007/978-3-031-19803-8_21'
apa: 'Prach, B., & Lampert, C. (2022). Almost-orthogonal layers for efficient
general-purpose Lipschitz networks. In Computer Vision – ECCV 2022 (Vol.
13681, pp. 350–365). Tel Aviv, Israel: Springer Nature. https://doi.org/10.1007/978-3-031-19803-8_21'
chicago: Prach, Bernd, and Christoph Lampert. “Almost-Orthogonal Layers for Efficient
General-Purpose Lipschitz Networks.” In Computer Vision – ECCV 2022, 13681:350–65.
Springer Nature, 2022. https://doi.org/10.1007/978-3-031-19803-8_21.
ieee: B. Prach and C. Lampert, “Almost-orthogonal layers for efficient general-purpose
Lipschitz networks,” in Computer Vision – ECCV 2022, Tel Aviv, Israel,
2022, vol. 13681, pp. 350–365.
ista: 'Prach B, Lampert C. 2022. Almost-orthogonal layers for efficient general-purpose
Lipschitz networks. Computer Vision – ECCV 2022. ECCV: European Conference on
Computer Vision, LNCS, vol. 13681, 350–365.'
mla: Prach, Bernd, and Christoph Lampert. “Almost-Orthogonal Layers for Efficient
General-Purpose Lipschitz Networks.” Computer Vision – ECCV 2022, vol.
13681, Springer Nature, 2022, pp. 350–65, doi:10.1007/978-3-031-19803-8_21.
short: B. Prach, C. Lampert, in:, Computer Vision – ECCV 2022, Springer Nature,
2022, pp. 350–365.
conference:
end_date: 2022-10-27
location: Tel Aviv, Israel
name: 'ECCV: European Conference on Computer Vision'
start_date: 2022-10-23
date_created: 2022-08-12T15:09:47Z
date_published: 2022-10-23T00:00:00Z
date_updated: 2023-05-03T08:00:46Z
day: '23'
department:
- _id: GradSch
- _id: ChLa
doi: 10.1007/978-3-031-19803-8_21
external_id:
arxiv:
- '2208.03160'
intvolume: ' 13681'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: 'https://doi.org/10.48550/arXiv.2208.03160'
month: '10'
oa: 1
oa_version: Preprint
page: 350-365
publication: Computer Vision – ECCV 2022
publication_identifier:
eisbn:
- '9783031198038'
isbn:
- '9783031198021'
publication_status: published
publisher: Springer Nature
quality_controlled: '1'
scopus_import: '1'
status: public
title: Almost-orthogonal layers for efficient general-purpose Lipschitz networks
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 13681
year: '2022'
...
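The abstract of record 11839 describes the AOL contribution concretely: a rescaling of the weight matrix that bounds the layer's Lipschitz constant by 1. The sketch below, dividing column i of W by sqrt(sum_j |W^T W|_ij), follows my reading of that description; the `eps` guard is an added assumption for numerical safety.

```python
import numpy as np

def aol_rescale(W, eps=1e-9):
    # rescale column i of W by 1 / sqrt(sum_j |W^T W|_ij); the rescaled
    # matrix has spectral norm (Lipschitz constant) at most 1
    t = np.abs(W.T @ W).sum(axis=1)
    return W / np.sqrt(t + eps)

W = np.random.randn(64, 32)
print(np.linalg.norm(aol_rescale(W), 2))  # <= 1.0
```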
---
_id: '10752'
abstract:
- lang: eng
text: 'The digitalization of almost all aspects of our everyday lives has led to
unprecedented amounts of data being freely available on the Internet. In particular
social media platforms provide rich sources of user-generated data, though typically
in unstructured form, and with high diversity, such as written in many different
languages. Automatically identifying meaningful information in such big data resources
and extracting it efficiently is one of the ongoing challenges of our time. A
common step for this is sentiment analysis, which forms the foundation for tasks
such as opinion mining or trend prediction. Unfortunately, publicly available
tools for this task are almost exclusively available for English-language texts.
Consequently, a large fraction of the Internet users, who do not communicate in
English, are ignored in automated studies, a phenomenon called rare-language
discrimination. In this work we propose a technique to overcome this problem by
a truly multi-lingual model, which can be trained automatically without linguistic
knowledge or even the ability to read the many target languages. The main step
is to combine self-annotation, specifically the use of emoticons as a proxy for
labels, with multi-lingual sentence representations. To evaluate our method we
curated several large datasets from data obtained via the free Twitter streaming
API. The results show that our proposed multi-lingual training is able to achieve
sentiment predictions at the same quality level for rare languages as for frequent
ones, and in particular clearly better than what mono-lingual training achieves
on the same data.'
article_processing_charge: No
author:
- first_name: Jasmin
full_name: Lampert, Jasmin
last_name: Lampert
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Lampert J, Lampert C. Overcoming rare-language discrimination in multi-lingual
sentiment analysis. In: 2021 IEEE International Conference on Big Data.
IEEE; 2022:5185-5192. doi:10.1109/bigdata52589.2021.9672003'
apa: 'Lampert, J., & Lampert, C. (2022). Overcoming rare-language discrimination
in multi-lingual sentiment analysis. In 2021 IEEE International Conference
on Big Data (pp. 5185–5192). Orlando, FL, United States: IEEE. https://doi.org/10.1109/bigdata52589.2021.9672003'
chicago: Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination
in Multi-Lingual Sentiment Analysis.” In 2021 IEEE International Conference
on Big Data, 5185–92. IEEE, 2022. https://doi.org/10.1109/bigdata52589.2021.9672003.
ieee: J. Lampert and C. Lampert, “Overcoming rare-language discrimination in multi-lingual
sentiment analysis,” in 2021 IEEE International Conference on Big Data,
Orlando, FL, United States, 2022, pp. 5185–5192.
ista: 'Lampert J, Lampert C. 2022. Overcoming rare-language discrimination in multi-lingual
sentiment analysis. 2021 IEEE International Conference on Big Data. Big Data:
International Conference on Big Data, 5185–5192.'
mla: Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination
in Multi-Lingual Sentiment Analysis.” 2021 IEEE International Conference on
Big Data, IEEE, 2022, pp. 5185–92, doi:10.1109/bigdata52589.2021.9672003.
short: J. Lampert, C. Lampert, in:, 2021 IEEE International Conference on Big Data,
IEEE, 2022, pp. 5185–5192.
conference:
end_date: 2021-12-18
location: Orlando, FL, United States
name: 'Big Data: International Conference on Big Data'
start_date: 2021-12-15
date_created: 2022-02-10T14:08:23Z
date_published: 2022-01-13T00:00:00Z
date_updated: 2023-08-02T14:27:50Z
day: '13'
department:
- _id: ChLa
doi: 10.1109/bigdata52589.2021.9672003
external_id:
isi:
- '000800559505036'
isi: 1
language:
- iso: eng
month: '01'
oa_version: None
page: 5185-5192
publication: 2021 IEEE International Conference on Big Data
publication_identifier:
isbn:
- '9781665439022'
publication_status: published
publisher: IEEE
quality_controlled: '1'
status: public
title: Overcoming rare-language discrimination in multi-lingual sentiment analysis
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
year: '2022'
...
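The self-annotation step in record 10752 is simple enough to sketch: emoticons act as proxy sentiment labels and are stripped from the text, so a classifier trained on the result must rely on the remaining words. The emoticon lists and the rule of discarding ambiguous tweets below are illustrative choices, not the paper's exact preprocessing.

```python
POSITIVE = {":)", ":-)", ":D", "<3"}
NEGATIVE = {":(", ":-(", ":'("}

def self_annotate(tweet):
    # derive a proxy label from emoticons, then remove them from the text
    toks = tweet.split()
    pos = any(t in POSITIVE for t in toks)
    neg = any(t in NEGATIVE for t in toks)
    if pos == neg:  # no emoticon, or conflicting ones: discard the tweet
        return None
    text = " ".join(t for t in toks if t not in POSITIVE | NEGATIVE)
    return text, int(pos)  # 1 = positive, 0 = negative

print(self_annotate("so happy about the game :)"))  # ('so happy about the game', 1)
```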
---
_id: '12161'
abstract:
- lang: eng
text: 'We introduce LIMES, a new method for learning with non-stationary streaming
data, inspired by the recent success of meta-learning. The main idea is not to
attempt to learn a single classifier that would have to work well across all occurring
data distributions, nor many separate classifiers, but to exploit a hybrid strategy:
we learn a single set of model parameters from which a specific classifier for
any specific data distribution is derived via classifier adaptation. Assuming
a multiclass classification setting with class-prior shift, the adaptation step
can be performed analytically with only the classifier’s bias terms being affected.
Another contribution of our work is an extrapolation step that predicts suitable
adaptation parameters for future time steps based on the previous data. In combination,
we obtain a lightweight procedure for learning from streaming data with varying
class distribution that adds no trainable parameters and almost no memory or computational
overhead compared to training a single model. Experiments on a set of exemplary
tasks using Twitter data show that LIMES achieves higher accuracy than alternative
approaches, especially with respect to the relevant real-world metric of lowest
within-day accuracy.'
article_processing_charge: No
author:
- first_name: Paulina
full_name: Tomaszewska, Paulina
last_name: Tomaszewska
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Tomaszewska P, Lampert C. Lightweight conditional model extrapolation for
streaming data under class-prior shift. In: 26th International Conference on
Pattern Recognition. Vol 2022. Institute of Electrical and Electronics Engineers;
2022:2128-2134. doi:10.1109/icpr56361.2022.9956195'
apa: 'Tomaszewska, P., & Lampert, C. (2022). Lightweight conditional model extrapolation
for streaming data under class-prior shift. In 26th International Conference
on Pattern Recognition (Vol. 2022, pp. 2128–2134). Montreal, Canada: Institute
of Electrical and Electronics Engineers. https://doi.org/10.1109/icpr56361.2022.9956195'
chicago: Tomaszewska, Paulina, and Christoph Lampert. “Lightweight Conditional Model
Extrapolation for Streaming Data under Class-Prior Shift.” In 26th International
Conference on Pattern Recognition, 2022:2128–34. Institute of Electrical and
Electronics Engineers, 2022. https://doi.org/10.1109/icpr56361.2022.9956195.
ieee: P. Tomaszewska and C. Lampert, “Lightweight conditional model extrapolation
for streaming data under class-prior shift,” in 26th International Conference
on Pattern Recognition, Montreal, Canada, 2022, vol. 2022, pp. 2128–2134.
ista: 'Tomaszewska P, Lampert C. 2022. Lightweight conditional model extrapolation
for streaming data under class-prior shift. 26th International Conference on Pattern
Recognition. ICPR: International Conference on Pattern Recognition vol. 2022,
2128–2134.'
mla: Tomaszewska, Paulina, and Christoph Lampert. “Lightweight Conditional Model
Extrapolation for Streaming Data under Class-Prior Shift.” 26th International
Conference on Pattern Recognition, vol. 2022, Institute of Electrical and
Electronics Engineers, 2022, pp. 2128–34, doi:10.1109/icpr56361.2022.9956195.
short: P. Tomaszewska, C. Lampert, in:, 26th International Conference on Pattern
Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 2128–2134.
conference:
end_date: 2022-08-25
location: Montreal, Canada
name: 'ICPR: International Conference on Pattern Recognition'
start_date: 2022-08-21
date_created: 2023-01-12T12:09:38Z
date_published: 2022-11-29T00:00:00Z
date_updated: 2023-08-04T09:06:34Z
day: '29'
department:
- _id: ChLa
doi: 10.1109/icpr56361.2022.9956195
external_id:
arxiv:
- '2206.05181'
isi:
- '000897707602018'
intvolume: ' 2022'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2206.05181
month: '11'
oa: 1
oa_version: Preprint
page: 2128-2134
publication: 26th International Conference on Pattern Recognition
publication_identifier:
eisbn:
- '9781665490627'
eissn:
- 2831-7475
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Lightweight conditional model extrapolation for streaming data under class-prior
shift
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 2022
year: '2022'
...
---
_id: '12299'
abstract:
- lang: eng
text: 'Transfer learning is a classic paradigm by which models pretrained on large
“upstream” datasets are adapted to yield good results on “downstream” specialized
datasets. Generally, more accurate models on the “upstream” dataset tend to provide
better transfer accuracy “downstream”. In this work, we perform an in-depth investigation
of this phenomenon in the context of convolutional neural networks (CNNs) trained
on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying
their connections. We consider transfer using unstructured pruned models obtained
by applying several state-of-the-art pruning methods, including magnitude-based,
second-order, regrowth, lottery-ticket, and regularization approaches, in the
context of twelve standard transfer tasks. In a nutshell, our study shows that
sparse models can match or even outperform the transfer performance of dense models,
even at high sparsities, and, while doing so, can lead to significant inference
and even training speedups. At the same time, we observe and analyze significant
differences in the behaviour of different pruning methods. The code is available
at: https://github.com/IST-DASLab/sparse-imagenet-transfer.'
acknowledgement: The authors would like to sincerely thank Christoph Lampert and Nir
Shavit for fruitful discussions during the development of this work, and Eldar Kurtic
for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement
number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting
Grant 805223 ScaleML.
article_processing_charge: No
author:
- first_name: Eugenia B
full_name: Iofinova, Eugenia B
id: f9a17499-f6e0-11ea-865d-fdf9a3f77117
last_name: Iofinova
orcid: 0000-0002-7778-3221
- first_name: Elena-Alexandra
full_name: Peste, Elena-Alexandra
id: 32D78294-F248-11E8-B48F-1D18A9856A87
last_name: Peste
- first_name: Mark
full_name: Kurtz, Mark
last_name: Kurtz
- first_name: Dan-Adrian
full_name: Alistarh, Dan-Adrian
id: 4A899BFC-F248-11E8-B48F-1D18A9856A87
last_name: Alistarh
orcid: 0000-0003-3650-940X
citation:
ama: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. How well do sparse ImageNet
models transfer? In: 2022 IEEE/CVF Conference on Computer Vision and Pattern
Recognition. Institute of Electrical and Electronics Engineers; 2022:12256-12266.
doi:10.1109/cvpr52688.2022.01195'
apa: 'Iofinova, E. B., Peste, E.-A., Kurtz, M., & Alistarh, D.-A. (2022). How
well do sparse ImageNet models transfer? In 2022 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (pp. 12256–12266). New Orleans, LA, United
States: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/cvpr52688.2022.01195'
chicago: Iofinova, Eugenia B, Elena-Alexandra Peste, Mark Kurtz, and Dan-Adrian
Alistarh. “How Well Do Sparse ImageNet Models Transfer?” In 2022 IEEE/CVF Conference
on Computer Vision and Pattern Recognition, 12256–66. Institute of Electrical
and Electronics Engineers, 2022. https://doi.org/10.1109/cvpr52688.2022.01195.
ieee: E. B. Iofinova, E.-A. Peste, M. Kurtz, and D.-A. Alistarh, “How well do sparse
ImageNet models transfer?,” in 2022 IEEE/CVF Conference on Computer Vision
and Pattern Recognition, New Orleans, LA, United States, 2022, pp. 12256–12266.
ista: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet
models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition.
CVPR: Computer Vision and Pattern Recognition, 12256–12266.'
mla: Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?”
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute
of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:10.1109/cvpr52688.2022.01195.
short: E.B. Iofinova, E.-A. Peste, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF
Conference on Computer Vision and Pattern Recognition, Institute of Electrical
and Electronics Engineers, 2022, pp. 12256–12266.
conference:
end_date: 2022-06-24
location: New Orleans, LA, United States
name: 'CVPR: Computer Vision and Pattern Recognition'
start_date: 2022-06-18
date_created: 2023-01-16T10:06:00Z
date_published: 2022-09-27T00:00:00Z
date_updated: 2023-08-04T10:33:28Z
day: '27'
department:
- _id: DaAl
- _id: ChLa
doi: 10.1109/cvpr52688.2022.01195
ec_funded: 1
external_id:
arxiv:
- '2111.13445'
isi:
- '000870759105034'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2111.13445
month: '09'
oa: 1
oa_version: Preprint
page: 12256-12266
project:
- _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A
grant_number: 'W1260-N35'
name: Vienna Graduate School on Computational Optimization
- _id: 268A44D6-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '805223'
name: Elastic Coordination for Scalable Machine Learning
publication: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition
publication_identifier:
eissn:
- 2575-7075
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
related_material:
record:
- id: '13074'
relation: dissertation_contains
status: public
scopus_import: '1'
status: public
title: How well do sparse ImageNet models transfer?
type: conference
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
year: '2022'
...
---
_id: '10802'
abstract:
- lang: eng
text: "Addressing fairness concerns about machine learning models is a crucial step
towards their long-term adoption in real-world automated systems. While many approaches
have been developed for training fair models from data, little is known about
the robustness of these methods to data corruption. In this work we consider fairness-aware
learning under worst-case data manipulations. We show that an adversary can in
some situations force any learner to return an overly biased classifier, regardless
of the sample size and with or without degrading\r\naccuracy, and that the strength
of the excess bias increases for learning problems with underrepresented protected
groups in the data. We also prove that our hardness results are tight up to constant
factors. To this end, we study two natural learning algorithms that optimize for
both accuracy and fairness and show that these algorithms enjoy guarantees that
are order-optimal in terms of the corruption ratio and the protected groups frequencies
in the large data\r\nlimit."
acknowledgement: The authors thank Eugenia Iofinova and Bernd Prach for providing
feedback on early versions of this paper. This publication was made possible by
an ETH AI Center postdoctoral fellowship to Nikola Konstantinov.
article_processing_charge: No
article_type: original
author:
- first_name: Nikola H
full_name: Konstantinov, Nikola H
id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87
last_name: Konstantinov
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0002-4561-241X
citation:
ama: Konstantinov NH, Lampert C. Fairness-aware PAC learning from corrupted data.
Journal of Machine Learning Research. 2022;23:1-60.
apa: Konstantinov, N. H., & Lampert, C. (2022). Fairness-aware PAC learning
from corrupted data. Journal of Machine Learning Research. ML Research
Press.
chicago: Konstantinov, Nikola H, and Christoph Lampert. “Fairness-Aware PAC Learning
from Corrupted Data.” Journal of Machine Learning Research. ML Research
Press, 2022.
ieee: N. H. Konstantinov and C. Lampert, “Fairness-aware PAC learning from corrupted
data,” Journal of Machine Learning Research, vol. 23. ML Research Press,
pp. 1–60, 2022.
ista: Konstantinov NH, Lampert C. 2022. Fairness-aware PAC learning from corrupted
data. Journal of Machine Learning Research. 23, 1–60.
mla: Konstantinov, Nikola H., and Christoph Lampert. “Fairness-Aware PAC Learning
from Corrupted Data.” Journal of Machine Learning Research, vol. 23, ML
Research Press, 2022, pp. 1–60.
short: N.H. Konstantinov, C. Lampert, Journal of Machine Learning Research 23 (2022)
1–60.
date_created: 2022-02-28T14:05:42Z
date_published: 2022-05-01T00:00:00Z
date_updated: 2023-09-26T10:44:37Z
day: '01'
ddc:
- '004'
department:
- _id: ChLa
external_id:
arxiv:
- '2102.06004'
file:
- access_level: open_access
checksum: 9cac897b54a0ddf3a553a2c33e88cfda
content_type: application/pdf
creator: kschuh
date_created: 2022-07-12T15:08:28Z
date_updated: 2022-07-12T15:08:28Z
file_id: '11570'
file_name: 2022_JournalMachineLearningResearch_Konstantinov.pdf
file_size: 551862
relation: main_file
success: 1
file_date_updated: 2022-07-12T15:08:28Z
has_accepted_license: '1'
intvolume: ' 23'
keyword:
- Fairness
- robustness
- data poisoning
- trustworthy machine learning
- PAC learning
language:
- iso: eng
month: '05'
oa: 1
oa_version: Published Version
page: 1-60
publication: Journal of Machine Learning Research
publication_identifier:
eissn:
- 1533-7928
issn:
- 1532-4435
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
record:
- id: '10799'
relation: dissertation_contains
status: public
- id: '13241'
relation: shorter_version
status: public
scopus_import: '1'
status: public
title: Fairness-aware PAC learning from corrupted data
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 23
year: '2022'
...
---
_id: '13241'
abstract:
- lang: eng
text: Addressing fairness concerns about machine learning models is a crucial step
towards their long-term adoption in real-world automated systems. Many approaches
for training fair models from data have been developed and an implicit assumption
about such algorithms is that they are able to recover a fair model, despite potential
historical biases in the data. In this work we show a number of impossibility
results that indicate that there is no learning algorithm that can recover a fair
model when a proportion of the dataset is subject to arbitrary manipulations.
Specifically, we prove that there are situations in which an adversary can force
any learner to return a biased classifier, with or without degrading accuracy,
and that the strength of this bias increases for learning problems with underrepresented
protected groups in the data. Our results emphasize the importance of studying
further data corruption models of various strengths and of establishing stricter
data collection practices for fairness-aware learning.
acknowledgement: "This paper is a shortened, workshop version of Konstantinov and
Lampert (2021),\r\nhttps://arxiv.org/abs/2102.06004. For further results, including
an analysis of algorithms achieving the lower bounds from this paper, we refer to
the full version."
article_processing_charge: No
author:
- first_name: Nikola H
full_name: Konstantinov, Nikola H
id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87
last_name: Konstantinov
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Konstantinov NH, Lampert C. On the impossibility of fairness-aware learning
from corrupted data. In: Proceedings of Machine Learning Research. Vol
171. ML Research Press; 2022:59-83.'
apa: Konstantinov, N. H., & Lampert, C. (2022). On the impossibility of fairness-aware
learning from corrupted data. In Proceedings of Machine Learning Research
(Vol. 171, pp. 59–83). ML Research Press.
chicago: Konstantinov, Nikola H, and Christoph Lampert. “On the Impossibility of
Fairness-Aware Learning from Corrupted Data.” In Proceedings of Machine Learning
Research, 171:59–83. ML Research Press, 2022.
ieee: N. H. Konstantinov and C. Lampert, “On the impossibility of fairness-aware
learning from corrupted data,” in Proceedings of Machine Learning Research,
2022, vol. 171, pp. 59–83.
ista: Konstantinov NH, Lampert C. 2022. On the impossibility of fairness-aware learning
from corrupted data. Proceedings of Machine Learning Research. vol. 171, 59–83.
mla: Konstantinov, Nikola H., and Christoph Lampert. “On the Impossibility of Fairness-Aware
Learning from Corrupted Data.” Proceedings of Machine Learning Research,
vol. 171, ML Research Press, 2022, pp. 59–83.
short: N.H. Konstantinov, C. Lampert, in:, Proceedings of Machine Learning Research,
ML Research Press, 2022, pp. 59–83.
date_created: 2023-07-16T22:01:13Z
date_published: 2022-12-01T00:00:00Z
date_updated: 2023-09-26T10:44:37Z
day: '01'
department:
- _id: ChLa
external_id:
arxiv:
- '2102.06004'
intvolume: ' 171'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://arxiv.org/abs/2102.06004
month: '12'
oa: 1
oa_version: Preprint
page: 59-83
publication: Proceedings of Machine Learning Research
publication_identifier:
eissn:
- 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
record:
- id: '10802'
relation: extended_version
status: public
scopus_import: '1'
status: public
title: On the impossibility of fairness-aware learning from corrupted data
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 171
year: '2022'
...
---
_id: '10799'
abstract:
- lang: eng
text: "Because of the increasing popularity of machine learning methods, it is becoming
important to understand the impact of learned components on automated decision-making
systems and to guarantee that their consequences are beneficial to society. In
other words, it is necessary to ensure that machine learning is sufficiently trustworthy
to be used in real-world applications. This thesis studies two properties of machine
learning models that are highly desirable for the\r\nsake of reliability: robustness
and fairness. In the first part of the thesis we study the robustness of learning
algorithms to training data corruption. Previous work has shown that machine learning
models are vulnerable to a range\r\nof training set issues, varying from label
noise through systematic biases to worst-case data manipulations. This is an especially
relevant problem from a present perspective, since modern machine learning methods
are particularly data hungry and therefore practitioners often have to rely on
data collected from various external sources, e.g. from the Internet, from app
users or via crowdsourcing. Naturally, such sources vary greatly in the quality
and reliability of the\r\ndata they provide. With these considerations in mind,
we study the problem of designing machine learning algorithms that are robust
to corruptions in data coming from multiple sources. We show that, in contrast
to the case of a single dataset with outliers, successful learning within this
model is possible both theoretically and practically, even under worst-case data
corruptions. The second part of this thesis deals with fairness-aware machine
learning. There are multiple areas where machine learning models have shown promising
results, but where careful considerations are required, in order to avoid discriminatory
decisions taken by such learned components. Ensuring fairness can be particularly
challenging, because real-world training datasets are expected to contain various
forms of historical bias that may affect the learning process. In this thesis
we show that data corruption can indeed render the problem of achieving fairness
impossible, by tightly characterizing the theoretical limits of fair learning
under worst-case data manipulations. However, assuming access to clean data, we
also show how fairness-aware learning can be made practical in contexts beyond
binary classification, in particular in the challenging learning to rank setting."
alternative_title:
- ISTA Thesis
article_processing_charge: No
author:
- first_name: Nikola H
full_name: Konstantinov, Nikola H
id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87
last_name: Konstantinov
citation:
ama: Konstantinov NH. Robustness and fairness in machine learning. 2022. doi:10.15479/at:ista:10799
apa: Konstantinov, N. H. (2022). Robustness and fairness in machine learning.
Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:10799
chicago: Konstantinov, Nikola H. “Robustness and Fairness in Machine Learning.”
Institute of Science and Technology Austria, 2022. https://doi.org/10.15479/at:ista:10799.
ieee: N. H. Konstantinov, “Robustness and fairness in machine learning,” Institute
of Science and Technology Austria, 2022.
ista: Konstantinov NH. 2022. Robustness and fairness in machine learning. Institute
of Science and Technology Austria.
mla: Konstantinov, Nikola H. Robustness and Fairness in Machine Learning.
Institute of Science and Technology Austria, 2022, doi:10.15479/at:ista:10799.
short: N.H. Konstantinov, Robustness and Fairness in Machine Learning, Institute
of Science and Technology Austria, 2022.
date_created: 2022-02-28T13:03:49Z
date_published: 2022-03-08T00:00:00Z
date_updated: 2023-10-17T12:31:54Z
day: '08'
ddc:
- '000'
degree_awarded: PhD
department:
- _id: GradSch
- _id: ChLa
doi: 10.15479/at:ista:10799
ec_funded: 1
file:
- access_level: open_access
checksum: 626bc523ae8822d20e635d0e2d95182e
content_type: application/pdf
creator: nkonstan
date_created: 2022-03-06T11:42:54Z
date_updated: 2022-03-06T11:42:54Z
file_id: '10823'
file_name: thesis.pdf
file_size: 4204905
relation: main_file
success: 1
- access_level: closed
checksum: e2ca2b88350ac8ea1515b948885cbcb1
content_type: application/x-zip-compressed
creator: nkonstan
date_created: 2022-03-06T11:42:57Z
date_updated: 2022-03-10T12:11:48Z
file_id: '10824'
file_name: thesis.zip
file_size: 22841103
relation: source_file
file_date_updated: 2022-03-10T12:11:48Z
has_accepted_license: '1'
keyword:
- robustness
- fairness
- machine learning
- PAC learning
- adversarial learning
language:
- iso: eng
month: '03'
oa: 1
oa_version: Published Version
page: '176'
project:
- _id: 2564DBCA-B435-11E9-9278-68D0E5697425
call_identifier: H2020
grant_number: '665385'
name: International IST Doctoral Program
publication_identifier:
isbn:
- 978-3-99078-015-2
issn:
- 2663-337X
publication_status: published
publisher: Institute of Science and Technology Austria
related_material:
record:
- id: '8724'
relation: part_of_dissertation
status: public
- id: '10803'
relation: part_of_dissertation
status: public
- id: '10802'
relation: part_of_dissertation
status: public
- id: '6590'
relation: part_of_dissertation
status: public
status: public
supervisor:
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
title: Robustness and fairness in machine learning
type: dissertation
user_id: c635000d-4b10-11ee-a964-aac5a93f6ac1
year: '2022'
...
---
_id: '9210'
abstract:
- lang: eng
text: "Modern neural networks can easily fit their training set perfectly. Surprisingly,
despite being “overfit” in this way, they tend to generalize well to future data,
thereby defying the classic bias–variance trade-off of machine learning theory.
Of the many possible explanations, a prevalent one is that training by stochastic
gradient descent (SGD) imposes an implicit bias that leads it to learn simple
functions, and these simple functions generalize well. However, the specifics
of this implicit bias are not well understood.\r\nIn this work, we explore the
smoothness conjecture which states that SGD is implicitly biased towards learning
functions that are smooth. We propose several measures to formalize the intuitive
notion of smoothness, and we conduct experiments to determine whether SGD indeed
implicitly optimizes for these measures. Our findings rule out the possibility
that smoothness measures based on first-order derivatives are being implicitly
enforced. They are supportive, though, of the smoothness conjecture for measures
based on second-order derivatives."
article_processing_charge: No
author:
- first_name: Vaclav
full_name: Volhejn, Vaclav
id: d5235fb4-7a6d-11eb-b254-f25d12d631a8
last_name: Volhejn
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd
German Conference on Pattern Recognition. Vol 12544. LNCS. Springer; 2021:246-259.
doi:10.1007/978-3-030-71278-5_18'
apa: 'Volhejn, V., & Lampert, C. (2021). Does SGD implicitly optimize for smoothness?
In 42nd German Conference on Pattern Recognition (Vol. 12544, pp. 246–259).
Tübingen, Germany: Springer. https://doi.org/10.1007/978-3-030-71278-5_18'
chicago: Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for
Smoothness?” In 42nd German Conference on Pattern Recognition, 12544:246–59.
LNCS. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18.
ieee: V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,”
in 42nd German Conference on Pattern Recognition, Tübingen, Germany, 2021,
vol. 12544, pp. 246–259.
ista: 'Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness?
42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on
Pattern Recognition LNCS vol. 12544, 246–259.'
mla: Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?”
42nd German Conference on Pattern Recognition, vol. 12544, Springer, 2021,
pp. 246–59, doi:10.1007/978-3-030-71278-5_18.
short: V. Volhejn, C. Lampert, in:, 42nd German Conference on Pattern Recognition,
Springer, 2021, pp. 246–259.
conference:
end_date: 2020-10-01
location: Tübingen, Germany
name: 'DAGM GCPR: German Conference on Pattern Recognition'
start_date: 2020-09-28
date_created: 2021-03-01T09:01:16Z
date_published: 2021-03-17T00:00:00Z
date_updated: 2022-08-12T07:28:47Z
day: '17'
ddc:
- '510'
department:
- _id: ChLa
doi: 10.1007/978-3-030-71278-5_18
file:
- access_level: open_access
checksum: 3e3628ab1cf658d82524963f808004ea
content_type: application/pdf
creator: dernst
date_created: 2022-08-12T07:27:58Z
date_updated: 2022-08-12T07:27:58Z
file_id: '11820'
file_name: 2020_GCPR_submitted_Volhejn.pdf
file_size: 420234
relation: main_file
success: 1
file_date_updated: 2022-08-12T07:27:58Z
has_accepted_license: '1'
intvolume: ' 12544'
language:
- iso: eng
month: '03'
oa: 1
oa_version: Submitted Version
page: 246-259
publication: 42nd German Conference on Pattern Recognition
publication_identifier:
eissn:
- 1611-3349
isbn:
- '9783030712778'
issn:
- 0302-9743
publication_status: published
publisher: Springer
quality_controlled: '1'
scopus_import: '1'
series_title: LNCS
status: public
title: Does SGD implicitly optimize for smoothness?
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 12544
year: '2021'
...
---
_id: '9416'
abstract:
- lang: eng
text: 'We study the inductive bias of two-layer ReLU networks trained by gradient
flow. We identify a class of easy-to-learn (''orthogonally separable'') datasets,
and characterise the solution that ReLU networks trained on such datasets converge
to. Irrespective of network width, the solution turns out to be a combination
of two max-margin classifiers: one corresponding to the positive data subset and
one corresponding to the negative data subset. The proof is based on the recently
introduced concept of extremal sectors, for which we prove a number of properties
in the context of orthogonal separability. In particular, we prove stationarity
of activation patterns from some time onwards, which enables a reduction of the
ReLU network to an ensemble of linear subnetworks.'
article_processing_charge: No
author:
- first_name: Phuong
full_name: Bui Thi Mai, Phuong
id: 3EC6EE64-F248-11E8-B48F-1D18A9856A87
last_name: Bui Thi Mai
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Phuong M, Lampert C. The inductive bias of ReLU networks on orthogonally separable
data. In: 9th International Conference on Learning Representations; 2021.'
apa: Phuong, M., & Lampert, C. (2021). The inductive bias of ReLU networks on
orthogonally separable data. In 9th International Conference on Learning Representations.
Virtual.
chicago: Phuong, Mary, and Christoph Lampert. “The Inductive Bias of ReLU Networks
on Orthogonally Separable Data.” In 9th International Conference on Learning
Representations, 2021.
ieee: M. Phuong and C. Lampert, “The inductive bias of ReLU networks on orthogonally
separable data,” in 9th International Conference on Learning Representations,
Virtual, 2021.
ista: 'Phuong M, Lampert C. 2021. The inductive bias of ReLU networks on orthogonally
separable data. 9th International Conference on Learning Representations. ICLR:
International Conference on Learning Representations.'
mla: Phuong, Mary, and Christoph Lampert. “The Inductive Bias of ReLU Networks on
Orthogonally Separable Data.” 9th International Conference on Learning Representations,
2021.
short: M. Phuong, C. Lampert, in:, 9th International Conference on Learning Representations,
2021.
conference:
end_date: 2021-05-07
location: Virtual
name: 'ICLR: International Conference on Learning Representations'
start_date: 2021-05-03
date_created: 2021-05-24T11:16:46Z
date_published: 2021-05-01T00:00:00Z
date_updated: 2023-09-07T13:29:50Z
day: '01'
ddc:
- '000'
department:
- _id: GradSch
- _id: ChLa
file:
- access_level: open_access
checksum: f34ff17017527db5ba6927f817bdd125
content_type: application/pdf
creator: bphuong
date_created: 2021-05-24T11:15:57Z
date_updated: 2021-05-24T11:15:57Z
file_id: '9417'
file_name: iclr2021_conference.pdf
file_size: 502356
relation: main_file
file_date_updated: 2021-05-24T11:15:57Z
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://openreview.net/pdf?id=krz7T0xU9Z_
month: '05'
oa: 1
oa_version: Published Version
publication: 9th International Conference on Learning Representations
publication_status: published
quality_controlled: '1'
related_material:
record:
- id: '9418'
relation: dissertation_contains
status: public
scopus_import: '1'
status: public
title: The inductive bias of ReLU networks on orthogonally separable data
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2021'
...