--- _id: '13053' abstract: - lang: eng text: 'Deep neural networks (DNNs) often have to be compressed, via pruning and/or quantization, before they can be deployed in practical settings. In this work we propose a new compression-aware minimizer dubbed CrAM that modifies the optimization step in a principled way, in order to produce models whose local loss behavior is stable under compression operations such as pruning. Thus, dense models trained via CrAM should be compressible post-training, in a single step, without significant accuracy loss. Experimental results on standard benchmarks, such as residual networks for ImageNet classification and BERT models for language modelling, show that CrAM produces dense models that can be more accurate than the standard SGD/Adam-based baselines, but which are stable under weight pruning: specifically, we can prune models in one-shot to 70-80% sparsity with almost no accuracy loss, and to 90% with reasonable (∼1%) accuracy loss, which is competitive with gradual compression methods. Additionally, CrAM can produce sparse models which perform well for transfer learning, and it also works for semi-structured 2:4 pruning patterns supported by GPU hardware. The code for reproducing the results is available at this https URL .' acknowledged_ssus: - _id: ScienComp acknowledgement: "AP, EK, DA received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 805223 ScaleML). AV acknowledges the support of the French Agence Nationale de la Recherche (ANR), under grant ANR-21-CE48-0016 (project COMCOPT). We further acknowledge the support from the Scientific Service Units (SSU) of ISTA through resources provided by Scientific Computing (SciComp)." article_processing_charge: No author: - first_name: Elena-Alexandra full_name: Peste, Elena-Alexandra id: 32D78294-F248-11E8-B48F-1D18A9856A87 last_name: Peste - first_name: Adrian full_name: Vladu, Adrian last_name: Vladu - first_name: Eldar full_name: Kurtic, Eldar id: 47beb3a5-07b5-11eb-9b87-b108ec578218 last_name: Kurtic - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 - first_name: Dan-Adrian full_name: Alistarh, Dan-Adrian id: 4A899BFC-F248-11E8-B48F-1D18A9856A87 last_name: Alistarh orcid: 0000-0003-3650-940X citation: ama: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. In: 11th International Conference on Learning Representations .' apa: 'Peste, E.-A., Vladu, A., Kurtic, E., Lampert, C., & Alistarh, D.-A. (n.d.). CrAM: A Compression-Aware Minimizer. In 11th International Conference on Learning Representations . Kigali, Rwanda .' chicago: 'Peste, Elena-Alexandra, Adrian Vladu, Eldar Kurtic, Christoph Lampert, and Dan-Adrian Alistarh. “CrAM: A Compression-Aware Minimizer.” In 11th International Conference on Learning Representations , n.d.' ieee: 'E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, and D.-A. Alistarh, “CrAM: A Compression-Aware Minimizer,” in 11th International Conference on Learning Representations , Kigali, Rwanda .' ista: 'Peste E-A, Vladu A, Kurtic E, Lampert C, Alistarh D-A. CrAM: A Compression-Aware Minimizer. 11th International Conference on Learning Representations . ICLR: International Conference on Learning Representations.' mla: 'Peste, Elena-Alexandra, et al. “CrAM: A Compression-Aware Minimizer.” 11th International Conference on Learning Representations .'
short: E.-A. Peste, A. Vladu, E. Kurtic, C. Lampert, D.-A. Alistarh, in:, 11th International Conference on Learning Representations , n.d. conference: end_date: 2023-05-05 location: 'Kigali, Rwanda ' name: 'ICLR: International Conference on Learning Representations' start_date: 2023-05-01 date_created: 2023-05-23T11:36:18Z date_published: 2023-05-01T00:00:00Z date_updated: 2023-06-01T12:54:45Z department: - _id: GradSch - _id: DaAl - _id: ChLa ec_funded: 1 external_id: arxiv: - '2207.14200' language: - iso: eng main_file_link: - open_access: '1' url: https://openreview.net/pdf?id=_eTZBs-yedr month: '05' oa: 1 oa_version: Preprint project: - _id: 268A44D6-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '805223' name: Elastic Coordination for Scalable Machine Learning publication: '11th International Conference on Learning Representations ' publication_status: accepted quality_controlled: '1' related_material: record: - id: '13074' relation: dissertation_contains status: public status: public title: 'CrAM: A Compression-Aware Minimizer' type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2023' ... --- _id: '13074' abstract: - lang: eng text: "Deep learning has become an integral part of a large number of important applications, and many of the recent breakthroughs have been enabled by the ability to train very large models, capable to capture complex patterns and relationships from the data. At the same time, the massive sizes of modern deep learning models have made their deployment to smaller devices more challenging; this is particularly important, as in many applications the users rely on accurate deep learning predictions, but they only have access to devices with limited memory and compute power. One solution to this problem is to prune neural networks, by setting as many of their parameters as possible to zero, to obtain accurate sparse models with lower memory footprint. Despite the great research progress in obtaining sparse models that preserve accuracy, while satisfying memory and computational constraints, there are still many challenges associated with efficiently training sparse models, as well as understanding their generalization properties.\r\n\r\nThe focus of this thesis is to investigate how the training process of sparse models can be made more efficient, and to understand the differences between sparse and dense models in terms of how well they can generalize to changes in the data distribution. We first study a method for co-training sparse and dense models, at a lower cost compared to regular training. With our method we can obtain very accurate sparse networks, and dense models that can recover the baseline accuracy. Furthermore, we are able to more easily analyze the differences, at prediction level, between the sparse-dense model pairs. Next, we investigate the generalization properties of sparse neural networks in more detail, by studying how well different sparse models trained on a larger task can adapt to smaller, more specialized tasks, in a transfer learning scenario. Our analysis across multiple pruning methods and sparsity levels reveals that sparse models provide features that can transfer similarly to or better than the dense baseline. However, the choice of the pruning method plays an important role, and can influence the results when the features are fixed (linear finetuning), or when they are allowed to adapt to the new task (full finetuning). 
Using sparse models with fixed masks for finetuning on new tasks has an important practical advantage, as it enables training neural networks on smaller devices. However, one drawback of current pruning methods is that the entire training cycle has to be repeated to obtain the initial sparse model, for every sparsity target; in consequence, the entire training process is costly and also multiple models need to be stored. In the last part of the thesis we propose a method that can train accurate dense models that are compressible in a single step, to multiple sparsity levels, without additional finetuning. Our method results in sparse models that can be competitive with existing pruning methods, and which can also successfully generalize to new tasks." acknowledged_ssus: - _id: ScienComp alternative_title: - ISTA Thesis article_processing_charge: No author: - first_name: Elena-Alexandra full_name: Peste, Elena-Alexandra id: 32D78294-F248-11E8-B48F-1D18A9856A87 last_name: Peste citation: ama: Peste E-A. Efficiency and generalization of sparse neural networks. 2023. doi:10.15479/at:ista:13074 apa: Peste, E.-A. (2023). Efficiency and generalization of sparse neural networks. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:13074 chicago: Peste, Elena-Alexandra. “Efficiency and Generalization of Sparse Neural Networks.” Institute of Science and Technology Austria, 2023. https://doi.org/10.15479/at:ista:13074. ieee: E.-A. Peste, “Efficiency and generalization of sparse neural networks,” Institute of Science and Technology Austria, 2023. ista: Peste E-A. 2023. Efficiency and generalization of sparse neural networks. Institute of Science and Technology Austria. mla: Peste, Elena-Alexandra. Efficiency and Generalization of Sparse Neural Networks. Institute of Science and Technology Austria, 2023, doi:10.15479/at:ista:13074. short: E.-A. Peste, Efficiency and Generalization of Sparse Neural Networks, Institute of Science and Technology Austria, 2023. 
date_created: 2023-05-23T17:07:53Z date_published: 2023-05-23T00:00:00Z date_updated: 2023-08-04T10:33:27Z day: '23' ddc: - '000' degree_awarded: PhD department: - _id: GradSch - _id: DaAl - _id: ChLa doi: 10.15479/at:ista:13074 ec_funded: 1 file: - access_level: open_access checksum: 6b3354968403cb9d48cc5a83611fb571 content_type: application/pdf creator: epeste date_created: 2023-05-24T16:11:16Z date_updated: 2023-05-24T16:11:16Z file_id: '13087' file_name: PhD_Thesis_Alexandra_Peste_final.pdf file_size: 2152072 relation: main_file success: 1 - access_level: closed checksum: 8d0df94bbcf4db72c991f22503b3fd60 content_type: application/zip creator: epeste date_created: 2023-05-24T16:12:59Z date_updated: 2023-05-24T16:12:59Z file_id: '13088' file_name: PhD_Thesis_APeste.zip file_size: 1658293 relation: source_file file_date_updated: 2023-05-24T16:12:59Z has_accepted_license: '1' language: - iso: eng month: '05' oa: 1 oa_version: Published Version page: '147' project: - _id: 2564DBCA-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '665385' name: International IST Doctoral Program - _id: 268A44D6-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '805223' name: Elastic Coordination for Scalable Machine Learning publication_identifier: issn: - 2663-337X publication_status: published publisher: Institute of Science and Technology Austria related_material: record: - id: '11458' relation: part_of_dissertation status: public - id: '13053' relation: part_of_dissertation status: public - id: '12299' relation: part_of_dissertation status: public status: public supervisor: - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 - first_name: Dan-Adrian full_name: Alistarh, Dan-Adrian id: 4A899BFC-F248-11E8-B48F-1D18A9856A87 last_name: Alistarh orcid: 0000-0003-3650-940X title: Efficiency and generalization of sparse neural networks type: dissertation user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9 year: '2023' ... --- _id: '14320' abstract: - lang: eng text: The development of two-dimensional materials has resulted in a diverse range of novel, high-quality compounds with increasing complexity. A key requirement for a comprehensive quantitative theory is the accurate determination of these materials' band structure parameters. However, this task is challenging due to the intricate band structures and the indirect nature of experimental probes. In this work, we introduce a general framework to derive band structure parameters from experimental data using deep neural networks. We applied our method to the penetration field capacitance measurement of trilayer graphene, an effective probe of its density of states. First, we demonstrate that a trained deep network gives accurate predictions for the penetration field capacitance as a function of tight-binding parameters. Next, we use the fast and accurate predictions from the trained network to automatically determine tight-binding parameters directly from experimental data, with extracted parameters being in a good agreement with values in the literature. We conclude by discussing potential applications of our method to other materials and experimental techniques beyond penetration field capacitance. acknowledgement: A.F.Y. acknowledges primary support from the Department of Energy under award DE-SC0020043, and additional support from the Gordon and Betty Moore Foundation under award GBMF9471 for group operations. 
article_number: '125411' article_processing_charge: No article_type: original author: - first_name: Paul M full_name: Henderson, Paul M id: 13C09E74-18D9-11E9-8878-32CFE5697425 last_name: Henderson orcid: 0000-0002-5198-7445 - first_name: Areg full_name: Ghazaryan, Areg id: 4AF46FD6-F248-11E8-B48F-1D18A9856A87 last_name: Ghazaryan orcid: 0000-0001-9666-3543 - first_name: Alexander A. full_name: Zibrov, Alexander A. last_name: Zibrov - first_name: Andrea F. full_name: Young, Andrea F. last_name: Young - first_name: Maksym full_name: Serbyn, Maksym id: 47809E7E-F248-11E8-B48F-1D18A9856A87 last_name: Serbyn orcid: 0000-0002-2399-5827 citation: ama: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. Deep learning extraction of band structure parameters from density of states: A case study on trilayer graphene. Physical Review B. 2023;108(12). doi:10.1103/physrevb.108.125411' apa: 'Henderson, P. M., Ghazaryan, A., Zibrov, A. A., Young, A. F., & Serbyn, M. (2023). Deep learning extraction of band structure parameters from density of states: A case study on trilayer graphene. Physical Review B. American Physical Society. https://doi.org/10.1103/physrevb.108.125411' chicago: 'Henderson, Paul M, Areg Ghazaryan, Alexander A. Zibrov, Andrea F. Young, and Maksym Serbyn. “Deep Learning Extraction of Band Structure Parameters from Density of States: A Case Study on Trilayer Graphene.” Physical Review B. American Physical Society, 2023. https://doi.org/10.1103/physrevb.108.125411.' ieee: 'P. M. Henderson, A. Ghazaryan, A. A. Zibrov, A. F. Young, and M. Serbyn, “Deep learning extraction of band structure parameters from density of states: A case study on trilayer graphene,” Physical Review B, vol. 108, no. 12. American Physical Society, 2023.' ista: 'Henderson PM, Ghazaryan A, Zibrov AA, Young AF, Serbyn M. 2023. Deep learning extraction of band structure parameters from density of states: A case study on trilayer graphene. Physical Review B. 108(12), 125411.' mla: 'Henderson, Paul M., et al. “Deep Learning Extraction of Band Structure Parameters from Density of States: A Case Study on Trilayer Graphene.” Physical Review B, vol. 108, no. 12, 125411, American Physical Society, 2023, doi:10.1103/physrevb.108.125411.' short: P.M. Henderson, A. Ghazaryan, A.A. Zibrov, A.F. Young, M. Serbyn, Physical Review B 108 (2023). date_created: 2023-09-12T07:12:12Z date_published: 2023-09-15T00:00:00Z date_updated: 2023-09-20T09:38:24Z day: '15' department: - _id: MaSe - _id: ChLa - _id: MiLe doi: 10.1103/physrevb.108.125411 external_id: arxiv: - '2210.06310' intvolume: ' 108' issue: '12' language: - iso: eng main_file_link: - open_access: '1' url: https://doi.org/10.48550/arXiv.2210.06310 month: '09' oa: 1 oa_version: Preprint publication: Physical Review B publication_identifier: eissn: - 2469-9969 issn: - 2469-9950 publication_status: published publisher: American Physical Society quality_controlled: '1' scopus_import: '1' status: public title: 'Deep learning extraction of band structure parameters from density of states: A case study on trilayer graphene' type: journal_article user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 108 year: '2023' ... --- _id: '14410' abstract: - lang: eng text: This paper focuses on the implementation details of the baseline methods and a recent lightweight conditional model extrapolation algorithm LIMES [5] for streaming data under class-prior shift. 
LIMES achieves superior performance over the baseline methods, especially concerning the minimum-across-day accuracy, which is important for the users of the system. In this work, the key measures to facilitate reproducibility and enhance the credibility of the results are described. alternative_title: - LNCS article_processing_charge: No author: - first_name: Paulina full_name: Tomaszewska, Paulina last_name: Tomaszewska - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Tomaszewska P, Lampert C. On the implementation of baselines and lightweight conditional model extrapolation (LIMES) under class-prior shift. In: International Workshop on Reproducible Research in Pattern Recognition. Vol 14068. Springer Nature; 2023:67-73. doi:10.1007/978-3-031-40773-4_6' apa: 'Tomaszewska, P., & Lampert, C. (2023). On the implementation of baselines and lightweight conditional model extrapolation (LIMES) under class-prior shift. In International Workshop on Reproducible Research in Pattern Recognition (Vol. 14068, pp. 67–73). Montreal, Canada: Springer Nature. https://doi.org/10.1007/978-3-031-40773-4_6' chicago: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.” In International Workshop on Reproducible Research in Pattern Recognition, 14068:67–73. Springer Nature, 2023. https://doi.org/10.1007/978-3-031-40773-4_6. ieee: P. Tomaszewska and C. Lampert, “On the implementation of baselines and lightweight conditional model extrapolation (LIMES) under class-prior shift,” in International Workshop on Reproducible Research in Pattern Recognition, Montreal, Canada, 2023, vol. 14068, pp. 67–73. ista: 'Tomaszewska P, Lampert C. 2023. On the implementation of baselines and lightweight conditional model extrapolation (LIMES) under class-prior shift. International Workshop on Reproducible Research in Pattern Recognition. RRPR: Reproducible Research in Pattern Recognition, LNCS, vol. 14068, 67–73.' mla: Tomaszewska, Paulina, and Christoph Lampert. “On the Implementation of Baselines and Lightweight Conditional Model Extrapolation (LIMES) under Class-Prior Shift.” International Workshop on Reproducible Research in Pattern Recognition, vol. 14068, Springer Nature, 2023, pp. 67–73, doi:10.1007/978-3-031-40773-4_6. short: P. Tomaszewska, C. Lampert, in:, International Workshop on Reproducible Research in Pattern Recognition, Springer Nature, 2023, pp. 67–73. conference: end_date: 2022-08-21 location: Montreal, Canada name: 'RRPR: Reproducible Research in Pattern Recognition' start_date: 2022-08-21 date_created: 2023-10-08T22:01:18Z date_published: 2023-08-20T00:00:00Z date_updated: 2023-10-09T06:48:02Z day: '20' department: - _id: ChLa doi: 10.1007/978-3-031-40773-4_6 intvolume: ' 14068' language: - iso: eng month: '08' oa_version: None page: 67-73 publication: International Workshop on Reproducible Research in Pattern Recognition publication_identifier: eissn: - 1611-3349 isbn: - '9783031407727' issn: - 0302-9743 publication_status: published publisher: Springer Nature quality_controlled: '1' scopus_import: '1' status: public title: On the implementation of baselines and lightweight conditional model extrapolation (LIMES) under class-prior shift type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 14068 year: '2023' ... 
--- _id: '14446' abstract: - lang: eng text: Recent work has paid close attention to the first principle of Granger causality, according to which cause precedes effect. In this context, the question may arise whether the detected direction of causality also reverses after the time reversal of unidirectionally coupled data. Recently, it has been shown that for unidirectionally causally connected autoregressive (AR) processes X → Y, after time reversal of data, the opposite causal direction Y → X is indeed detected, although typically as part of the bidirectional X ↔ Y link. As we argue here, the answer is different when the measured data are not from AR processes but from linked deterministic systems. When the goal is the usual forward data analysis, cross-mapping-like approaches correctly detect X → Y, while Granger causality-like approaches, which should not be used for deterministic time series, detect causal independence X → Y. The results of backward causal analysis depend on the predictability of the reversed data. Unlike AR processes, observables from deterministic dynamical systems, even complex nonlinear ones, can be predicted well forward, while backward predictions can be difficult (notably when the time reversal of a function leads to one-to-many relations). To address this problem, we propose an approach based on models that provide multiple candidate predictions for the target, combined with a loss function that considers only the best candidate. The resulting good forward and backward predictability supports the view that unidirectionally causally linked deterministic dynamical systems X → Y can be expected to detect the same link both before and after time reversal. acknowledgement: The work was supported by the Scientific Grant Agency of the Ministry of Education of the Slovak Republic and the Slovak Academy of Sciences, projects APVV-21-0216, VEGA2-0096-21 and VEGA 2-0023-22. article_processing_charge: Yes article_type: original author: - first_name: Jozef full_name: Jakubík, Jozef last_name: Jakubík - first_name: Phuong full_name: Bui Thi Mai, Phuong id: 3EC6EE64-F248-11E8-B48F-1D18A9856A87 last_name: Bui Thi Mai - first_name: Martina full_name: Chvosteková, Martina last_name: Chvosteková - first_name: Anna full_name: Krakovská, Anna last_name: Krakovská citation: ama: Jakubík J, Phuong M, Chvosteková M, Krakovská A. Against the flow of time with multi-output models. Measurement Science Review. 2023;23(4):175-183. doi:10.2478/msr-2023-0023 apa: Jakubík, J., Phuong, M., Chvosteková, M., & Krakovská, A. (2023). Against the flow of time with multi-output models. Measurement Science Review. Sciendo. https://doi.org/10.2478/msr-2023-0023 chicago: Jakubík, Jozef, Mary Phuong, Martina Chvosteková, and Anna Krakovská. “Against the Flow of Time with Multi-Output Models.” Measurement Science Review. Sciendo, 2023. https://doi.org/10.2478/msr-2023-0023. ieee: J. Jakubík, M. Phuong, M. Chvosteková, and A. Krakovská, “Against the flow of time with multi-output models,” Measurement Science Review, vol. 23, no. 4. Sciendo, pp. 175–183, 2023. ista: Jakubík J, Phuong M, Chvosteková M, Krakovská A. 2023. Against the flow of time with multi-output models. Measurement Science Review. 23(4), 175–183. mla: Jakubík, Jozef, et al. “Against the Flow of Time with Multi-Output Models.” Measurement Science Review, vol. 23, no. 4, Sciendo, 2023, pp. 175–83, doi:10.2478/msr-2023-0023. short: J. Jakubík, M. Phuong, M. Chvosteková, A. Krakovská, Measurement Science Review 23 (2023) 175–183.
date_created: 2023-10-22T22:01:15Z date_published: 2023-08-01T00:00:00Z date_updated: 2023-10-31T12:12:47Z day: '01' ddc: - '510' department: - _id: ChLa doi: 10.2478/msr-2023-0023 file: - access_level: open_access checksum: b069cc10fa6a7c96b2bc9f728165f9e6 content_type: application/pdf creator: dernst date_created: 2023-10-31T12:07:23Z date_updated: 2023-10-31T12:07:23Z file_id: '14476' file_name: 2023_MeasurementScienceRev_Jakubik.pdf file_size: 2639783 relation: main_file success: 1 file_date_updated: 2023-10-31T12:07:23Z has_accepted_license: '1' intvolume: ' 23' issue: '4' language: - iso: eng license: https://creativecommons.org/licenses/by-nc-nd/4.0/ month: '08' oa: 1 oa_version: Published Version page: 175-183 publication: Measurement Science Review publication_identifier: eissn: - 1335-8871 publication_status: published publisher: Sciendo quality_controlled: '1' scopus_import: '1' status: public title: Against the flow of time with multi-output models tmp: image: /images/cc_by_nc_nd.png legal_code_url: https://creativecommons.org/licenses/by-nc-nd/4.0/legalcode name: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) short: CC BY-NC-ND (4.0) type: journal_article user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 23 year: '2023' ... --- _id: '14771' abstract: - lang: eng text: Pruning—that is, setting a significant subset of the parameters of a neural network to zero—is one of the most popular methods of model compression. Yet, several recent works have raised the issue that pruning may induce or exacerbate bias in the output of the compressed model. Despite existing evidence for this phenomenon, the relationship between neural network pruning and induced bias is not well-understood. In this work, we systematically investigate and characterize this phenomenon in Convolutional Neural Networks for computer vision. First, we show that it is in fact possible to obtain highly-sparse models, e.g. with less than 10% remaining weights, which do not decrease in accuracy nor substantially increase in bias when compared to dense models. At the same time, we also find that, at higher sparsities, pruned models exhibit higher uncertainty in their outputs, as well as increased correlations, which we directly link to increased bias. We propose easy-to-use criteria which, based only on the uncompressed model, establish whether bias will increase with pruning, and identify the samples most susceptible to biased predictions post-compression. Our code can be found at https://github.com/IST-DASLab/pruned-vision-model-bias. acknowledgement: The authors would like to sincerely thank Sara Hooker for her feedback during the development of this work. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. AP and DA acknowledge generous ERC support, via Starting Grant 805223 ScaleML. article_processing_charge: No author: - first_name: Eugenia B full_name: Iofinova, Eugenia B id: f9a17499-f6e0-11ea-865d-fdf9a3f77117 last_name: Iofinova orcid: 0000-0002-7778-3221 - first_name: Elena-Alexandra full_name: Peste, Elena-Alexandra id: 32D78294-F248-11E8-B48F-1D18A9856A87 last_name: Peste - first_name: Dan-Adrian full_name: Alistarh, Dan-Adrian id: 4A899BFC-F248-11E8-B48F-1D18A9856A87 last_name: Alistarh orcid: 0000-0003-3650-940X citation: ama: 'Iofinova EB, Peste E-A, Alistarh D-A. Bias in pruned vision models: In-depth analysis and countermeasures. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE; 2023:24364-24373. 
doi:10.1109/cvpr52729.2023.02334' apa: 'Iofinova, E. B., Peste, E.-A., & Alistarh, D.-A. (2023). Bias in pruned vision models: In-depth analysis and countermeasures. In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 24364–24373). Vancouver, BC, Canada: IEEE. https://doi.org/10.1109/cvpr52729.2023.02334' chicago: 'Iofinova, Eugenia B, Elena-Alexandra Peste, and Dan-Adrian Alistarh. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 24364–73. IEEE, 2023. https://doi.org/10.1109/cvpr52729.2023.02334.' ieee: 'E. B. Iofinova, E.-A. Peste, and D.-A. Alistarh, “Bias in pruned vision models: In-depth analysis and countermeasures,” in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 2023, pp. 24364–24373.' ista: 'Iofinova EB, Peste E-A, Alistarh D-A. 2023. Bias in pruned vision models: In-depth analysis and countermeasures. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR: Conference on Computer Vision and Pattern Recognition, 24364–24373.' mla: 'Iofinova, Eugenia B., et al. “Bias in Pruned Vision Models: In-Depth Analysis and Countermeasures.” 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–73, doi:10.1109/cvpr52729.2023.02334.' short: E.B. Iofinova, E.-A. Peste, D.-A. Alistarh, in:, 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, 2023, pp. 24364–24373. conference: end_date: 2023-06-24 location: Vancouver, BC, Canada name: 'CVPR: Conference on Computer Vision and Pattern Recognition' start_date: 2023-06-17 date_created: 2024-01-10T08:42:40Z date_published: 2023-08-22T00:00:00Z date_updated: 2024-01-10T08:59:26Z day: '22' department: - _id: DaAl - _id: ChLa doi: 10.1109/cvpr52729.2023.02334 ec_funded: 1 external_id: arxiv: - '2304.12622' isi: - '001062531308068' isi: 1 language: - iso: eng main_file_link: - open_access: '1' url: https://doi.org/10.48550/arXiv.2304.12622 month: '08' oa: 1 oa_version: Preprint page: 24364-24373 project: - _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A grant_number: ' W1260-N35' name: Vienna Graduate School on Computational Optimization - _id: 268A44D6-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '805223' name: Elastic Coordination for Scalable Machine Learning publication: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition publication_identifier: eisbn: - '9798350301298' eissn: - 2575-7075 publication_status: published publisher: IEEE quality_controlled: '1' related_material: link: - relation: software url: https://github.com/IST-DASLab/pruned-vision-model-bias status: public title: 'Bias in pruned vision models: In-depth analysis and countermeasures' type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2023' ... --- _id: '14921' abstract: - lang: eng text: Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart - deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers at the price of an extra assumption. 
Our paper fills this gap by generalizing the established analytical framework for NC - the unconstrained features model - to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle. acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for valuable feedback on the manuscript. alternative_title: - NeurIPS article_processing_charge: No author: - first_name: Peter full_name: Súkeník, Peter id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c last_name: Súkeník - first_name: Marco full_name: Mondelli, Marco id: 27EB676C-8706-11E9-9510-7717E6697425 last_name: Mondelli orcid: 0000-0002-3242-7020 - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal for the deep unconstrained features model. In: 37th Annual Conference on Neural Information Processing Systems.' apa: Súkeník, P., Mondelli, M., & Lampert, C. (n.d.). Deep neural collapse is provably optimal for the deep unconstrained features model. In 37th Annual Conference on Neural Information Processing Systems. New Orleans, LA, United States. chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model.” In 37th Annual Conference on Neural Information Processing Systems, n.d. ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably optimal for the deep unconstrained features model,” in 37th Annual Conference on Neural Information Processing Systems, New Orleans, LA, United States. ista: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal for the deep unconstrained features model. 37th Annual Conference on Neural Information Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS, .' mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep Unconstrained Features Model.” 37th Annual Conference on Neural Information Processing Systems. short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural Information Processing Systems, n.d. 
conference: end_date: 2023-12-16 location: New Orleans, LA, United States name: 'NeurIPS: Neural Information Processing Systems' start_date: 2023-12-10 date_created: 2024-02-02T11:17:41Z date_published: 2023-12-15T00:00:00Z date_updated: 2024-02-06T07:53:26Z day: '15' department: - _id: MaMo - _id: ChLa external_id: arxiv: - '2305.13165' language: - iso: eng main_file_link: - open_access: '1' url: ' https://doi.org/10.48550/arXiv.2305.13165' month: '12' oa: 1 oa_version: Preprint project: - _id: 059876FA-7A3F-11EA-A408-12923DDC885E name: Prix Lopez-Loretta 2019 - Marco Mondelli publication: 37th Annual Conference on Neural Information Processing Systems publication_status: inpress quality_controlled: '1' status: public title: Deep neural collapse is provably optimal for the deep unconstrained features model type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2023' ... --- _id: '15039' abstract: - lang: eng text: 'A crucial property for achieving secure, trustworthy and interpretable deep learning systems is their robustness: small changes to a system''s inputs should not result in large changes to its outputs. Mathematically, this means one strives for networks with a small Lipschitz constant. Several recent works have focused on how to construct such Lipschitz networks, typically by imposing constraints on the weight matrices. In this work, we study an orthogonal aspect, namely the role of the activation function. We show that commonly used activation functions, such as MaxMin, as well as all piece-wise linear ones with two segments unnecessarily restrict the class of representable functions, even in the simplest one-dimensional setting. We furthermore introduce the new N-activation function that is provably more expressive than currently popular activation functions. We provide code at this https URL.' article_number: '2311.06103' article_processing_charge: No author: - first_name: Bernd full_name: Prach, Bernd id: 2D561D42-C427-11E9-89B4-9C1AE6697425 last_name: Prach - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations. arXiv. doi:10.48550/ARXIV.2311.06103 apa: Prach, B., & Lampert, C. (n.d.). 1-Lipschitz neural networks are more expressive with N-activations. arXiv. https://doi.org/10.48550/ARXIV.2311.06103 chicago: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More Expressive with N-Activations.” ArXiv, n.d. https://doi.org/10.48550/ARXIV.2311.06103. ieee: B. Prach and C. Lampert, “1-Lipschitz neural networks are more expressive with N-activations,” arXiv. . ista: Prach B, Lampert C. 1-Lipschitz neural networks are more expressive with N-activations. arXiv, 2311.06103. mla: Prach, Bernd, and Christoph Lampert. “1-Lipschitz Neural Networks Are More Expressive with N-Activations.” ArXiv, 2311.06103, doi:10.48550/ARXIV.2311.06103. short: B. Prach, C. Lampert, ArXiv (n.d.). 
date_created: 2024-02-28T17:59:32Z date_published: 2023-11-10T00:00:00Z date_updated: 2024-03-04T07:02:39Z day: '10' department: - _id: GradSch - _id: ChLa doi: 10.48550/ARXIV.2311.06103 external_id: arxiv: - '2311.06103' language: - iso: eng main_file_link: - open_access: '1' url: https://doi.org/10.48550/arXiv.2311.06103 month: '11' oa: 1 oa_version: Preprint publication: arXiv publication_status: submitted status: public title: 1-Lipschitz neural networks are more expressive with N-activations type: preprint user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2023' ... --- _id: '12660' abstract: - lang: eng text: 'We present Cross-Client Label Propagation(XCLP), a new method for transductive federated learning. XCLP estimates a data graph jointly from the data of multiple clients and computes labels for the unlabeled data by propagating label information across the graph. To avoid clients having to share their data with anyone, XCLP employs two cryptographically secure protocols: secure Hamming distance computation and secure summation. We demonstrate two distinct applications of XCLP within federated learning. In the first, we use it in a one-shot way to predict labels for unseen test points. In the second, we use it to repeatedly pseudo-label unlabeled training data in a federated semi-supervised setting. Experiments on both real federated and standard benchmark datasets show that in both applications XCLP achieves higher classification accuracy than alternative approaches.' article_number: '2210.06434' article_processing_charge: No author: - first_name: Jonathan A full_name: Scott, Jonathan A id: e499926b-f6e0-11ea-865d-9c63db0031e8 last_name: Scott - first_name: Michelle X full_name: Yeo, Michelle X id: 2D82B818-F248-11E8-B48F-1D18A9856A87 last_name: Yeo - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive federated learning. arXiv. doi:10.48550/arXiv.2210.06434 apa: Scott, J. A., Yeo, M. X., & Lampert, C. (n.d.). Cross-client Label Propagation for transductive federated learning. arXiv. https://doi.org/10.48550/arXiv.2210.06434 chicago: Scott, Jonathan A, Michelle X Yeo, and Christoph Lampert. “Cross-Client Label Propagation for Transductive Federated Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2210.06434. ieee: J. A. Scott, M. X. Yeo, and C. Lampert, “Cross-client Label Propagation for transductive federated learning,” arXiv. . ista: Scott JA, Yeo MX, Lampert C. Cross-client Label Propagation for transductive federated learning. arXiv, 2210.06434. mla: Scott, Jonathan A., et al. “Cross-Client Label Propagation for Transductive Federated Learning.” ArXiv, 2210.06434, doi:10.48550/arXiv.2210.06434. short: J.A. Scott, M.X. Yeo, C. Lampert, ArXiv (n.d.). 
date_created: 2023-02-20T08:21:50Z date_published: 2022-10-12T00:00:00Z date_updated: 2023-02-21T08:20:18Z day: '12' ddc: - '004' department: - _id: ChLa doi: 10.48550/arXiv.2210.06434 external_id: arxiv: - '2210.06434' file: - access_level: open_access checksum: 7ab20543fd4393f14fb857ce2e4f03c6 content_type: application/pdf creator: chl date_created: 2023-02-20T08:21:35Z date_updated: 2023-02-20T08:21:35Z file_id: '12661' file_name: 2210.06434.pdf file_size: 291893 relation: main_file success: 1 file_date_updated: 2023-02-20T08:21:35Z has_accepted_license: '1' language: - iso: eng month: '10' oa: 1 oa_version: Preprint publication: arXiv publication_status: submitted status: public title: Cross-client Label Propagation for transductive federated learning tmp: image: /images/cc_by.png legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0) short: CC BY (4.0) type: preprint user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2022' ... --- _id: '12662' abstract: - lang: eng text: 'Modern machine learning tasks often require considering not just one but multiple objectives. For example, besides the prediction quality, this could be the efficiency, robustness or fairness of the learned models, or any of their combinations. Multi-objective learning offers a natural framework for handling such problems without having to commit to early trade-offs. Surprisingly, statistical learning theory so far offers almost no insight into the generalization properties of multi-objective learning. In this work, we make first steps to fill this gap: we establish foundational generalization bounds for the multi-objective setting as well as generalization and excess bounds for learning with scalarizations. We also provide the first theoretical analysis of the relation between the Pareto-optimal sets of the true objectives and the Pareto-optimal sets of their empirical approximations from training data. In particular, we show a surprising asymmetry: all Pareto-optimal solutions can be approximated by empirically Pareto-optimal ones, but not vice versa.' article_number: '2208.13499' article_processing_charge: No author: - first_name: Peter full_name: Súkeník, Peter id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c last_name: Súkeník - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: Súkeník P, Lampert C. Generalization in Multi-objective machine learning. arXiv. doi:10.48550/arXiv.2208.13499 apa: Súkeník, P., & Lampert, C. (n.d.). Generalization in Multi-objective machine learning. arXiv. https://doi.org/10.48550/arXiv.2208.13499 chicago: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective Machine Learning.” ArXiv, n.d. https://doi.org/10.48550/arXiv.2208.13499. ieee: P. Súkeník and C. Lampert, “Generalization in Multi-objective machine learning,” arXiv. . ista: Súkeník P, Lampert C. Generalization in Multi-objective machine learning. arXiv, 2208.13499. mla: Súkeník, Peter, and Christoph Lampert. “Generalization in Multi-Objective Machine Learning.” ArXiv, 2208.13499, doi:10.48550/arXiv.2208.13499. short: P. Súkeník, C. Lampert, ArXiv (n.d.). 
date_created: 2023-02-20T08:23:06Z date_published: 2022-08-29T00:00:00Z date_updated: 2023-02-21T08:24:55Z day: '29' ddc: - '004' department: - _id: ChLa doi: 10.48550/arXiv.2208.13499 external_id: arxiv: - '2208.13499' has_accepted_license: '1' language: - iso: eng main_file_link: - open_access: '1' url: ' https://doi.org/10.48550/arXiv.2208.13499' month: '08' oa: 1 oa_version: Preprint publication: arXiv publication_status: submitted status: public title: Generalization in Multi-objective machine learning type: preprint user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2022' ... --- _id: '12495' abstract: - lang: eng text: "Fairness-aware learning aims at constructing classifiers that not only make accurate predictions, but also do not discriminate against specific groups. It is a fast-growing area of\r\nmachine learning with far-reaching societal impact. However, existing fair learning methods\r\nare vulnerable to accidental or malicious artifacts in the training data, which can cause\r\nthem to unknowingly produce unfair classifiers. In this work we address the problem of\r\nfair learning from unreliable training data in the robust multisource setting, where the\r\navailable training data comes from multiple sources, a fraction of which might not be representative of the true data distribution. We introduce FLEA, a filtering-based algorithm\r\nthat identifies and suppresses those data sources that would have a negative impact on\r\nfairness or accuracy if they were used for training. As such, FLEA is not a replacement of\r\nprior fairness-aware learning methods but rather an augmentation that makes any of them\r\nrobust against unreliable training data. We show the effectiveness of our approach by a\r\ndiverse range of experiments on multiple datasets. Additionally, we prove formally that\r\n–given enough data– FLEA protects the learner against corruptions as long as the fraction of\r\naffected data sources is less than half. Our source code and documentation are available at\r\nhttps://github.com/ISTAustria-CVML/FLEA." acknowledged_ssus: - _id: ScienComp acknowledgement: 'The authors would like to thank Bernd Prach, Elias Frantar, Alexandra Peste, Mahdi Nikdan, and Peter Súkeník for their helpful feedback. This research was supported by the Scientific Service Units (SSU) of IST Austria through resources provided by Scientific Computing (SciComp). This publication was made possible by an ETH AI Center postdoctoral fellowship granted to Nikola Konstantinov. Eugenia Iofinova was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35. ' article_processing_charge: No article_type: original author: - first_name: Eugenia B full_name: Iofinova, Eugenia B id: f9a17499-f6e0-11ea-865d-fdf9a3f77117 last_name: Iofinova orcid: 0000-0002-7778-3221 - first_name: Nikola H full_name: Konstantinov, Nikola H id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87 last_name: Konstantinov - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Iofinova EB, Konstantinov NH, Lampert C. FLEA: Provably robust fair multisource learning from unreliable training data. Transactions on Machine Learning Research. 2022.' apa: 'Iofinova, E. B., Konstantinov, N. H., & Lampert, C. (2022). FLEA: Provably robust fair multisource learning from unreliable training data. Transactions on Machine Learning Research. ML Research Press.' chicago: 'Iofinova, Eugenia B, Nikola H Konstantinov, and Christoph Lampert. 
“FLEA: Provably Robust Fair Multisource Learning from Unreliable Training Data.” Transactions on Machine Learning Research. ML Research Press, 2022.' ieee: 'E. B. Iofinova, N. H. Konstantinov, and C. Lampert, “FLEA: Provably robust fair multisource learning from unreliable training data,” Transactions on Machine Learning Research. ML Research Press, 2022.' ista: 'Iofinova EB, Konstantinov NH, Lampert C. 2022. FLEA: Provably robust fair multisource learning from unreliable training data. Transactions on Machine Learning Research.' mla: 'Iofinova, Eugenia B., et al. “FLEA: Provably Robust Fair Multisource Learning from Unreliable Training Data.” Transactions on Machine Learning Research, ML Research Press, 2022.' short: E.B. Iofinova, N.H. Konstantinov, C. Lampert, Transactions on Machine Learning Research (2022). date_created: 2023-02-02T20:29:57Z date_published: 2022-12-22T00:00:00Z date_updated: 2023-02-23T10:30:54Z day: '22' ddc: - '000' department: - _id: ChLa external_id: arxiv: - '2106.11732' file: - access_level: open_access checksum: 97c8a8470759cab597abb973ca137a3b content_type: application/pdf creator: dernst date_created: 2023-02-23T10:30:04Z date_updated: 2023-02-23T10:30:04Z file_id: '12673' file_name: 2022_TMLR_Iofinova.pdf file_size: 1948063 relation: main_file success: 1 file_date_updated: 2023-02-23T10:30:04Z has_accepted_license: '1' language: - iso: eng main_file_link: - open_access: '1' url: https://openreview.net/forum?id=XsPopigZXV month: '12' oa: 1 oa_version: Published Version project: - _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A grant_number: ' W1260-N35' name: Vienna Graduate School on Computational Optimization publication: Transactions on Machine Learning Research publication_identifier: issn: - 2835-8856 publication_status: published publisher: ML Research Press quality_controlled: '1' related_material: link: - description: source code relation: software url: https://github.com/ISTAustria-CVML/FLEA status: public title: 'FLEA: Provably robust fair multisource learning from unreliable training data' tmp: image: /images/cc_by.png legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0) short: CC BY (4.0) type: journal_article user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2022' ... --- _id: '11839' abstract: - lang: eng text: "It is a highly desirable property for deep networks to be robust against\r\nsmall input changes. One popular way to achieve this property is by designing\r\nnetworks with a small Lipschitz constant. In this work, we propose a new\r\ntechnique for constructing such Lipschitz networks that has a number of\r\ndesirable properties: it can be applied to any linear network layer\r\n(fully-connected or convolutional), it provides formal guarantees on the\r\nLipschitz constant, it is easy to implement and efficient to run, and it can be\r\ncombined with any training objective and optimization method. In fact, our\r\ntechnique is the first one in the literature that achieves all of these\r\nproperties simultaneously. Our main contribution is a rescaling-based weight\r\nmatrix parametrization that guarantees each network layer to have a Lipschitz\r\nconstant of at most 1 and results in the learned weight matrices to be close to\r\northogonal. 
Hence we call such layers almost-orthogonal Lipschitz (AOL).\r\nExperiments and ablation studies in the context of image classification with\r\ncertified robust accuracy confirm that AOL layers achieve results that are on\r\npar with most existing methods. Yet, they are simpler to implement and more\r\nbroadly applicable, because they do not require computationally expensive\r\nmatrix orthogonalization or inversion steps as part of the network\r\narchitecture. We provide code at https://github.com/berndprach/AOL." alternative_title: - LNCS article_processing_charge: No author: - first_name: Bernd full_name: Prach, Bernd id: 2D561D42-C427-11E9-89B4-9C1AE6697425 last_name: Prach - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Prach B, Lampert C. Almost-orthogonal layers for efficient general-purpose Lipschitz networks. In: Computer Vision – ECCV 2022. Vol 13681. Springer Nature; 2022:350-365. doi:10.1007/978-3-031-19803-8_21' apa: 'Prach, B., & Lampert, C. (2022). Almost-orthogonal layers for efficient general-purpose Lipschitz networks. In Computer Vision – ECCV 2022 (Vol. 13681, pp. 350–365). Tel Aviv, Israel: Springer Nature. https://doi.org/10.1007/978-3-031-19803-8_21' chicago: Prach, Bernd, and Christoph Lampert. “Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks.” In Computer Vision – ECCV 2022, 13681:350–65. Springer Nature, 2022. https://doi.org/10.1007/978-3-031-19803-8_21. ieee: B. Prach and C. Lampert, “Almost-orthogonal layers for efficient general-purpose Lipschitz networks,” in Computer Vision – ECCV 2022, Tel Aviv, Israel, 2022, vol. 13681, pp. 350–365. ista: 'Prach B, Lampert C. 2022. Almost-orthogonal layers for efficient general-purpose Lipschitz networks. Computer Vision – ECCV 2022. ECCV: European Conference on Computer Vision, LNCS, vol. 13681, 350–365.' mla: Prach, Bernd, and Christoph Lampert. “Almost-Orthogonal Layers for Efficient General-Purpose Lipschitz Networks.” Computer Vision – ECCV 2022, vol. 13681, Springer Nature, 2022, pp. 350–65, doi:10.1007/978-3-031-19803-8_21. short: B. Prach, C. Lampert, in:, Computer Vision – ECCV 2022, Springer Nature, 2022, pp. 350–365. conference: end_date: 2022-10-27 location: Tel Aviv, Israel name: 'ECCV: European Conference on Computer Vision' start_date: 2022-10-23 date_created: 2022-08-12T15:09:47Z date_published: 2022-10-23T00:00:00Z date_updated: 2023-05-03T08:00:46Z day: '23' department: - _id: GradSch - _id: ChLa doi: 10.1007/978-3-031-19803-8_21 external_id: arxiv: - '2208.03160' intvolume: ' 13681' language: - iso: eng main_file_link: - open_access: '1' url: ' https://doi.org/10.48550/arXiv.2208.03160' month: '10' oa: 1 oa_version: Preprint page: 350-365 publication: Computer Vision – ECCV 2022 publication_identifier: eisbn: - '9783031198038' isbn: - '9783031198021' publication_status: published publisher: Springer Nature quality_controlled: '1' scopus_import: '1' status: public title: Almost-orthogonal layers for efficient general-purpose Lipschitz networks type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 13681 year: '2022' ... --- _id: '10752' abstract: - lang: eng text: 'The digitalization of almost all aspects of our everyday lives has led to unprecedented amounts of data being freely available on the Internet. 
In particular social media platforms provide rich sources of user-generated data, though typically in unstructured form, and with high diversity, such as written in many different languages. Automatically identifying meaningful information in such big data resources and extracting it efficiently is one of the ongoing challenges of our time. A common step for this is sentiment analysis, which forms the foundation for tasks such as opinion mining or trend prediction. Unfortunately, publicly available tools for this task are almost exclusively available for English-language texts. Consequently, a large fraction of the Internet users, who do not communicate in English, are ignored in automatized studies, a phenomenon called rare-language discrimination. In this work we propose a technique to overcome this problem by a truly multi-lingual model, which can be trained automatically without linguistic knowledge or even the ability to read the many target languages. The main step is to combine self-annotation, specifically the use of emoticons as a proxy for labels, with multi-lingual sentence representations. To evaluate our method we curated several large datasets from data obtained via the free Twitter streaming API. The results show that our proposed multi-lingual training is able to achieve sentiment predictions at the same quality level for rare languages as for frequent ones, and in particular clearly better than what mono-lingual training achieves on the same data. ' article_processing_charge: No author: - first_name: Jasmin full_name: Lampert, Jasmin last_name: Lampert - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Lampert J, Lampert C. Overcoming rare-language discrimination in multi-lingual sentiment analysis. In: 2021 IEEE International Conference on Big Data. IEEE; 2022:5185-5192. doi:10.1109/bigdata52589.2021.9672003' apa: 'Lampert, J., & Lampert, C. (2022). Overcoming rare-language discrimination in multi-lingual sentiment analysis. In 2021 IEEE International Conference on Big Data (pp. 5185–5192). Orlando, FL, United States: IEEE. https://doi.org/10.1109/bigdata52589.2021.9672003' chicago: Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis.” In 2021 IEEE International Conference on Big Data, 5185–92. IEEE, 2022. https://doi.org/10.1109/bigdata52589.2021.9672003. ieee: J. Lampert and C. Lampert, “Overcoming rare-language discrimination in multi-lingual sentiment analysis,” in 2021 IEEE International Conference on Big Data, Orlando, FL, United States, 2022, pp. 5185–5192. ista: 'Lampert J, Lampert C. 2022. Overcoming rare-language discrimination in multi-lingual sentiment analysis. 2021 IEEE International Conference on Big Data. Big Data: International Conference on Big Data, 5185–5192.' mla: Lampert, Jasmin, and Christoph Lampert. “Overcoming Rare-Language Discrimination in Multi-Lingual Sentiment Analysis.” 2021 IEEE International Conference on Big Data, IEEE, 2022, pp. 5185–92, doi:10.1109/bigdata52589.2021.9672003. short: J. Lampert, C. Lampert, in:, 2021 IEEE International Conference on Big Data, IEEE, 2022, pp. 5185–5192.
conference: end_date: 2021-12-18 location: Orlando, FL, United States name: 'Big Data: International Conference on Big Data' start_date: 2021-12-15 date_created: 2022-02-10T14:08:23Z date_published: 2022-01-13T00:00:00Z date_updated: 2023-08-02T14:27:50Z day: '13' department: - _id: ChLa doi: 10.1109/bigdata52589.2021.9672003 external_id: isi: - '000800559505036' isi: 1 language: - iso: eng month: '01' oa_version: None page: 5185-5192 publication: 2021 IEEE International Conference on Big Data publication_identifier: isbn: - '9781665439022' publication_status: published publisher: IEEE quality_controlled: '1' status: public title: Overcoming rare-language discrimination in multi-lingual sentiment analysis type: conference user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8 year: '2022' ... --- _id: '12161' abstract: - lang: eng text: 'We introduce LIMES, a new method for learning with non-stationary streaming data, inspired by the recent success of meta-learning. The main idea is not to attempt to learn a single classifier that would have to work well across all occurring data distributions, nor many separate classifiers, but to exploit a hybrid strategy: we learn a single set of model parameters from which a specific classifier for any specific data distribution is derived via classifier adaptation. Assuming a multiclass classification setting with class-prior shift, the adaptation step can be performed analytically with only the classifier’s bias terms being affected. Another contribution of our work is an extrapolation step that predicts suitable adaptation parameters for future time steps based on the previous data. In combination, we obtain a lightweight procedure for learning from streaming data with varying class distribution that adds no trainable parameters and almost no memory or computational overhead compared to training a single model. Experiments on a set of exemplary tasks using Twitter data show that LIMES achieves higher accuracy than alternative approaches, especially with respect to the relevant real-world metric of lowest within-day accuracy.' article_processing_charge: No author: - first_name: Paulina full_name: Tomaszewska, Paulina last_name: Tomaszewska - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Tomaszewska P, Lampert C. Lightweight conditional model extrapolation for streaming data under class-prior shift. In: 26th International Conference on Pattern Recognition. Vol 2022. Institute of Electrical and Electronics Engineers; 2022:2128-2134. doi:10.1109/icpr56361.2022.9956195' apa: 'Tomaszewska, P., & Lampert, C. (2022). Lightweight conditional model extrapolation for streaming data under class-prior shift. In 26th International Conference on Pattern Recognition (Vol. 2022, pp. 2128–2134). Montreal, Canada: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/icpr56361.2022.9956195' chicago: Tomaszewska, Paulina, and Christoph Lampert. “Lightweight Conditional Model Extrapolation for Streaming Data under Class-Prior Shift.” In 26th International Conference on Pattern Recognition, 2022:2128–34. Institute of Electrical and Electronics Engineers, 2022. https://doi.org/10.1109/icpr56361.2022.9956195. ieee: P. Tomaszewska and C. Lampert, “Lightweight conditional model extrapolation for streaming data under class-prior shift,” in 26th International Conference on Pattern Recognition, Montreal, Canada, 2022, vol. 2022, pp. 2128–2134. 
ista: 'Tomaszewska P, Lampert C. 2022. Lightweight conditional model extrapolation for streaming data under class-prior shift. 26th International Conference on Pattern Recognition. ICPR: International Conference on Pattern Recognition vol. 2022, 2128–2134.' mla: Tomaszewska, Paulina, and Christoph Lampert. “Lightweight Conditional Model Extrapolation for Streaming Data under Class-Prior Shift.” 26th International Conference on Pattern Recognition, vol. 2022, Institute of Electrical and Electronics Engineers, 2022, pp. 2128–34, doi:10.1109/icpr56361.2022.9956195. short: P. Tomaszewska, C. Lampert, in:, 26th International Conference on Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 2128–2134. conference: end_date: 2022-08-25 location: Montreal, Canada name: 'ICPR: International Conference on Pattern Recognition' start_date: 2022-08-21 date_created: 2023-01-12T12:09:38Z date_published: 2022-11-29T00:00:00Z date_updated: 2023-08-04T09:06:34Z day: '29' department: - _id: ChLa doi: 10.1109/icpr56361.2022.9956195 external_id: arxiv: - '2206.05181' isi: - '000897707602018' intvolume: ' 2022' isi: 1 language: - iso: eng main_file_link: - open_access: '1' url: https://doi.org/10.48550/arXiv.2206.05181 month: '11' oa: 1 oa_version: Preprint page: 2128-2134 publication: 26th International Conference on Pattern Recognition publication_identifier: eisbn: - '9781665490627' eissn: - 2831-7475 publication_status: published publisher: Institute of Electrical and Electronics Engineers quality_controlled: '1' scopus_import: '1' status: public title: Lightweight conditional model extrapolation for streaming data under class-prior shift type: conference user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8 volume: 2022 year: '2022' ... --- _id: '12299' abstract: - lang: eng text: 'Transfer learning is a classic paradigm by which models pretrained on large “upstream” datasets are adapted to yield good results on “downstream” specialized datasets. Generally, more accurate models on the “upstream” dataset tend to provide better transfer accuracy “downstream”. In this work, we perform an in-depth investigation of this phenomenon in the context of convolutional neural networks (CNNs) trained on the ImageNet dataset, which have been pruned, that is, compressed by sparsifying their connections. We consider transfer using unstructured pruned models obtained by applying several state-of-the-art pruning methods, including magnitude-based, second-order, regrowth, lottery-ticket, and regularization approaches, in the context of twelve standard transfer tasks. In a nutshell, our study shows that sparse models can match or even outperform the transfer performance of dense models, even at high sparsities, and, while doing so, can lead to significant inference and even training speedups. At the same time, we observe and analyze significant differences in the behaviour of different pruning methods. The code is available at: https://github.com/IST-DASLab/sparse-imagenet-transfer.' acknowledgement: The authors would like to sincerely thank Christoph Lampert and Nir Shavit for fruitful discussions during the development of this work, and Eldar Kurtic for experimental support. EI was supported in part by the FWF DK VGSCO, grant agreement number W1260-N35, while AP and DA acknowledge generous support by the ERC, via Starting Grant 805223 ScaleML. 
article_processing_charge: No author: - first_name: Eugenia B full_name: Iofinova, Eugenia B id: f9a17499-f6e0-11ea-865d-fdf9a3f77117 last_name: Iofinova orcid: 0000-0002-7778-3221 - first_name: Elena-Alexandra full_name: Peste, Elena-Alexandra id: 32D78294-F248-11E8-B48F-1D18A9856A87 last_name: Peste - first_name: Mark full_name: Kurtz, Mark last_name: Kurtz - first_name: Dan-Adrian full_name: Alistarh, Dan-Adrian id: 4A899BFC-F248-11E8-B48F-1D18A9856A87 last_name: Alistarh orcid: 0000-0003-3650-940X citation: ama: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. How well do sparse ImageNet models transfer? In: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Institute of Electrical and Electronics Engineers; 2022:12256-12266. doi:10.1109/cvpr52688.2022.01195' apa: 'Iofinova, E. B., Peste, E.-A., Kurtz, M., & Alistarh, D.-A. (2022). How well do sparse ImageNet models transfer? In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 12256–12266). New Orleans, LA, United States: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/cvpr52688.2022.01195' chicago: Iofinova, Eugenia B, Elena-Alexandra Peste, Mark Kurtz, and Dan-Adrian Alistarh. “How Well Do Sparse ImageNet Models Transfer?” In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 12256–66. Institute of Electrical and Electronics Engineers, 2022. https://doi.org/10.1109/cvpr52688.2022.01195. ieee: E. B. Iofinova, E.-A. Peste, M. Kurtz, and D.-A. Alistarh, “How well do sparse ImageNet models transfer?,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, United States, 2022, pp. 12256–12266. ista: 'Iofinova EB, Peste E-A, Kurtz M, Alistarh D-A. 2022. How well do sparse ImageNet models transfer? 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. CVPR: Computer Vision and Pattern Recognition, 12256–12266.' mla: Iofinova, Eugenia B., et al. “How Well Do Sparse ImageNet Models Transfer?” 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–66, doi:10.1109/cvpr52688.2022.01195. short: E.B. Iofinova, E.-A. Peste, M. Kurtz, D.-A. Alistarh, in:, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Institute of Electrical and Electronics Engineers, 2022, pp. 12256–12266. 
conference: end_date: 2022-06-24 location: New Orleans, LA, United States name: 'CVPR: Computer Vision and Pattern Recognition' start_date: 2022-06-18 date_created: 2023-01-16T10:06:00Z date_published: 2022-09-27T00:00:00Z date_updated: 2023-08-04T10:33:28Z day: '27' department: - _id: DaAl - _id: ChLa doi: 10.1109/cvpr52688.2022.01195 ec_funded: 1 external_id: arxiv: - '2111.13445' isi: - '000870759105034' isi: 1 language: - iso: eng main_file_link: - open_access: '1' url: https://doi.org/10.48550/arXiv.2111.13445 month: '09' oa: 1 oa_version: Preprint page: 12256-12266 project: - _id: 9B9290DE-BA93-11EA-9121-9846C619BF3A grant_number: ' W1260-N35' name: Vienna Graduate School on Computational Optimization - _id: 268A44D6-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '805223' name: Elastic Coordination for Scalable Machine Learning publication: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition publication_identifier: eissn: - 2575-7075 publication_status: published publisher: Institute of Electrical and Electronics Engineers quality_controlled: '1' related_material: record: - id: '13074' relation: dissertation_contains status: public scopus_import: '1' status: public title: How well do sparse ImageNet models transfer? type: conference user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8 year: '2022' ... --- _id: '10802' abstract: - lang: eng text: "Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. While many approaches have been developed for training fair models from data, little is known about the robustness of these methods to data corruption. In this work we consider fairness-aware learning under worst-case data manipulations. We show that an adversary can in some situations force any learner to return an overly biased classifier, regardless of the sample size and with or without degrading\r\naccuracy, and that the strength of the excess bias increases for learning problems with underrepresented protected groups in the data. We also prove that our hardness results are tight up to constant factors. To this end, we study two natural learning algorithms that optimize for both accuracy and fairness and show that these algorithms enjoy guarantees that are order-optimal in terms of the corruption ratio and the protected groups frequencies in the large data\r\nlimit." acknowledgement: The authors thank Eugenia Iofinova and Bernd Prach for providing feedback on early versions of this paper. This publication was made possible by an ETH AI Center postdoctoral fellowship to Nikola Konstantinov. article_processing_charge: No article_type: original author: - first_name: Nikola H full_name: Konstantinov, Nikola H id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87 last_name: Konstantinov - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0002-4561-241X citation: ama: Konstantinov NH, Lampert C. Fairness-aware PAC learning from corrupted data. Journal of Machine Learning Research. 2022;23:1-60. apa: Konstantinov, N. H., & Lampert, C. (2022). Fairness-aware PAC learning from corrupted data. Journal of Machine Learning Research. ML Research Press. chicago: Konstantinov, Nikola H, and Christoph Lampert. “Fairness-Aware PAC Learning from Corrupted Data.” Journal of Machine Learning Research. ML Research Press, 2022. ieee: N. H. Konstantinov and C. 
Lampert, “Fairness-aware PAC learning from corrupted data,” Journal of Machine Learning Research, vol. 23. ML Research Press, pp. 1–60, 2022. ista: Konstantinov NH, Lampert C. 2022. Fairness-aware PAC learning from corrupted data. Journal of Machine Learning Research. 23, 1–60. mla: Konstantinov, Nikola H., and Christoph Lampert. “Fairness-Aware PAC Learning from Corrupted Data.” Journal of Machine Learning Research, vol. 23, ML Research Press, 2022, pp. 1–60. short: N.H. Konstantinov, C. Lampert, Journal of Machine Learning Research 23 (2022) 1–60. date_created: 2022-02-28T14:05:42Z date_published: 2022-05-01T00:00:00Z date_updated: 2023-09-26T10:44:37Z day: '01' ddc: - '004' department: - _id: ChLa external_id: arxiv: - '2102.06004' file: - access_level: open_access checksum: 9cac897b54a0ddf3a553a2c33e88cfda content_type: application/pdf creator: kschuh date_created: 2022-07-12T15:08:28Z date_updated: 2022-07-12T15:08:28Z file_id: '11570' file_name: 2022_JournalMachineLearningResearch_Konstantinov.pdf file_size: 551862 relation: main_file success: 1 file_date_updated: 2022-07-12T15:08:28Z has_accepted_license: '1' intvolume: ' 23' keyword: - Fairness - robustness - data poisoning - trustworthy machine learning - PAC learning language: - iso: eng month: '05' oa: 1 oa_version: Published Version page: 1-60 publication: Journal of Machine Learning Research publication_identifier: eissn: - 1533-7928 issn: - 1532-4435 publication_status: published publisher: ML Research Press quality_controlled: '1' related_material: record: - id: '10799' relation: dissertation_contains status: public - id: '13241' relation: shorter_version status: public scopus_import: '1' status: public title: Fairness-aware PAC learning from corrupted data tmp: image: /images/cc_by.png legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0) short: CC BY (4.0) type: journal_article user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 23 year: '2022' ... --- _id: '13241' abstract: - lang: eng text: Addressing fairness concerns about machine learning models is a crucial step towards their long-term adoption in real-world automated systems. Many approaches for training fair models from data have been developed and an implicit assumption about such algorithms is that they are able to recover a fair model, despite potential historical biases in the data. In this work we show a number of impossibility results that indicate that there is no learning algorithm that can recover a fair model when a proportion of the dataset is subject to arbitrary manipulations. Specifically, we prove that there are situations in which an adversary can force any learner to return a biased classifier, with or without degrading accuracy, and that the strength of this bias increases for learning problems with underrepresented protected groups in the data. Our results emphasize the importance of studying further data corruption models of various strengths and of establishing stricter data collection practices for fairness-aware learning. acknowledgement: "This paper is a shortened, workshop version of Konstantinov and Lampert (2021),\r\nhttps://arxiv.org/abs/2102.06004. For further results, including an analysis of algorithms achieving the lower bounds from this paper, we refer to the full version." 
article_processing_charge: No author: - first_name: Nikola H full_name: Konstantinov, Nikola H id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87 last_name: Konstantinov - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Konstantinov NH, Lampert C. On the impossibility of fairness-aware learning from corrupted data. In: Proceedings of Machine Learning Research. Vol 171. ML Research Press; 2022:59-83.' apa: Konstantinov, N. H., & Lampert, C. (2022). On the impossibility of fairness-aware learning from corrupted data. In Proceedings of Machine Learning Research (Vol. 171, pp. 59–83). ML Research Press. chicago: Konstantinov, Nikola H, and Christoph Lampert. “On the Impossibility of Fairness-Aware Learning from Corrupted Data.” In Proceedings of Machine Learning Research, 171:59–83. ML Research Press, 2022. ieee: N. H. Konstantinov and C. Lampert, “On the impossibility of fairness-aware learning from corrupted data,” in Proceedings of Machine Learning Research, 2022, vol. 171, pp. 59–83. ista: Konstantinov NH, Lampert C. 2022. On the impossibility of fairness-aware learning from corrupted data. Proceedings of Machine Learning Research. vol. 171, 59–83. mla: Konstantinov, Nikola H., and Christoph Lampert. “On the Impossibility of Fairness-Aware Learning from Corrupted Data.” Proceedings of Machine Learning Research, vol. 171, ML Research Press, 2022, pp. 59–83. short: N.H. Konstantinov, C. Lampert, in:, Proceedings of Machine Learning Research, ML Research Press, 2022, pp. 59–83. date_created: 2023-07-16T22:01:13Z date_published: 2022-12-01T00:00:00Z date_updated: 2023-09-26T10:44:37Z day: '01' department: - _id: ChLa external_id: arxiv: - '2102.06004' intvolume: ' 171' language: - iso: eng main_file_link: - open_access: '1' url: https://arxiv.org/abs/2102.06004 month: '12' oa: 1 oa_version: Preprint page: 59-83 publication: Proceedings of Machine Learning Research publication_identifier: eissn: - 2640-3498 publication_status: published publisher: ML Research Press quality_controlled: '1' related_material: record: - id: '10802' relation: extended_version status: public scopus_import: '1' status: public title: On the impossibility of fairness-aware learning from corrupted data type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 171 year: '2022' ... --- _id: '10799' abstract: - lang: eng text: "Because of the increasing popularity of machine learning methods, it is becoming important to understand the impact of learned components on automated decision-making systems and to guarantee that their consequences are beneficial to society. In other words, it is necessary to ensure that machine learning is sufficiently trustworthy to be used in real-world applications. This thesis studies two properties of machine learning models that are highly desirable for the\r\nsake of reliability: robustness and fairness. In the first part of the thesis we study the robustness of learning algorithms to training data corruption. Previous work has shown that machine learning models are vulnerable to a range\r\nof training set issues, varying from label noise through systematic biases to worst-case data manipulations. This is an especially relevant problem from a present perspective, since modern machine learning methods are particularly data hungry and therefore practitioners often have to rely on data collected from various external sources, e.g. from the Internet, from app users or via crowdsourcing. 
Naturally, such sources vary greatly in the quality and reliability of the\r\ndata they provide. With these considerations in mind, we study the problem of designing machine learning algorithms that are robust to corruptions in data coming from multiple sources. We show that, in contrast to the case of a single dataset with outliers, successful learning within this model is possible both theoretically and practically, even under worst-case data corruptions. The second part of this thesis deals with fairness-aware machine learning. There are multiple areas where machine learning models have shown promising results, but where careful considerations are required, in order to avoid discriminatory decisions taken by such learned components. Ensuring fairness can be particularly challenging, because real-world training datasets are expected to contain various forms of historical bias that may affect the learning process. In this thesis we show that data corruption can indeed render the problem of achieving fairness impossible, by tightly characterizing the theoretical limits of fair learning under worst-case data manipulations. However, assuming access to clean data, we also show how fairness-aware learning can be made practical in contexts beyond binary classification, in particular in the challenging learning to rank setting." alternative_title: - ISTA Thesis article_processing_charge: No author: - first_name: Nikola H full_name: Konstantinov, Nikola H id: 4B9D76E4-F248-11E8-B48F-1D18A9856A87 last_name: Konstantinov citation: ama: Konstantinov NH. Robustness and fairness in machine learning. 2022. doi:10.15479/at:ista:10799 apa: Konstantinov, N. H. (2022). Robustness and fairness in machine learning. Institute of Science and Technology Austria. https://doi.org/10.15479/at:ista:10799 chicago: Konstantinov, Nikola H. “Robustness and Fairness in Machine Learning.” Institute of Science and Technology Austria, 2022. https://doi.org/10.15479/at:ista:10799. ieee: N. H. Konstantinov, “Robustness and fairness in machine learning,” Institute of Science and Technology Austria, 2022. ista: Konstantinov NH. 2022. Robustness and fairness in machine learning. Institute of Science and Technology Austria. mla: Konstantinov, Nikola H. Robustness and Fairness in Machine Learning. Institute of Science and Technology Austria, 2022, doi:10.15479/at:ista:10799. short: N.H. Konstantinov, Robustness and Fairness in Machine Learning, Institute of Science and Technology Austria, 2022. 
date_created: 2022-02-28T13:03:49Z date_published: 2022-03-08T00:00:00Z date_updated: 2023-10-17T12:31:54Z day: '08' ddc: - '000' degree_awarded: PhD department: - _id: GradSch - _id: ChLa doi: 10.15479/at:ista:10799 ec_funded: 1 file: - access_level: open_access checksum: 626bc523ae8822d20e635d0e2d95182e content_type: application/pdf creator: nkonstan date_created: 2022-03-06T11:42:54Z date_updated: 2022-03-06T11:42:54Z file_id: '10823' file_name: thesis.pdf file_size: 4204905 relation: main_file success: 1 - access_level: closed checksum: e2ca2b88350ac8ea1515b948885cbcb1 content_type: application/x-zip-compressed creator: nkonstan date_created: 2022-03-06T11:42:57Z date_updated: 2022-03-10T12:11:48Z file_id: '10824' file_name: thesis.zip file_size: 22841103 relation: source_file file_date_updated: 2022-03-10T12:11:48Z has_accepted_license: '1' keyword: - robustness - fairness - machine learning - PAC learning - adversarial learning language: - iso: eng month: '03' oa: 1 oa_version: Published Version page: '176' project: - _id: 2564DBCA-B435-11E9-9278-68D0E5697425 call_identifier: H2020 grant_number: '665385' name: International IST Doctoral Program publication_identifier: isbn: - 978-3-99078-015-2 issn: - 2663-337X publication_status: published publisher: Institute of Science and Technology Austria related_material: record: - id: '8724' relation: part_of_dissertation status: public - id: '10803' relation: part_of_dissertation status: public - id: '10802' relation: part_of_dissertation status: public - id: '6590' relation: part_of_dissertation status: public status: public supervisor: - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 title: Robustness and fairness in machine learning type: dissertation user_id: c635000d-4b10-11ee-a964-aac5a93f6ac1 year: '2022' ... --- _id: '9210' abstract: - lang: eng text: "Modern neural networks can easily fit their training set perfectly. Surprisingly, despite being “overfit” in this way, they tend to generalize well to future data, thereby defying the classic bias–variance trade-off of machine learning theory. Of the many possible explanations, a prevalent one is that training by stochastic gradient descent (SGD) imposes an implicit bias that leads it to learn simple functions, and these simple functions generalize well. However, the specifics of this implicit bias are not well understood.\r\nIn this work, we explore the smoothness conjecture which states that SGD is implicitly biased towards learning functions that are smooth. We propose several measures to formalize the intuitive notion of smoothness, and we conduct experiments to determine whether SGD indeed implicitly optimizes for these measures. Our findings rule out the possibility that smoothness measures based on first-order derivatives are being implicitly enforced. They are supportive, though, of the smoothness conjecture for measures based on second-order derivatives." article_processing_charge: No author: - first_name: Vaclav full_name: Volhejn, Vaclav id: d5235fb4-7a6d-11eb-b254-f25d12d631a8 last_name: Volhejn - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Volhejn V, Lampert C. Does SGD implicitly optimize for smoothness? In: 42nd German Conference on Pattern Recognition. Vol 12544. LNCS. Springer; 2021:246-259. doi:10.1007/978-3-030-71278-5_18' apa: 'Volhejn, V., & Lampert, C. (2021). 
Does SGD implicitly optimize for smoothness? In 42nd German Conference on Pattern Recognition (Vol. 12544, pp. 246–259). Tübingen, Germany: Springer. https://doi.org/10.1007/978-3-030-71278-5_18' chicago: Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” In 42nd German Conference on Pattern Recognition, 12544:246–59. LNCS. Springer, 2021. https://doi.org/10.1007/978-3-030-71278-5_18. ieee: V. Volhejn and C. Lampert, “Does SGD implicitly optimize for smoothness?,” in 42nd German Conference on Pattern Recognition, Tübingen, Germany, 2021, vol. 12544, pp. 246–259. ista: 'Volhejn V, Lampert C. 2021. Does SGD implicitly optimize for smoothness? 42nd German Conference on Pattern Recognition. DAGM GCPR: German Conference on Pattern Recognition LNCS vol. 12544, 246–259.' mla: Volhejn, Vaclav, and Christoph Lampert. “Does SGD Implicitly Optimize for Smoothness?” 42nd German Conference on Pattern Recognition, vol. 12544, Springer, 2021, pp. 246–59, doi:10.1007/978-3-030-71278-5_18. short: V. Volhejn, C. Lampert, in:, 42nd German Conference on Pattern Recognition, Springer, 2021, pp. 246–259. conference: end_date: 2020-10-01 location: Tübingen, Germany name: 'DAGM GCPR: German Conference on Pattern Recognition ' start_date: 2020-09-28 date_created: 2021-03-01T09:01:16Z date_published: 2021-03-17T00:00:00Z date_updated: 2022-08-12T07:28:47Z day: '17' ddc: - '510' department: - _id: ChLa doi: 10.1007/978-3-030-71278-5_18 file: - access_level: open_access checksum: 3e3628ab1cf658d82524963f808004ea content_type: application/pdf creator: dernst date_created: 2022-08-12T07:27:58Z date_updated: 2022-08-12T07:27:58Z file_id: '11820' file_name: 2020_GCPR_submitted_Volhejn.pdf file_size: 420234 relation: main_file success: 1 file_date_updated: 2022-08-12T07:27:58Z has_accepted_license: '1' intvolume: ' 12544' language: - iso: eng month: '03' oa: 1 oa_version: Submitted Version page: 246-259 publication: 42nd German Conference on Pattern Recognition publication_identifier: eissn: - 1611-3349 isbn: - '9783030712778' issn: - 0302-9743 publication_status: published publisher: Springer quality_controlled: '1' scopus_import: '1' series_title: LNCS status: public title: Does SGD implicitly optimize for smoothness? type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 volume: 12544 year: '2021' ... --- _id: '9416' abstract: - lang: eng text: 'We study the inductive bias of two-layer ReLU networks trained by gradient flow. We identify a class of easy-to-learn (“orthogonally separable”) datasets, and characterise the solution that ReLU networks trained on such datasets converge to. Irrespective of network width, the solution turns out to be a combination of two max-margin classifiers: one corresponding to the positive data subset and one corresponding to the negative data subset. The proof is based on the recently introduced concept of extremal sectors, for which we prove a number of properties in the context of orthogonal separability. In particular, we prove stationarity of activation patterns from some time onwards, which enables a reduction of the ReLU network to an ensemble of linear subnetworks.' article_processing_charge: No author: - first_name: Phuong full_name: Bui Thi Mai, Phuong id: 3EC6EE64-F248-11E8-B48F-1D18A9856A87 last_name: Bui Thi Mai - first_name: Christoph full_name: Lampert, Christoph id: 40C20FD2-F248-11E8-B48F-1D18A9856A87 last_name: Lampert orcid: 0000-0001-8622-7887 citation: ama: 'Phuong M, Lampert C. 
The inductive bias of ReLU networks on orthogonally separable data. In: 9th International Conference on Learning Representations. ; 2021.' apa: Phuong, M., & Lampert, C. (2021). The inductive bias of ReLU networks on orthogonally separable data. In 9th International Conference on Learning Representations. Virtual. chicago: Phuong, Mary, and Christoph Lampert. “The Inductive Bias of ReLU Networks on Orthogonally Separable Data.” In 9th International Conference on Learning Representations, 2021. ieee: M. Phuong and C. Lampert, “The inductive bias of ReLU networks on orthogonally separable data,” in 9th International Conference on Learning Representations, Virtual, 2021. ista: 'Phuong M, Lampert C. 2021. The inductive bias of ReLU networks on orthogonally separable data. 9th International Conference on Learning Representations. ICLR: International Conference on Learning Representations.' mla: Phuong, Mary, and Christoph Lampert. “The Inductive Bias of ReLU Networks on Orthogonally Separable Data.” 9th International Conference on Learning Representations, 2021. short: M. Phuong, C. Lampert, in:, 9th International Conference on Learning Representations, 2021. conference: end_date: 2021-05-07 location: Virtual name: ' ICLR: International Conference on Learning Representations' start_date: 2021-05-03 date_created: 2021-05-24T11:16:46Z date_published: 2021-05-01T00:00:00Z date_updated: 2023-09-07T13:29:50Z day: '01' ddc: - '000' department: - _id: GradSch - _id: ChLa file: - access_level: open_access checksum: f34ff17017527db5ba6927f817bdd125 content_type: application/pdf creator: bphuong date_created: 2021-05-24T11:15:57Z date_updated: 2021-05-24T11:15:57Z file_id: '9417' file_name: iclr2021_conference.pdf file_size: 502356 relation: main_file file_date_updated: 2021-05-24T11:15:57Z has_accepted_license: '1' language: - iso: eng main_file_link: - open_access: '1' url: https://openreview.net/pdf?id=krz7T0xU9Z_ month: '05' oa: 1 oa_version: Published Version publication: 9th International Conference on Learning Representations publication_status: published quality_controlled: '1' related_material: record: - id: '9418' relation: dissertation_contains status: public scopus_import: '1' status: public title: The inductive bias of ReLU networks on orthogonally separable data type: conference user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87 year: '2021' ...