---
_id: '15172'
abstract:
- lang: eng
text: 'We propose a novel approach to concentration for non-independent random variables.
The main idea is to “pretend” that the random variables are independent and pay
a multiplicative price measuring how far they are from actually being independent.
This price is encapsulated in the Hellinger integral between the joint and the
product of the marginals, which is then upper bounded leveraging tensorisation
properties. Our bounds represent a natural generalisation of concentration inequalities
in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s
inequality) when the random variables are independent. Furthermore, in a “large
deviations” regime, we obtain the same decay in the probability as for the independent
case, even when the random variables display non-trivial dependencies. To show
this, we consider a number of applications of interest. First, we provide a bound
for Markov chains with finite state space. Then, we consider the Simple Symmetric
Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting
in which the stochastic process depends on its entire past. To conclude, we propose
an application to Markov Chain Monte Carlo methods, where our approach leads to
an improved lower bound on the minimum burn-in period required to reach a certain
accuracy. In all of these settings, we provide a regime of parameters in which
our bound fares better than what the state of the art can provide.'
article_processing_charge: No
article_type: original
author:
- first_name: Amedeo Roberto
full_name: Esposito, Amedeo Roberto
id: 9583e921-e1ad-11ec-9862-cef099626dc9
last_name: Esposito
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: Esposito AR, Mondelli M. Concentration without independence via information
measures. IEEE Transactions on Information Theory. doi:10.1109/TIT.2024.3367767
apa: Esposito, A. R., & Mondelli, M. (n.d.). Concentration without independence
via information measures. IEEE Transactions on Information Theory. IEEE.
https://doi.org/10.1109/TIT.2024.3367767
chicago: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
via Information Measures.” IEEE Transactions on Information Theory. IEEE,
n.d. https://doi.org/10.1109/TIT.2024.3367767.
ieee: A. R. Esposito and M. Mondelli, “Concentration without independence via information
measures,” IEEE Transactions on Information Theory. IEEE.
ista: Esposito AR, Mondelli M. Concentration without independence via information
measures. IEEE Transactions on Information Theory.
mla: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
via Information Measures.” IEEE Transactions on Information Theory, IEEE,
doi:10.1109/TIT.2024.3367767.
short: A.R. Esposito, M. Mondelli, IEEE Transactions on Information Theory (n.d.).
date_created: 2024-03-24T23:01:00Z
date_published: 2024-02-20T00:00:00Z
date_updated: 2024-03-25T07:15:51Z
day: '20'
department:
- _id: MaMo
doi: 10.1109/TIT.2024.3367767
external_id:
arxiv:
- '2303.07245'
language:
- iso: eng
month: '02'
oa_version: None
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: IEEE Transactions on Information Theory
publication_identifier:
eissn:
- 1557-9654
issn:
- 0018-9448
publication_status: inpress
publisher: IEEE
quality_controlled: '1'
related_material:
record:
- id: '14922'
relation: earlier_version
status: public
scopus_import: '1'
status: public
title: Concentration without independence via information measures
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2024'
...
---
_id: '13315'
abstract:
- lang: eng
text: How do statistical dependencies in measurement noise influence high-dimensional
inference? To answer this, we study the paradigmatic spiked matrix model of principal
components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
We go beyond the usual independence assumption on the noise entries, by drawing
the noise from a low-order polynomial orthogonal matrix ensemble. The resulting
noise correlations make the setting relevant for applications but analytically
challenging. We provide a characterization of the Bayes optimal limits of inference
in this model. If the spike is rotation invariant, we show that standard spectral
PCA is optimal. However, for more general priors, both PCA and the existing approximate
message-passing algorithm (AMP) fall short of achieving the information-theoretic
limits, which we compute using the replica method from statistical physics. We
thus propose an AMP, inspired by the theory of adaptive Thouless–Anderson–Palmer
equations, which is empirically observed to saturate the conjectured theoretical
limit. This AMP comes with a rigorous state evolution analysis tracking its performance.
Although we focus on specific noise distributions, our methodology can be generalized
to a wide class of trace matrix ensembles at the cost of more involved expressions.
Finally, despite the seemingly strong assumption of rotation-invariant noise,
our theory empirically predicts algorithmic performance on real data, pointing
at strong universality properties.
acknowledgement: J.B. was funded by the European Union (ERC, CHORAL, project number
101039794). Views and opinions expressed are however those of the author(s) only
and do not necessarily reflect those of the European Union or the European Research
Council. Neither the European Union nor the granting authority can be held responsible
for them. M.M. was supported by the 2019 Lopez-Loreta Prize. We would like to thank
the reviewers for the insightful comments and, in particular, for suggesting the
BAMP-inspired denoisers leading to AMP-AP.
article_number: e2302028120
article_processing_charge: Yes (in subscription journal)
article_type: original
author:
- first_name: Jean
full_name: Barbier, Jean
last_name: Barbier
- first_name: Francesco
full_name: Camilli, Francesco
last_name: Camilli
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Manuel
full_name: Sáenz, Manuel
last_name: Sáenz
citation:
ama: Barbier J, Camilli F, Mondelli M, Sáenz M. Fundamental limits in structured
principal component analysis and how to reach them. Proceedings of the National
Academy of Sciences of the United States of America. 2023;120(30). doi:10.1073/pnas.2302028120
apa: Barbier, J., Camilli, F., Mondelli, M., & Sáenz, M. (2023). Fundamental
limits in structured principal component analysis and how to reach them. Proceedings
of the National Academy of Sciences of the United States of America. National
Academy of Sciences. https://doi.org/10.1073/pnas.2302028120
chicago: Barbier, Jean, Francesco Camilli, Marco Mondelli, and Manuel Sáenz. “Fundamental
Limits in Structured Principal Component Analysis and How to Reach Them.” Proceedings
of the National Academy of Sciences of the United States of America. National
Academy of Sciences, 2023. https://doi.org/10.1073/pnas.2302028120.
ieee: J. Barbier, F. Camilli, M. Mondelli, and M. Sáenz, “Fundamental limits in
structured principal component analysis and how to reach them,” Proceedings
of the National Academy of Sciences of the United States of America, vol.
120, no. 30. National Academy of Sciences, 2023.
ista: Barbier J, Camilli F, Mondelli M, Sáenz M. 2023. Fundamental limits in structured
principal component analysis and how to reach them. Proceedings of the National
Academy of Sciences of the United States of America. 120(30), e2302028120.
mla: Barbier, Jean, et al. “Fundamental Limits in Structured Principal Component
Analysis and How to Reach Them.” Proceedings of the National Academy of Sciences
of the United States of America, vol. 120, no. 30, e2302028120, National Academy
of Sciences, 2023, doi:10.1073/pnas.2302028120.
short: J. Barbier, F. Camilli, M. Mondelli, M. Sáenz, Proceedings of the National
Academy of Sciences of the United States of America 120 (2023).
date_created: 2023-07-30T22:01:02Z
date_published: 2023-07-25T00:00:00Z
date_updated: 2023-10-17T11:44:55Z
day: '25'
ddc:
- '000'
department:
- _id: MaMo
doi: 10.1073/pnas.2302028120
external_id:
pmid:
- '37463204'
file:
- access_level: open_access
checksum: 1fc06228afdb3aa80cf8e7766bcf9dc5
content_type: application/pdf
creator: dernst
date_created: 2023-07-31T07:30:48Z
date_updated: 2023-07-31T07:30:48Z
file_id: '13323'
file_name: 2023_PNAS_Barbier.pdf
file_size: 995933
relation: main_file
success: 1
file_date_updated: 2023-07-31T07:30:48Z
has_accepted_license: '1'
intvolume: ' 120'
issue: '30'
language:
- iso: eng
month: '07'
oa: 1
oa_version: Published Version
pmid: 1
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the National Academy of Sciences of the United States
of America
publication_identifier:
eissn:
- 1091-6490
publication_status: published
publisher: National Academy of Sciences
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/fcamilli95/Structured-PCA-
scopus_import: '1'
status: public
title: Fundamental limits in structured principal component analysis and how to reach
them
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 120
year: '2023'
...
---
_id: '14459'
abstract:
- lang: eng
text: Autoencoders are a popular model in many branches of machine learning and
lossy data compression. However, their fundamental limits, the performance of
gradient methods and the features learnt during optimization remain poorly understood,
even in the two-layer setting. In fact, earlier work has considered either linear
autoencoders or specific training regimes (leading to vanishing or diverging compression
rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders
trained in the challenging proportional regime in which the input dimension scales
linearly with the size of the representation. Our results characterize the minimizers
of the population risk, and show that such minimizers are achieved by gradient
methods; their structure is also unveiled, thus leading to a concise description
of the features obtained via training. For the special case of a sign activation
function, our analysis establishes the fundamental limits for the lossy compression
of Gaussian sources via (shallow) autoencoders. Finally, while the results are
proved for Gaussian data, numerical simulations on standard datasets display the
universality of the theoretical predictions.
acknowledgement: Aleksandr Shevchenko, Kevin Kögler and Marco Mondelli are supported
by the 2019 Lopez-Loreta Prize. Hamed Hassani acknowledges the support by the NSF
CIF award (1910056) and the NSF Institute for CORE Emerging Methods in Data Science
(EnCORE).
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Aleksandr
full_name: Shevchenko, Aleksandr
id: F2B06EC2-C99E-11E9-89F0-752EE6697425
last_name: Shevchenko
- first_name: Kevin
full_name: Kögler, Kevin
id: 94ec913c-dc85-11ea-9058-e5051ab2428b
last_name: Kögler
- first_name: Hamed
full_name: Hassani, Hamed
last_name: Hassani
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. Fundamental limits of two-layer
autoencoders, and achieving them with gradient methods. In: Proceedings of
the 40th International Conference on Machine Learning. Vol 202. ML Research
Press; 2023:31151-31209.'
apa: 'Shevchenko, A., Kögler, K., Hassani, H., & Mondelli, M. (2023). Fundamental
limits of two-layer autoencoders, and achieving them with gradient methods. In
Proceedings of the 40th International Conference on Machine Learning (Vol.
202, pp. 31151–31209). Honolulu, HI, United States: ML Research Press.'
chicago: Shevchenko, Aleksandr, Kevin Kögler, Hamed Hassani, and Marco Mondelli.
“Fundamental Limits of Two-Layer Autoencoders, and Achieving Them with Gradient
Methods.” In Proceedings of the 40th International Conference on Machine Learning,
202:31151–209. ML Research Press, 2023.
ieee: A. Shevchenko, K. Kögler, H. Hassani, and M. Mondelli, “Fundamental limits
of two-layer autoencoders, and achieving them with gradient methods,” in Proceedings
of the 40th International Conference on Machine Learning, Honolulu, HI, United
States, 2023, vol. 202, pp. 31151–31209.
ista: 'Shevchenko A, Kögler K, Hassani H, Mondelli M. 2023. Fundamental limits of
two-layer autoencoders, and achieving them with gradient methods. Proceedings
of the 40th International Conference on Machine Learning. ICML: International
Conference on Machine Learning, PMLR, vol. 202, 31151–31209.'
mla: Shevchenko, Aleksandr, et al. “Fundamental Limits of Two-Layer Autoencoders,
and Achieving Them with Gradient Methods.” Proceedings of the 40th International
Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 31151–209.
short: A. Shevchenko, K. Kögler, H. Hassani, M. Mondelli, in:, Proceedings of the
40th International Conference on Machine Learning, ML Research Press, 2023, pp.
31151–31209.
conference:
end_date: 2023-07-29
location: Honolulu, HI, United States
name: 'ICML: International Conference on Machine Learning'
start_date: 2023-07-23
date_created: 2023-10-29T23:01:17Z
date_published: 2023-07-30T00:00:00Z
date_updated: 2023-10-31T08:52:28Z
day: '30'
department:
- _id: MaMo
- _id: DaAl
external_id:
arxiv:
- '2212.13468'
intvolume: ' 202'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2212.13468
month: '07'
oa: 1
oa_version: Preprint
page: 31151-31209
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_identifier:
eissn:
- 2640-3498
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
scopus_import: '1'
status: public
title: Fundamental limits of two-layer autoencoders, and achieving them with gradient
methods
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '13321'
abstract:
- lang: eng
text: We consider the problem of reconstructing the signal and the hidden variables
from observations coming from a multi-layer network with rotationally invariant
weight matrices. The multi-layer structure models inference from deep generative
priors, and the rotational invariance imposed on the weights generalizes the i.i.d.
Gaussian assumption by allowing for a complex correlation structure, which is
typical in applications. In this work, we present a new class of approximate message
passing (AMP) algorithms and give a state evolution recursion which precisely
characterizes their performance in the large system limit. In contrast with the
existing multi-layer VAMP (ML-VAMP) approach, our proposed AMP – dubbed multilayer
rotationally invariant generalized AMP (ML-RI-GAMP) – provides a natural generalization
beyond Gaussian designs, in the sense that it recovers the existing Gaussian AMP
as a special case. Furthermore, ML-RI-GAMP exhibits a significantly lower complexity
than ML-VAMP, as the computationally intensive singular value decomposition is
replaced by an estimation of the moments of the design matrices. Finally, our
numerical results show that this complexity gain comes at little to no cost in
the performance of the algorithm.
acknowledgement: Marco Mondelli was partially supported by the 2019 Lopez-Loreta prize.
article_processing_charge: No
author:
- first_name: Yizhou
full_name: Xu, Yizhou
last_name: Xu
- first_name: Tian Qi
full_name: Hou, Tian Qi
last_name: Hou
- first_name: Shan Suo
full_name: Liang, Shan Suo
last_name: Liang
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Xu Y, Hou TQ, Liang SS, Mondelli M. Approximate message passing for multi-layer
estimation in rotationally invariant models. In: 2023 IEEE Information Theory
Workshop. Institute of Electrical and Electronics Engineers; 2023:294-298.
doi:10.1109/ITW55543.2023.10160238'
apa: 'Xu, Y., Hou, T. Q., Liang, S. S., & Mondelli, M. (2023). Approximate message
passing for multi-layer estimation in rotationally invariant models. In 2023
IEEE Information Theory Workshop (pp. 294–298). Saint-Malo, France: Institute
of Electrical and Electronics Engineers. https://doi.org/10.1109/ITW55543.2023.10160238'
chicago: Xu, Yizhou, Tian Qi Hou, Shan Suo Liang, and Marco Mondelli. “Approximate
Message Passing for Multi-Layer Estimation in Rotationally Invariant Models.”
In 2023 IEEE Information Theory Workshop, 294–98. Institute of Electrical
and Electronics Engineers, 2023. https://doi.org/10.1109/ITW55543.2023.10160238.
ieee: Y. Xu, T. Q. Hou, S. S. Liang, and M. Mondelli, “Approximate message passing
for multi-layer estimation in rotationally invariant models,” in 2023 IEEE
Information Theory Workshop, Saint-Malo, France, 2023, pp. 294–298.
ista: 'Xu Y, Hou TQ, Liang SS, Mondelli M. 2023. Approximate message passing for
multi-layer estimation in rotationally invariant models. 2023 IEEE Information
Theory Workshop. ITW: Information Theory Workshop, 294–298.'
mla: Xu, Yizhou, et al. “Approximate Message Passing for Multi-Layer Estimation
in Rotationally Invariant Models.” 2023 IEEE Information Theory Workshop,
Institute of Electrical and Electronics Engineers, 2023, pp. 294–98, doi:10.1109/ITW55543.2023.10160238.
short: Y. Xu, T.Q. Hou, S.S. Liang, M. Mondelli, in:, 2023 IEEE Information Theory
Workshop, Institute of Electrical and Electronics Engineers, 2023, pp. 294–298.
conference:
end_date: 2023-04-28
location: Saint-Malo, France
name: 'ITW: Information Theory Workshop'
start_date: 2023-04-23
date_created: 2023-07-30T22:01:04Z
date_published: 2023-05-01T00:00:00Z
date_updated: 2023-12-13T11:35:46Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/ITW55543.2023.10160238
external_id:
arxiv:
- '2212.01572'
isi:
- '001031733100053'
isi: 1
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2212.01572
month: '05'
oa: 1
oa_version: Preprint
page: 294-298
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 2023 IEEE Information Theory Workshop
publication_identifier:
eissn:
- 2475-4218
isbn:
- '9798350301496'
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Approximate message passing for multi-layer estimation in rotationally invariant
models
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '12859'
abstract:
- lang: eng
text: 'Machine learning models are vulnerable to adversarial perturbations, and
a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through
the lens of over-parameterization: smoothly interpolating the data requires significantly
more parameters than simply memorizing it. However, this "universal" law provides
only a necessary condition for robustness, and it is unable to discriminate between
models. In this paper, we address these gaps by focusing on empirical risk minimization
in two prototypical settings, namely, random features and the neural tangent kernel
(NTK). We prove that, for random features, the model is not robust for any degree
of over-parameterization, even when the necessary condition coming from the universal
law of robustness is satisfied. In contrast, for even activations, the NTK model
meets the universal lower bound, and it is robust as soon as the necessary condition
on over-parameterization is fulfilled. This also addresses a conjecture in prior
work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel
of the model from an "interaction matrix", which describes the interaction with
the test data and captures the effect of the activation. Our theoretical results
are corroborated by numerical evidence on both synthetic and standard datasets
(MNIST, CIFAR-10).'
acknowledgement: Simone Bombari and Marco Mondelli were partially supported by the
2019 Lopez-Loreta prize, and the authors would like to thank Hamed Hassani for
helpful discussions.
alternative_title:
- PMLR
article_processing_charge: No
author:
- first_name: Simone
full_name: Bombari, Simone
id: ca726dda-de17-11ea-bc14-f9da834f63aa
last_name: Bombari
- first_name: Shayan
full_name: Kiyani, Shayan
id: f5a2b424-e339-11ed-8435-ff3b4fe70cf8
last_name: Kiyani
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Bombari S, Kiyani S, Mondelli M. Beyond the universal law of robustness: Sharper
laws for random features and neural tangent kernels. In: Proceedings of the
40th International Conference on Machine Learning. Vol 202. ML Research Press;
2023:2738-2776.'
apa: 'Bombari, S., Kiyani, S., & Mondelli, M. (2023). Beyond the universal law
of robustness: Sharper laws for random features and neural tangent kernels. In
Proceedings of the 40th International Conference on Machine Learning (Vol.
202, pp. 2738–2776). Honolulu, HI, United States: ML Research Press.'
chicago: 'Bombari, Simone, Shayan Kiyani, and Marco Mondelli. “Beyond the Universal
Law of Robustness: Sharper Laws for Random Features and Neural Tangent Kernels.”
In Proceedings of the 40th International Conference on Machine Learning,
202:2738–76. ML Research Press, 2023.'
ieee: 'S. Bombari, S. Kiyani, and M. Mondelli, “Beyond the universal law of robustness:
Sharper laws for random features and neural tangent kernels,” in Proceedings
of the 40th International Conference on Machine Learning, Honolulu, HI, United
States, 2023, vol. 202, pp. 2738–2776.'
ista: 'Bombari S, Kiyani S, Mondelli M. 2023. Beyond the universal law of robustness:
Sharper laws for random features and neural tangent kernels. Proceedings of the
40th International Conference on Machine Learning. ICML: International Conference
on Machine Learning, PMLR, vol. 202, 2738–2776.'
mla: 'Bombari, Simone, et al. “Beyond the Universal Law of Robustness: Sharper Laws
for Random Features and Neural Tangent Kernels.” Proceedings of the 40th International
Conference on Machine Learning, vol. 202, ML Research Press, 2023, pp. 2738–76.'
short: S. Bombari, S. Kiyani, M. Mondelli, in:, Proceedings of the 40th International
Conference on Machine Learning, ML Research Press, 2023, pp. 2738–2776.
conference:
end_date: 2023-07-29
location: Honolulu, HI, United States
name: 'ICML: International Conference on Machine Learning'
start_date: 2023-07-23
date_created: 2023-04-23T16:11:03Z
date_published: 2023-10-27T00:00:00Z
date_updated: 2024-02-06T07:42:00Z
day: '27'
department:
- _id: GradSch
- _id: MaMo
external_id:
arxiv:
- '2302.01629'
intvolume: ' 202'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://arxiv.org/abs/2302.01629
month: '10'
oa: 1
oa_version: Preprint
page: 2738-2776
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 40th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
related_material:
link:
- relation: software
url: https://github.com/simone-bombari/beyond-universal-robustness
status: public
title: 'Beyond the universal law of robustness: Sharper laws for random features and
neural tangent kernels'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 202
year: '2023'
...
---
_id: '14921'
abstract:
- lang: eng
text: Neural collapse (NC) refers to the surprising structure of the last layer
of deep neural networks in the terminal phase of gradient descent training. Recently,
an increasing amount of experimental evidence has pointed to the propagation of
NC to earlier layers of neural networks. However, while the NC in the last layer
is well studied theoretically, much less is known about its multi-layered counterpart
- deep neural collapse (DNC). In particular, existing work focuses either on linear
layers or only on the last two layers at the price of an extra assumption. Our
paper fills this gap by generalizing the established analytical framework for
NC - the unconstrained features model - to multiple non-linear layers. Our key
technical contribution is to show that, in a deep unconstrained features model,
the unique global optimum for binary classification exhibits all the properties
typical of DNC. This explains the existing experimental evidence of DNC. We also
empirically show that (i) by optimizing deep unconstrained features models via
gradient descent, the resulting solution agrees well with our theory, and (ii)
trained networks recover the unconstrained features suitable for the occurrence
of DNC, thus supporting the validity of this modeling principle.
acknowledgement: M. M. is partially supported by the 2019 Lopez-Loreta Prize. The
authors would like to thank Eugenia Iofinova, Bernd Prach and Simone Bombari for
valuable feedback on the manuscript.
alternative_title:
- NeurIPS
article_processing_charge: No
author:
- first_name: Peter
full_name: Súkeník, Peter
id: d64d6a8d-eb8e-11eb-b029-96fd216dec3c
last_name: Súkeník
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Christoph
full_name: Lampert, Christoph
id: 40C20FD2-F248-11E8-B48F-1D18A9856A87
last_name: Lampert
orcid: 0000-0001-8622-7887
citation:
ama: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. In: 37th Annual Conference on Neural
Information Processing Systems.'
apa: Súkeník, P., Mondelli, M., & Lampert, C. (n.d.). Deep neural collapse is
provably optimal for the deep unconstrained features model. In 37th Annual
Conference on Neural Information Processing Systems. New Orleans, LA, United
States.
chicago: Súkeník, Peter, Marco Mondelli, and Christoph Lampert. “Deep Neural Collapse
Is Provably Optimal for the Deep Unconstrained Features Model.” In 37th Annual
Conference on Neural Information Processing Systems, n.d.
ieee: P. Súkeník, M. Mondelli, and C. Lampert, “Deep neural collapse is provably
optimal for the deep unconstrained features model,” in 37th Annual Conference
on Neural Information Processing Systems, New Orleans, LA, United States.
ista: 'Súkeník P, Mondelli M, Lampert C. Deep neural collapse is provably optimal
for the deep unconstrained features model. 37th Annual Conference on Neural Information
Processing Systems. NeurIPS: Neural Information Processing Systems, NeurIPS.'
mla: Súkeník, Peter, et al. “Deep Neural Collapse Is Provably Optimal for the Deep
Unconstrained Features Model.” 37th Annual Conference on Neural Information
Processing Systems.
short: P. Súkeník, M. Mondelli, C. Lampert, in:, 37th Annual Conference on Neural
Information Processing Systems, n.d.
conference:
end_date: 2023-12-16
location: New Orleans, LA, United States
name: 'NeurIPS: Neural Information Processing Systems'
start_date: 2023-12-10
date_created: 2024-02-02T11:17:41Z
date_published: 2023-12-15T00:00:00Z
date_updated: 2024-02-06T07:53:26Z
day: '15'
department:
- _id: MaMo
- _id: ChLa
external_id:
arxiv:
- '2305.13165'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2305.13165
month: '12'
oa: 1
oa_version: Preprint
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 37th Annual Conference on Neural Information Processing Systems
publication_status: inpress
quality_controlled: '1'
status: public
title: Deep neural collapse is provably optimal for the deep unconstrained features
model
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14924'
abstract:
- lang: eng
text: "The stochastic heavy ball method (SHB), also known as stochastic gradient
descent (SGD) with Polyak's momentum, is widely used in training neural networks.
However, despite the remarkable success of this algorithm in practice, its theoretical
characterization remains limited. In this paper, we focus on neural networks with
two and three layers and provide a rigorous understanding of the properties of
the solutions found by SHB: (i) stability after dropping out part of the
neurons, (ii) connectivity along a low-loss path, and (iii) convergence
to the global optimum. To achieve this goal, we take a mean-field view and
relate the SHB dynamics to a certain partial differential equation in the limit
of large network widths. This mean-field perspective has inspired a recent line
of work focusing on SGD while, in contrast, our paper considers an algorithm with
momentum. More specifically, after proving existence and uniqueness of the limit
differential equations, we show convergence to the global optimum and give a quantitative
bound between the mean-field limit and the SHB dynamics of a finite-width network.
Armed with this last bound, we are able to establish the dropout-stability and
connectivity of SHB solutions."
acknowledgement: D. Wu and M. Mondelli are partially supported by the 2019 Lopez-Loreta
Prize. V. Kungurtsev was supported by the OP VVV project CZ.02.1.01/0.0/0.0/16_019/0000765
"Research Center for Informatics".
alternative_title:
- TMLR
article_processing_charge: No
author:
- first_name: Diyuan
full_name: Wu, Diyuan
id: 1a5914c2-896a-11ed-bdf8-fb80621a0635
last_name: Wu
- first_name: Vyacheslav
full_name: Kungurtsev, Vyacheslav
last_name: Kungurtsev
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Wu D, Kungurtsev V, Mondelli M. Mean-field analysis for heavy ball methods:
Dropout-stability, connectivity, and global convergence. In: Transactions on
Machine Learning Research. ML Research Press; 2023.'
apa: 'Wu, D., Kungurtsev, V., & Mondelli, M. (2023). Mean-field analysis for
heavy ball methods: Dropout-stability, connectivity, and global convergence. In
Transactions on Machine Learning Research. ML Research Press.'
chicago: 'Wu, Diyuan, Vyacheslav Kungurtsev, and Marco Mondelli. “Mean-Field Analysis
for Heavy Ball Methods: Dropout-Stability, Connectivity, and Global Convergence.”
In Transactions on Machine Learning Research. ML Research Press, 2023.'
ieee: 'D. Wu, V. Kungurtsev, and M. Mondelli, “Mean-field analysis for heavy ball
methods: Dropout-stability, connectivity, and global convergence,” in Transactions
on Machine Learning Research, 2023.'
ista: 'Wu D, Kungurtsev V, Mondelli M. 2023. Mean-field analysis for heavy ball
methods: Dropout-stability, connectivity, and global convergence. Transactions
on Machine Learning Research, TMLR.'
mla: 'Wu, Diyuan, et al. “Mean-Field Analysis for Heavy Ball Methods: Dropout-Stability,
Connectivity, and Global Convergence.” Transactions on Machine Learning Research,
ML Research Press, 2023.'
short: D. Wu, V. Kungurtsev, M. Mondelli, in:, Transactions on Machine Learning
Research, ML Research Press, 2023.
date_created: 2024-02-02T11:21:56Z
date_published: 2023-02-28T00:00:00Z
date_updated: 2024-02-06T08:53:15Z
day: '28'
department:
- _id: MaMo
external_id:
arxiv:
- '2210.06819'
has_accepted_license: '1'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2210.06819
month: '02'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Transactions on Machine Learning Research
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: 'Mean-field analysis for heavy ball methods: Dropout-stability, connectivity,
and global convergence'
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14923'
abstract:
- lang: eng
text: We study the performance of a Bayesian statistician who estimates a rank-one
signal corrupted by non-symmetric rotationally invariant noise with a generic
distribution of singular values. As the signal-to-noise ratio and the noise structure
are unknown, a Gaussian setup is incorrectly assumed. We derive the exact analytic
expression for the error of the mismatched Bayes estimator and also provide the
analysis of an approximate message passing (AMP) algorithm. The first result exploits
the asymptotic behavior of spherical integrals for rectangular matrices and of
low-rank matrix perturbations; the second one relies on the design and analysis
of an auxiliary AMP. The numerical experiments show that there is a performance
gap between the AMP and Bayes estimators, which is due to the incorrect estimation
of the signal norm.
article_processing_charge: No
author:
- first_name: Teng
full_name: Fu, Teng
last_name: Fu
- first_name: YuHao
full_name: Liu, YuHao
last_name: Liu
- first_name: Jean
full_name: Barbier, Jean
last_name: Barbier
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: ShanSuo
full_name: Liang, ShanSuo
last_name: Liang
- first_name: TianQi
full_name: Hou, TianQi
last_name: Hou
citation:
ama: 'Fu T, Liu Y, Barbier J, Mondelli M, Liang S, Hou T. Mismatched estimation
of non-symmetric rank-one matrices corrupted by structured noise. In: Proceedings
of 2023 IEEE International Symposium on Information Theory. IEEE. doi:10.1109/isit54713.2023.10206671'
apa: 'Fu, T., Liu, Y., Barbier, J., Mondelli, M., Liang, S., & Hou, T. (n.d.).
Mismatched estimation of non-symmetric rank-one matrices corrupted by structured
noise. In Proceedings of 2023 IEEE International Symposium on Information Theory.
Taipei, Taiwan: IEEE. https://doi.org/10.1109/isit54713.2023.10206671'
chicago: Fu, Teng, YuHao Liu, Jean Barbier, Marco Mondelli, ShanSuo Liang, and TianQi
Hou. “Mismatched Estimation of Non-Symmetric Rank-One Matrices Corrupted by Structured
Noise.” In Proceedings of 2023 IEEE International Symposium on Information
Theory. IEEE, n.d. https://doi.org/10.1109/isit54713.2023.10206671.
ieee: T. Fu, Y. Liu, J. Barbier, M. Mondelli, S. Liang, and T. Hou, “Mismatched
estimation of non-symmetric rank-one matrices corrupted by structured noise,”
in Proceedings of 2023 IEEE International Symposium on Information Theory,
Taipei, Taiwan.
ista: 'Fu T, Liu Y, Barbier J, Mondelli M, Liang S, Hou T. Mismatched estimation
of non-symmetric rank-one matrices corrupted by structured noise. Proceedings
of 2023 IEEE International Symposium on Information Theory. ISIT: IEEE International
Symposium on Information Theory.'
mla: Fu, Teng, et al. “Mismatched Estimation of Non-Symmetric Rank-One Matrices
Corrupted by Structured Noise.” Proceedings of 2023 IEEE International Symposium
on Information Theory, IEEE, doi:10.1109/isit54713.2023.10206671.
short: T. Fu, Y. Liu, J. Barbier, M. Mondelli, S. Liang, T. Hou, in:, Proceedings
of 2023 IEEE International Symposium on Information Theory, IEEE, n.d.
conference:
end_date: 2023-06-30
location: Taipei, Taiwan
name: 'ISIT: IEEE International Symposium on Information Theory'
start_date: 2023-06-25
date_created: 2024-02-02T11:20:39Z
date_published: 2023-06-30T00:00:00Z
date_updated: 2024-02-14T14:34:03Z
day: '30'
department:
- _id: MaMo
doi: 10.1109/isit54713.2023.10206671
external_id:
arxiv:
- '2302.03306'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2302.03306
month: '06'
oa: 1
oa_version: Preprint
publication: Proceedings of 2023 IEEE International Symposium on Information Theory
publication_status: inpress
publisher: IEEE
quality_controlled: '1'
status: public
title: Mismatched estimation of non-symmetric rank-one matrices corrupted by structured
noise
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '14922'
abstract:
- lang: eng
text: 'We propose a novel approach to concentration for non-independent random variables.
The main idea is to “pretend” that the random variables are independent and
pay a multiplicative price measuring how far they are from actually being independent.
This price is encapsulated in the Hellinger integral between the joint and the
product of the marginals, which is then upper bounded leveraging tensorisation
properties. Our bounds represent a natural generalisation of concentration inequalities
in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s
inequality) when the random variables are independent. Furthermore, in a “large
deviations” regime, we obtain the same decay in the probability as for the
independent case, even when the random variables display non-trivial dependencies.
To show this, we consider a number of applications of interest. First, we provide
a bound for Markov chains with finite state space. Then, we consider the Simple
Symmetric Random Walk, which is a non-contracting Markov chain, and a non-Markovian
setting in which the stochastic process depends on its entire past. To conclude,
we propose an application to Markov Chain Monte Carlo methods, where our approach
leads to an improved lower bound on the minimum burn-in period required to reach
a certain accuracy. In all of these settings, we provide a regime of parameters
in which our bound fares better than what the state of the art can provide.'
acknowledgement: The authors are partially supported by the 2019 Lopez-Loreta Prize.
They would also like to thank Professor Jan Maas for providing valuable suggestions
and comments on an early version of the work.
article_processing_charge: No
author:
- first_name: Amedeo Roberto
full_name: Esposito, Amedeo Roberto
id: 9583e921-e1ad-11ec-9862-cef099626dc9
last_name: Esposito
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Esposito AR, Mondelli M. Concentration without independence via information
measures. In: Proceedings of 2023 IEEE International Symposium on Information
Theory. IEEE; 2023:400-405. doi:10.1109/isit54713.2023.10206899'
apa: 'Esposito, A. R., & Mondelli, M. (2023). Concentration without independence
via information measures. In Proceedings of 2023 IEEE International Symposium
on Information Theory (pp. 400–405). Taipei, Taiwan: IEEE. https://doi.org/10.1109/isit54713.2023.10206899'
chicago: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
via Information Measures.” In Proceedings of 2023 IEEE International Symposium
on Information Theory, 400–405. IEEE, 2023. https://doi.org/10.1109/isit54713.2023.10206899.
ieee: A. R. Esposito and M. Mondelli, “Concentration without independence via information
measures,” in Proceedings of 2023 IEEE International Symposium on Information
Theory, Taipei, Taiwan, 2023, pp. 400–405.
ista: 'Esposito AR, Mondelli M. 2023. Concentration without independence via information
measures. Proceedings of 2023 IEEE International Symposium on Information Theory.
ISIT: IEEE International Symposium on Information Theory, 400–405.'
mla: Esposito, Amedeo Roberto, and Marco Mondelli. “Concentration without Independence
via Information Measures.” Proceedings of 2023 IEEE International Symposium
on Information Theory, IEEE, 2023, pp. 400–05, doi:10.1109/isit54713.2023.10206899.
short: A.R. Esposito, M. Mondelli, in:, Proceedings of 2023 IEEE International Symposium
on Information Theory, IEEE, 2023, pp. 400–405.
conference:
end_date: 2023-06-30
location: Taipei, Taiwan
name: 'ISIT: IEEE International Symposium on Information Theory'
start_date: 2023-06-25
date_created: 2024-02-02T11:18:40Z
date_published: 2023-06-30T00:00:00Z
date_updated: 2024-03-25T07:15:51Z
day: '30'
department:
- _id: MaMo
doi: 10.1109/isit54713.2023.10206899
external_id:
arxiv:
- '2303.07245'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2303.07245
month: '06'
oa: 1
oa_version: Preprint
page: 400-405
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of 2023 IEEE International Symposium on Information Theory
publication_identifier:
eisbn:
- '9781665475549'
eissn:
- 2157-8117
publication_status: published
publisher: IEEE
quality_controlled: '1'
related_material:
record:
- id: '15172'
relation: later_version
status: public
status: public
title: Concentration without independence via information measures
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2023'
...
---
_id: '11420'
abstract:
- lang: eng
text: 'Understanding the properties of neural networks trained via stochastic gradient
descent (SGD) is at the heart of the theory of deep learning. In this work, we
take a mean-field view, and consider a two-layer ReLU network trained via noisy-SGD
for a univariate regularized regression problem. Our main result is that SGD with
vanishingly small noise injected in the gradients is biased towards a simple solution:
at convergence, the ReLU network implements a piecewise linear map of the inputs,
and the number of “knot” points -- i.e., points where the tangent of the ReLU
network estimator changes -- between two consecutive training inputs is at most
three. In particular, as the number of neurons of the network grows, the SGD dynamics
is captured by the solution of a gradient flow and, at convergence, the distribution
of the weights approaches the unique minimizer of a related free energy, which
has a Gibbs form. Our key technical contribution consists in the analysis of the
estimator resulting from this minimizer: we show that its second derivative vanishes
everywhere, except at some specific locations which represent the “knot” points.
We also provide empirical evidence that knots at locations distinct from the data
points might occur, as predicted by our theory.'
acknowledgement: "We would like to thank Mert Pilanci for several exploratory discussions
in the early stage\r\nof the project, Jan Maas for clarifications about Jordan et
al. (1998), and Max Zimmer for\r\nsuggestive numerical experiments. A. Shevchenko
and M. Mondelli are partially supported\r\nby the 2019 Lopez-Loreta Prize. V. Kungurtsev
acknowledges support to the OP VVV\r\nproject CZ.02.1.01/0.0/0.0/16 019/0000765
Research Center for Informatics.\r\n"
article_processing_charge: No
article_type: original
author:
- first_name: Aleksandr
full_name: Shevchenko, Aleksandr
id: F2B06EC2-C99E-11E9-89F0-752EE6697425
last_name: Shevchenko
- first_name: Vyacheslav
full_name: Kungurtsev, Vyacheslav
last_name: Kungurtsev
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: Shevchenko A, Kungurtsev V, Mondelli M. Mean-field analysis of piecewise linear
solutions for wide ReLU networks. Journal of Machine Learning Research.
2022;23(130):1-55.
apa: Shevchenko, A., Kungurtsev, V., & Mondelli, M. (2022). Mean-field analysis
of piecewise linear solutions for wide ReLU networks. Journal of Machine Learning
Research. Journal of Machine Learning Research.
chicago: Shevchenko, Aleksandr, Vyacheslav Kungurtsev, and Marco Mondelli. “Mean-Field
Analysis of Piecewise Linear Solutions for Wide ReLU Networks.” Journal of
Machine Learning Research. Journal of Machine Learning Research, 2022.
ieee: A. Shevchenko, V. Kungurtsev, and M. Mondelli, “Mean-field analysis of piecewise
linear solutions for wide ReLU networks,” Journal of Machine Learning Research,
vol. 23, no. 130. Journal of Machine Learning Research, pp. 1–55, 2022.
ista: Shevchenko A, Kungurtsev V, Mondelli M. 2022. Mean-field analysis of piecewise
linear solutions for wide ReLU networks. Journal of Machine Learning Research.
23(130), 1–55.
mla: Shevchenko, Aleksandr, et al. “Mean-Field Analysis of Piecewise Linear Solutions
for Wide ReLU Networks.” Journal of Machine Learning Research, vol. 23,
no. 130, Journal of Machine Learning Research, 2022, pp. 1–55.
short: A. Shevchenko, V. Kungurtsev, M. Mondelli, Journal of Machine Learning Research
23 (2022) 1–55.
date_created: 2022-05-29T22:01:54Z
date_published: 2022-04-01T00:00:00Z
date_updated: 2022-05-30T08:34:14Z
day: '01'
ddc:
- '000'
department:
- _id: MaMo
- _id: DaAl
external_id:
arxiv:
- '2111.02278'
file:
- access_level: open_access
checksum: d4ff5d1affb34848b5c5e4002483fc62
content_type: application/pdf
creator: cchlebak
date_created: 2022-05-30T08:22:55Z
date_updated: 2022-05-30T08:22:55Z
file_id: '11422'
file_name: 21-1365.pdf
file_size: 1521701
relation: main_file
success: 1
file_date_updated: 2022-05-30T08:22:55Z
has_accepted_license: '1'
intvolume: ' 23'
issue: '130'
language:
- iso: eng
month: '04'
oa: 1
oa_version: Published Version
page: 1-55
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Journal of Machine Learning Research
publication_identifier:
eissn:
- 1533-7928
issn:
- 1532-4435
publication_status: published
publisher: Journal of Machine Learning Research
quality_controlled: '1'
related_material:
link:
- relation: other
url: https://www.jmlr.org/papers/v23/21-1365.html
scopus_import: '1'
status: public
title: Mean-field analysis of piecewise linear solutions for wide ReLU networks
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
volume: 23
year: '2022'
...
---
_id: '12016'
abstract:
- lang: eng
text: We consider the problem of coded distributed computing using polar codes.
The average execution time of a coded computing system is related to the error
probability for transmission over the binary erasure channel in recent work by
Soleymani, Jamali and Mahdavifar, where the performance of binary linear codes
is investigated. In this paper, we focus on polar codes and unveil a connection
between the average execution time and the scaling exponent μ of the family of
codes. In the finite-length characterization of polar codes, the scaling exponent
is a key object capturing the speed of convergence to capacity. In particular,
we show that (i) the gap between the normalized average execution time of polar
codes and that of optimal MDS codes is O(n^(-1/μ)), and (ii) this upper bound can
be improved to roughly O(n^(-1/2)) by considering polar codes with large kernels.
We conjecture that these bounds could be improved to O(n^(-2/μ)) and O(n^(-1)),
respectively, and provide a heuristic argument as well as numerical evidence supporting
this view.
acknowledgement: D. Fathollahi and M. Mondelli were partially supported by the 2019
Lopez-Loreta Prize. The authors thank Hamed Hassani and Hessam Mahdavifar for helpful
discussions.
article_processing_charge: No
author:
- first_name: Dorsa
full_name: Fathollahi, Dorsa
last_name: Fathollahi
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Fathollahi D, Mondelli M. Polar coded computing: The role of the scaling exponent.
In: 2022 IEEE International Symposium on Information Theory. Vol 2022.
IEEE; 2022:2154-2159. doi:10.1109/ISIT50566.2022.9834712'
apa: 'Fathollahi, D., & Mondelli, M. (2022). Polar coded computing: The role
of the scaling exponent. In 2022 IEEE International Symposium on Information
Theory (Vol. 2022, pp. 2154–2159). Espoo, Finland: IEEE. https://doi.org/10.1109/ISIT50566.2022.9834712'
chicago: 'Fathollahi, Dorsa, and Marco Mondelli. “Polar Coded Computing: The Role
of the Scaling Exponent.” In 2022 IEEE International Symposium on Information
Theory, 2022:2154–59. IEEE, 2022. https://doi.org/10.1109/ISIT50566.2022.9834712.'
ieee: 'D. Fathollahi and M. Mondelli, “Polar coded computing: The role of the scaling
exponent,” in 2022 IEEE International Symposium on Information Theory,
Espoo, Finland, 2022, vol. 2022, pp. 2154–2159.'
ista: 'Fathollahi D, Mondelli M. 2022. Polar coded computing: The role of the scaling
exponent. 2022 IEEE International Symposium on Information Theory. ISIT: International
Symposium on Information Theory vol. 2022, 2154–2159.'
mla: 'Fathollahi, Dorsa, and Marco Mondelli. “Polar Coded Computing: The Role of
the Scaling Exponent.” 2022 IEEE International Symposium on Information Theory,
vol. 2022, IEEE, 2022, pp. 2154–59, doi:10.1109/ISIT50566.2022.9834712.'
short: D. Fathollahi, M. Mondelli, in:, 2022 IEEE International Symposium on Information
Theory, IEEE, 2022, pp. 2154–2159.
conference:
end_date: 2022-07-01
location: Espoo, Finland
name: 'ISIT: International Symposium on Information Theory'
start_date: 2022-06-26
date_created: 2022-09-04T22:02:05Z
date_published: 2022-08-03T00:00:00Z
date_updated: 2022-09-05T10:35:40Z
day: '03'
department:
- _id: MaMo
doi: 10.1109/ISIT50566.2022.9834712
external_id:
arxiv:
- '2201.10082'
intvolume: ' 2022'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2201.10082
month: '08'
oa: 1
oa_version: Preprint
page: 2154-2159
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 2022 IEEE International Symposium on Information Theory
publication_identifier:
isbn:
- '9781665421591'
issn:
- 2157-8095
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: 'Polar coded computing: The role of the scaling exponent'
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 2022
year: '2022'
...
---
_id: '12540'
abstract:
- lang: eng
text: We consider the problem of signal estimation in generalized linear models
defined via rotationally invariant design matrices. Since these matrices can have
an arbitrary spectral distribution, this model is well suited for capturing complex
correlation structures which often arise in applications. We propose a novel family
of approximate message passing (AMP) algorithms for signal estimation, and rigorously
characterize their performance in the high-dimensional limit via a state evolution
recursion. Our rotationally invariant AMP has complexity of the same order as
the existing AMP derived under the restrictive assumption of a Gaussian design;
our algorithm also recovers this existing AMP as a special case. Numerical results
showcase a performance close to Vector AMP (which is conjectured to be Bayes-optimal
in some settings), but obtained with a much lower complexity, as the proposed
algorithm does not require a computationally expensive singular value decomposition.
acknowledgement: The authors would like to thank the anonymous reviewers for their
helpful comments. KK and MM were partially supported by the 2019 Lopez-Loreta Prize.
article_number: '22'
article_processing_charge: No
author:
- first_name: Ramji
full_name: Venkataramanan, Ramji
last_name: Venkataramanan
- first_name: Kevin
full_name: Kögler, Kevin
id: 94ec913c-dc85-11ea-9058-e5051ab2428b
last_name: Kögler
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Venkataramanan R, Kögler K, Mondelli M. Estimation in rotationally invariant
generalized linear models via approximate message passing. In: Proceedings
of the 39th International Conference on Machine Learning. Vol 162. ML Research
Press; 2022.'
apa: 'Venkataramanan, R., Kögler, K., & Mondelli, M. (2022). Estimation in rotationally
invariant generalized linear models via approximate message passing. In Proceedings
of the 39th International Conference on Machine Learning (Vol. 162). Baltimore,
MD, United States: ML Research Press.'
chicago: Venkataramanan, Ramji, Kevin Kögler, and Marco Mondelli. “Estimation in
Rotationally Invariant Generalized Linear Models via Approximate Message Passing.”
In Proceedings of the 39th International Conference on Machine Learning,
Vol. 162. ML Research Press, 2022.
ieee: R. Venkataramanan, K. Kögler, and M. Mondelli, “Estimation in rotationally
invariant generalized linear models via approximate message passing,” in Proceedings
of the 39th International Conference on Machine Learning, Baltimore, MD, United
States, 2022, vol. 162.
ista: 'Venkataramanan R, Kögler K, Mondelli M. 2022. Estimation in rotationally
invariant generalized linear models via approximate message passing. Proceedings
of the 39th International Conference on Machine Learning. ICML: International
Conference on Machine Learning vol. 162, 22.'
mla: Venkataramanan, Ramji, et al. “Estimation in Rotationally Invariant Generalized
Linear Models via Approximate Message Passing.” Proceedings of the 39th International
Conference on Machine Learning, vol. 162, 22, ML Research Press, 2022.
short: R. Venkataramanan, K. Kögler, M. Mondelli, in:, Proceedings of the 39th International
Conference on Machine Learning, ML Research Press, 2022.
conference:
end_date: 2022-07-23
location: Baltimore, MD, United States
name: 'ICML: International Conference on Machine Learning'
start_date: 2022-07-17
date_created: 2023-02-10T13:49:04Z
date_published: 2022-01-01T00:00:00Z
date_updated: 2023-02-13T10:54:58Z
ddc:
- '000'
department:
- _id: MaMo
file:
- access_level: open_access
checksum: 67436eb0a660789514cdf9db79e84683
content_type: application/pdf
creator: dernst
date_created: 2023-02-13T10:53:11Z
date_updated: 2023-02-13T10:53:11Z
file_id: '12547'
file_name: 2022_PMLR_Venkataramanan.pdf
file_size: 2341343
relation: main_file
success: 1
file_date_updated: 2023-02-13T10:53:11Z
has_accepted_license: '1'
intvolume: ' 162'
language:
- iso: eng
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 39th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: Estimation in rotationally invariant generalized linear models via approximate
message passing
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 162
year: '2022'
...
---
_id: '12536'
abstract:
- lang: eng
text: 'We consider the problem of estimating a rank-1 signal corrupted by structured
rotationally invariant noise, and address the following question: how well do
inference algorithms perform when the noise statistics is unknown and hence Gaussian
noise is assumed? While the matched Bayes-optimal setting with unstructured noise
is well understood, the analysis of this mismatched problem is still in its infancy.
In this paper, we make a step towards understanding the effect of the strong source
of mismatch which is the noise statistics. Our main technical contribution is
the rigorous analysis of a Bayes estimator and of an approximate message passing
(AMP) algorithm, both of which incorrectly assume a Gaussian setup. The first
result exploits the theory of spherical integrals and of low-rank matrix perturbations;
the idea behind the second one is to design and analyze an artificial AMP which,
by taking advantage of the flexibility in the denoisers, is able to "correct"
the mismatch. Armed with these sharp asymptotic characterizations, we unveil a
rich and often unexpected phenomenology. For example, although AMP is in principle
designed to efficiently compute the Bayes estimator, the former is outperformed
by the latter in terms of mean-square error. We show that this performance gap
is due to an incorrect estimation of the signal norm. In fact, when the SNR is
large enough, the overlaps of the AMP and the Bayes estimator coincide, and they
even match those of optimal estimators taking into account the structure of the
noise.'
article_number: '2205.10009'
article_processing_charge: No
author:
- first_name: Jean
full_name: Barbier, Jean
last_name: Barbier
- first_name: TianQi
full_name: Hou, TianQi
last_name: Hou
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Manuel
full_name: Saenz, Manuel
last_name: Saenz
citation:
ama: 'Barbier J, Hou T, Mondelli M, Saenz M. The price of ignorance: How much does
it cost to forget noise structure in low-rank matrix estimation? arXiv.
doi:10.48550/arXiv.2205.10009'
apa: 'Barbier, J., Hou, T., Mondelli, M., & Saenz, M. (n.d.). The price of ignorance:
How much does it cost to forget noise structure in low-rank matrix estimation?
arXiv. https://doi.org/10.48550/arXiv.2205.10009'
chicago: 'Barbier, Jean, TianQi Hou, Marco Mondelli, and Manuel Saenz. “The Price
of Ignorance: How Much Does It Cost to Forget Noise Structure in Low-Rank Matrix
Estimation?” ArXiv, n.d. https://doi.org/10.48550/arXiv.2205.10009.'
ieee: 'J. Barbier, T. Hou, M. Mondelli, and M. Saenz, “The price of ignorance: How
much does it cost to forget noise structure in low-rank matrix estimation?,” arXiv.'
ista: 'Barbier J, Hou T, Mondelli M, Saenz M. The price of ignorance: How much does
it cost to forget noise structure in low-rank matrix estimation? arXiv, 2205.10009.'
mla: 'Barbier, Jean, et al. “The Price of Ignorance: How Much Does It Cost to Forget
Noise Structure in Low-Rank Matrix Estimation?” ArXiv, 2205.10009, doi:10.48550/arXiv.2205.10009.'
short: J. Barbier, T. Hou, M. Mondelli, M. Saenz, ArXiv (n.d.).
date_created: 2023-02-10T13:45:41Z
date_published: 2022-05-20T00:00:00Z
date_updated: 2023-02-16T09:41:25Z
day: '20'
department:
- _id: MaMo
doi: 10.48550/arXiv.2205.10009
external_id:
arxiv:
- '2205.10009'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2205.10009
month: '05'
oa: 1
oa_version: Preprint
publication: arXiv
publication_status: accepted
status: public
title: 'The price of ignorance: How much does it cost to forget noise structure in
low-rank matrix estimation?'
type: preprint
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
---
_id: '12233'
abstract:
- lang: eng
text: A novel recursive list decoding (RLD) algorithm for Reed-Muller (RM) codes
based on successive permutations (SP) of the codeword is presented. A low-complexity
SP scheme applied to a subset of the symmetry group of RM codes is first proposed
to carefully select a good codeword permutation on the fly. Then, the proposed
SP technique is integrated into an improved RLD algorithm that initializes different
decoding paths with random codeword permutations, which are sampled from the full
symmetry group of RM codes. Finally, efficient latency and complexity reduction
schemes are introduced that virtually preserve the error-correction performance
of the proposed decoder. Simulation results demonstrate that at the target frame
error rate of 10^(-3) for the RM code of length 256 with 163 information bits, the
proposed decoder reduces by 6% the computational complexity and by 22% the decoding
latency of the state-of-the-art semi-parallel simplified successive-cancellation
decoder with fast Hadamard transform (SSC-FHT) that uses 96 permutations from
the full symmetry group of RM codes, while essentially maintaining the error-correction
performance and memory consumption of the semi-parallel permuted SSC-FHT decoder.
article_processing_charge: No
article_type: original
author:
- first_name: Nghia
full_name: Doan, Nghia
last_name: Doan
- first_name: Seyyed Ali
full_name: Hashemi, Seyyed Ali
last_name: Hashemi
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Warren J.
full_name: Gross, Warren J.
last_name: Gross
citation:
ama: Doan N, Hashemi SA, Mondelli M, Gross WJ. Decoding Reed-Muller codes with successive
codeword permutations. IEEE Transactions on Communications. 2022;70(11):7134-7145.
doi:10.1109/tcomm.2022.3211101
apa: Doan, N., Hashemi, S. A., Mondelli, M., & Gross, W. J. (2022). Decoding
Reed-Muller codes with successive codeword permutations. IEEE Transactions
on Communications. Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/tcomm.2022.3211101
chicago: Doan, Nghia, Seyyed Ali Hashemi, Marco Mondelli, and Warren J. Gross. “Decoding
Reed-Muller Codes with Successive Codeword Permutations.” IEEE Transactions
on Communications. Institute of Electrical and Electronics Engineers, 2022.
https://doi.org/10.1109/tcomm.2022.3211101.
ieee: N. Doan, S. A. Hashemi, M. Mondelli, and W. J. Gross, “Decoding Reed-Muller
codes with successive codeword permutations,” IEEE Transactions on Communications,
vol. 70, no. 11. Institute of Electrical and Electronics Engineers, pp. 7134–7145,
2022.
ista: Doan N, Hashemi SA, Mondelli M, Gross WJ. 2022. Decoding Reed-Muller codes
with successive codeword permutations. IEEE Transactions on Communications. 70(11),
7134–7145.
mla: Doan, Nghia, et al. “Decoding Reed-Muller Codes with Successive Codeword Permutations.”
IEEE Transactions on Communications, vol. 70, no. 11, Institute of Electrical
and Electronics Engineers, 2022, pp. 7134–45, doi:10.1109/tcomm.2022.3211101.
short: N. Doan, S.A. Hashemi, M. Mondelli, W.J. Gross, IEEE Transactions on Communications
70 (2022) 7134–7145.
date_created: 2023-01-16T09:50:38Z
date_published: 2022-11-01T00:00:00Z
date_updated: 2023-08-04T09:34:43Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/tcomm.2022.3211101
external_id:
arxiv:
- '2109.02122'
isi:
- '000937284600006'
intvolume: ' 70'
isi: 1
issue: '11'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2109.02122
month: '11'
oa: 1
oa_version: Preprint
page: 7134-7145
publication: IEEE Transactions on Communications
publication_identifier:
eissn:
- 1558-0857
issn:
- 0090-6778
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Decoding Reed-Muller codes with successive codeword permutations
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 70
year: '2022'
...
---
_id: '10364'
abstract:
- lang: eng
text: 'This paper characterizes the latency of the simplified successive-cancellation
(SSC) decoding scheme for polar codes under hardware resource constraints. In
particular, when the number of processing elements P that can perform SSC decoding
operations in parallel is limited, as is the case in practice, the latency of
SSC decoding is O(N^(1-1/μ) + (N/P) log2 log2 (N/P)), where N is the block length of
the code and μ is the scaling exponent of the channel. Three direct consequences
of this bound are presented. First, in a fully-parallel implementation where P
= N/2, the latency of SSC decoding is O(N^(1-1/μ)), which is sublinear in the block
length. This recovers a result from our earlier work. Second, in a fully-serial
implementation where P = 1, the latency of SSC decoding scales as O(N log2 log2
N). The multiplicative constant is also calculated: we show that the latency of
SSC decoding when P = 1 is given by (2 + o(1))N log2 log2 N. Third, in a semi-parallel
implementation, the smallest P that gives the same latency as that of the fully-parallel
implementation is P = N1/μ. The tightness of our bound on SSC decoding latency
and the applicability of the foregoing results is validated through extensive
simulations.'
acknowledgement: "S. A. Hashemi is supported by a Postdoctoral Fellowship from the
Natural Sciences and\r\nEngineering Research Council of Canada (NSERC) and by Huawei.
M. Mondelli is partially\r\nsupported by the 2019 Lopez-Loreta Prize. A. Fazeli
and A. Vardy were supported in part by\r\nthe National Science Foundation under
Grant CCF-1764104."
article_processing_charge: No
article_type: original
author:
- first_name: Seyyed Ali
full_name: Hashemi, Seyyed Ali
last_name: Hashemi
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Arman
full_name: Fazeli, Arman
last_name: Fazeli
- first_name: Alexander
full_name: Vardy, Alexander
last_name: Vardy
- first_name: John
full_name: Cioffi, John
last_name: Cioffi
- first_name: Andrea
full_name: Goldsmith, Andrea
last_name: Goldsmith
citation:
ama: Hashemi SA, Mondelli M, Fazeli A, Vardy A, Cioffi J, Goldsmith A. Parallelism
versus latency in simplified successive-cancellation decoding of polar codes.
IEEE Transactions on Wireless Communications. 2022;21(6):3909-3920. doi:10.1109/TWC.2021.3125626
apa: Hashemi, S. A., Mondelli, M., Fazeli, A., Vardy, A., Cioffi, J., & Goldsmith,
A. (2022). Parallelism versus latency in simplified successive-cancellation decoding
of polar codes. IEEE Transactions on Wireless Communications. Institute
of Electrical and Electronics Engineers. https://doi.org/10.1109/TWC.2021.3125626
chicago: Hashemi, Seyyed Ali, Marco Mondelli, Arman Fazeli, Alexander Vardy, John
Cioffi, and Andrea Goldsmith. “Parallelism versus Latency in Simplified Successive-Cancellation
Decoding of Polar Codes.” IEEE Transactions on Wireless Communications.
Institute of Electrical and Electronics Engineers, 2022. https://doi.org/10.1109/TWC.2021.3125626.
ieee: S. A. Hashemi, M. Mondelli, A. Fazeli, A. Vardy, J. Cioffi, and A. Goldsmith,
“Parallelism versus latency in simplified successive-cancellation decoding of
polar codes,” IEEE Transactions on Wireless Communications, vol. 21, no.
6. Institute of Electrical and Electronics Engineers, pp. 3909–3920, 2022.
ista: Hashemi SA, Mondelli M, Fazeli A, Vardy A, Cioffi J, Goldsmith A. 2022. Parallelism
versus latency in simplified successive-cancellation decoding of polar codes.
IEEE Transactions on Wireless Communications. 21(6), 3909–3920.
mla: Hashemi, Seyyed Ali, et al. “Parallelism versus Latency in Simplified Successive-Cancellation
Decoding of Polar Codes.” IEEE Transactions on Wireless Communications,
vol. 21, no. 6, Institute of Electrical and Electronics Engineers, 2022, pp. 3909–20,
doi:10.1109/TWC.2021.3125626.
short: S.A. Hashemi, M. Mondelli, A. Fazeli, A. Vardy, J. Cioffi, A. Goldsmith,
IEEE Transactions on Wireless Communications 21 (2022) 3909–3920.
date_created: 2021-11-28T23:01:29Z
date_published: 2022-06-01T00:00:00Z
date_updated: 2023-08-14T06:55:57Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/TWC.2021.3125626
external_id:
arxiv:
- '2012.13378'
isi:
- '000809406400028'
intvolume: ' 21'
isi: 1
issue: '6'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://arxiv.org/abs/2012.13378
month: '06'
oa: 1
oa_version: Preprint
page: 3909-3920
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: IEEE Transactions on Wireless Communications
publication_identifier:
eissn:
- 1558-2248
issn:
- 1536-1276
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
related_material:
record:
- id: '10053'
relation: earlier_version
status: public
scopus_import: '1'
status: public
title: Parallelism versus latency in simplified successive-cancellation decoding of
polar codes
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 21
year: '2022'
...
---
_id: '12538'
abstract:
- lang: eng
text: In this paper, we study the compression of a target two-layer neural network
with N nodes into a compressed network with MIEEE Information Theory Workshop.
2022:588-593. doi:10.1109/ITW54588.2022.9965870
apa: 'Amani, M. H., Bombari, S., Mondelli, M., Pukdee, R., & Rini, S. (2022).
Sharp asymptotics on the compression of two-layer neural networks. IEEE Information
Theory Workshop. Mumbai, India: IEEE. https://doi.org/10.1109/ITW54588.2022.9965870'
chicago: Amani, Mohammad Hossein, Simone Bombari, Marco Mondelli, Rattana Pukdee,
and Stefano Rini. “Sharp Asymptotics on the Compression of Two-Layer Neural Networks.”
IEEE Information Theory Workshop. IEEE, 2022. https://doi.org/10.1109/ITW54588.2022.9965870.
ieee: M. H. Amani, S. Bombari, M. Mondelli, R. Pukdee, and S. Rini, “Sharp asymptotics
on the compression of two-layer neural networks,” IEEE Information Theory Workshop.
IEEE, pp. 588–593, 2022.
ista: Amani MH, Bombari S, Mondelli M, Pukdee R, Rini S. 2022. Sharp asymptotics
on the compression of two-layer neural networks. IEEE Information Theory Workshop.,
588–593.
mla: Amani, Mohammad Hossein, et al. “Sharp Asymptotics on the Compression of Two-Layer
Neural Networks.” IEEE Information Theory Workshop, IEEE, 2022, pp. 588–93,
doi:10.1109/ITW54588.2022.9965870.
short: M.H. Amani, S. Bombari, M. Mondelli, R. Pukdee, S. Rini, IEEE Information
Theory Workshop (2022) 588–593.
conference:
end_date: 2022-11-09
location: Mumbai, India
name: 'ITW: Information Theory Workshop'
start_date: 2022-11-01
date_created: 2023-02-10T13:47:56Z
date_published: 2022-11-16T00:00:00Z
date_updated: 2023-12-18T11:31:47Z
day: '16'
department:
- _id: MaMo
doi: 10.1109/ITW54588.2022.9965870
external_id:
arxiv:
- '2205.08199'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2205.08199
month: '11'
oa: 1
oa_version: Preprint
page: 588-593
publication: IEEE Information Theory Workshop
publication_identifier:
isbn:
- '9781665483414'
publication_status: published
publisher: IEEE
quality_controlled: '1'
scopus_import: '1'
status: public
title: Sharp asymptotics on the compression of two-layer neural networks
type: journal_article
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
year: '2022'
...
---
_id: '12537'
abstract:
- lang: eng
text: 'The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide
memorization, optimization and generalization guarantees in deep neural networks.
A line of work has studied the NTK spectrum for two-layer and deep networks with
at least a layer with Ω(N) neurons, N being the number of training samples. Furthermore,
there is increasing evidence suggesting that deep networks with sub-linear layer
widths are powerful memorizers and optimizers, as long as the number of parameters
exceeds the number of samples. Thus, a natural open question is whether the NTK
is well conditioned in such a challenging sub-linear setup. In this paper, we
answer this question in the affirmative. Our key technical contribution is a lower
bound on the smallest NTK eigenvalue for deep networks with the minimum possible
over-parameterization: the number of parameters is roughly Ω(N) and, hence, the
number of neurons is as little as Ω(√N). To showcase the applicability of our
NTK bounds, we provide two results concerning memorization capacity and optimization
guarantees for gradient descent training.'
acknowledgement: "The authors were partially supported by the 2019 Lopez-Loreta prize,
and they would like to thank\r\nQuynh Nguyen, Mahdi Soltanolkotabi and Adel Javanmard
for helpful discussions.\r\n"
article_processing_charge: No
author:
- first_name: Simone
full_name: Bombari, Simone
id: ca726dda-de17-11ea-bc14-f9da834f63aa
last_name: Bombari
- first_name: Mohammad Hossein
full_name: Amani, Mohammad Hossein
last_name: Amani
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
citation:
ama: 'Bombari S, Amani MH, Mondelli M. Memorization and optimization in deep neural
networks with minimum over-parameterization. In: 36th Conference on Neural
Information Processing Systems. Vol 35. Curran Associates; 2022:7628-7640.'
apa: Bombari, S., Amani, M. H., & Mondelli, M. (2022). Memorization and optimization
in deep neural networks with minimum over-parameterization. In 36th Conference
on Neural Information Processing Systems (Vol. 35, pp. 7628–7640). Curran
Associates.
chicago: Bombari, Simone, Mohammad Hossein Amani, and Marco Mondelli. “Memorization
and Optimization in Deep Neural Networks with Minimum Over-Parameterization.”
In 36th Conference on Neural Information Processing Systems, 35:7628–40.
Curran Associates, 2022.
ieee: S. Bombari, M. H. Amani, and M. Mondelli, “Memorization and optimization in
deep neural networks with minimum over-parameterization,” in 36th Conference
on Neural Information Processing Systems, 2022, vol. 35, pp. 7628–7640.
ista: Bombari S, Amani MH, Mondelli M. 2022. Memorization and optimization in deep
neural networks with minimum over-parameterization. 36th Conference on Neural
Information Processing Systems. vol. 35, 7628–7640.
mla: Bombari, Simone, et al. “Memorization and Optimization in Deep Neural Networks
with Minimum Over-Parameterization.” 36th Conference on Neural Information
Processing Systems, vol. 35, Curran Associates, 2022, pp. 7628–40.
short: S. Bombari, M.H. Amani, M. Mondelli, in:, 36th Conference on Neural Information
Processing Systems, Curran Associates, 2022, pp. 7628–7640.
date_created: 2023-02-10T13:46:37Z
date_published: 2022-07-24T00:00:00Z
date_updated: 2023-12-18T11:39:09Z
day: '24'
department:
- _id: MaMo
external_id:
arxiv:
- '2205.10217'
intvolume: ' 35'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2205.10217
month: '07'
oa: 1
oa_version: Preprint
page: 7628-7640
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 36th Conference on Neural Information Processing Systems
publication_identifier:
isbn:
- '9781713871088'
publication_status: published
publisher: Curran Associates
quality_controlled: '1'
status: public
title: Memorization and optimization in deep neural networks with minimum over-parameterization
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 35
year: '2022'
...
---
_id: '12480'
abstract:
- lang: eng
text: 'We consider the problem of estimating a signal from measurements obtained
via a generalized linear model. We focus on estimators based on approximate message
passing (AMP), a family of iterative algorithms with many appealing features:
the performance of AMP in the high-dimensional limit can be succinctly characterized
under suitable model assumptions; AMP can also be tailored to the empirical distribution
of the signal entries, and for a wide class of estimation problems, AMP is conjectured
to be optimal among all polynomial-time algorithms. However, a major issue of
AMP is that in many models (such as phase retrieval), it requires an initialization
correlated with the ground-truth signal and independent from the measurement matrix.
Assuming that such an initialization is available is typically not realistic.
In this paper, we solve this problem by proposing an AMP algorithm initialized
with a spectral estimator. With such an initialization, the standard AMP analysis
fails since the spectral estimator depends in a complicated way on the design
matrix. Our main contribution is a rigorous characterization of the performance
of AMP with spectral initialization in the high-dimensional limit. The key technical
idea is to define and analyze a two-phase artificial AMP algorithm that first
produces the spectral estimator, and then closely approximates the iterates of
the true AMP. We also provide numerical results that demonstrate the validity
of the proposed approach.'
acknowledgement: "The authors would like to thank Andrea Montanari for helpful discussions.\r\nM
Mondelli was partially supported by the 2019 Lopez-Loreta Prize. R Venkataramanan
was partially supported by the Alan Turing Institute under the EPSRC Grant\r\nEP/N510129/1."
article_number: '114003'
article_processing_charge: Yes (via OA deal)
article_type: original
author:
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Ramji
full_name: Venkataramanan, Ramji
last_name: Venkataramanan
citation:
ama: 'Mondelli M, Venkataramanan R. Approximate message passing with spectral initialization
for generalized linear models. Journal of Statistical Mechanics: Theory and
Experiment. 2022;2022(11). doi:10.1088/1742-5468/ac9828'
apa: 'Mondelli, M., & Venkataramanan, R. (2022). Approximate message passing
with spectral initialization for generalized linear models. Journal of Statistical
Mechanics: Theory and Experiment. IOP Publishing. https://doi.org/10.1088/1742-5468/ac9828'
chicago: 'Mondelli, Marco, and Ramji Venkataramanan. “Approximate Message Passing
with Spectral Initialization for Generalized Linear Models.” Journal of Statistical
Mechanics: Theory and Experiment. IOP Publishing, 2022. https://doi.org/10.1088/1742-5468/ac9828.'
ieee: 'M. Mondelli and R. Venkataramanan, “Approximate message passing with spectral
initialization for generalized linear models,” Journal of Statistical Mechanics:
Theory and Experiment, vol. 2022, no. 11. IOP Publishing, 2022.'
ista: 'Mondelli M, Venkataramanan R. 2022. Approximate message passing with spectral
initialization for generalized linear models. Journal of Statistical Mechanics:
Theory and Experiment. 2022(11), 114003.'
mla: 'Mondelli, Marco, and Ramji Venkataramanan. “Approximate Message Passing with
Spectral Initialization for Generalized Linear Models.” Journal of Statistical
Mechanics: Theory and Experiment, vol. 2022, no. 11, 114003, IOP Publishing,
2022, doi:10.1088/1742-5468/ac9828.'
short: 'M. Mondelli, R. Venkataramanan, Journal of Statistical Mechanics: Theory
and Experiment 2022 (2022).'
date_created: 2023-02-02T08:31:57Z
date_published: 2022-11-24T00:00:00Z
date_updated: 2024-03-07T10:36:52Z
day: '24'
ddc:
- '510'
- '530'
department:
- _id: MaMo
doi: 10.1088/1742-5468/ac9828
external_id:
isi:
- '000889589900001'
file:
- access_level: open_access
checksum: 01411ffa76d3e380a0446baeb89b1ef7
content_type: application/pdf
creator: dernst
date_created: 2023-02-02T08:35:52Z
date_updated: 2023-02-02T08:35:52Z
file_id: '12481'
file_name: 2022_JourStatisticalMechanics_Mondelli.pdf
file_size: 1729997
relation: main_file
success: 1
file_date_updated: 2023-02-02T08:35:52Z
has_accepted_license: '1'
intvolume: ' 2022'
isi: 1
issue: '11'
keyword:
- Statistics
- Probability and Uncertainty
- Statistics and Probability
- Statistical and Nonlinear Physics
language:
- iso: eng
month: '11'
oa: 1
oa_version: Published Version
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: 'Journal of Statistical Mechanics: Theory and Experiment'
publication_identifier:
issn:
- 1742-5468
publication_status: published
publisher: IOP Publishing
quality_controlled: '1'
related_material:
record:
- id: '10598'
relation: earlier_version
status: public
scopus_import: '1'
status: public
title: Approximate message passing with spectral initialization for generalized linear
models
tmp:
image: /images/cc_by.png
legal_code_url: https://creativecommons.org/licenses/by/4.0/legalcode
name: Creative Commons Attribution 4.0 International Public License (CC-BY 4.0)
short: CC BY (4.0)
type: journal_article
user_id: 4359f0d1-fa6c-11eb-b949-802e58b17ae8
volume: 2022
year: '2022'
...
---
_id: '10595'
abstract:
- lang: eng
text: 'A recent line of work has analyzed the theoretical properties of deep neural
networks via the Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue
of the NTK has been related to the memorization capacity, the global convergence
of gradient descent algorithms and the generalization of deep nets. However, existing
results either provide bounds in the two-layer setting or assume that the spectrum
of the NTK matrices is bounded away from 0 for multi-layer networks. In this paper,
we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU
nets, both in the limiting case of infinite widths and for finite widths. In the
finite-width setting, the network architectures we consider are fairly general:
we require the existence of a wide layer with roughly order of $N$ neurons, $N$
being the number of data samples; and the scaling of the remaining layer widths
is arbitrary (up to logarithmic factors). To obtain our results, we analyze various
quantities of independent interest: we give lower bounds on the smallest singular
value of hidden feature matrices, and upper bounds on the Lipschitz constant of
input-output feature maps.'
acknowledgement: "The authors would like to thank the anonymous reviewers for their
helpful comments. MM was partially supported\r\nby the 2019 Lopez-Loreta Prize.
QN and GM acknowledge support from the European Research Council (ERC) under\r\nthe
European Union’s Horizon 2020 research and innovation programme (grant agreement
no 757983)."
alternative_title:
- Proceedings of Machine Learning Research
article_processing_charge: No
author:
- first_name: Quynh
full_name: Nguyen, Quynh
last_name: Nguyen
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: Guido F
full_name: Montufar, Guido F
last_name: Montufar
citation:
ama: 'Nguyen Q, Mondelli M, Montufar GF. Tight bounds on the smallest eigenvalue
of the neural tangent kernel for deep ReLU networks. In: Meila M, Zhang T, eds.
Proceedings of the 38th International Conference on Machine Learning. Vol
139. ML Research Press; 2021:8119-8129.'
apa: 'Nguyen, Q., Mondelli, M., & Montufar, G. F. (2021). Tight bounds on the
smallest eigenvalue of the neural tangent kernel for deep ReLU networks. In M.
Meila & T. Zhang (Eds.), Proceedings of the 38th International Conference
on Machine Learning (Vol. 139, pp. 8119–8129). Virtual: ML Research Press.'
chicago: Nguyen, Quynh, Marco Mondelli, and Guido F Montufar. “Tight Bounds on the
Smallest Eigenvalue of the Neural Tangent Kernel for Deep ReLU Networks.” In Proceedings
of the 38th International Conference on Machine Learning, edited by Marina
Meila and Tong Zhang, 139:8119–29. ML Research Press, 2021.
ieee: Q. Nguyen, M. Mondelli, and G. F. Montufar, “Tight bounds on the smallest
eigenvalue of the neural tangent kernel for deep ReLU networks,” in Proceedings
of the 38th International Conference on Machine Learning, Virtual, 2021, vol.
139, pp. 8119–8129.
ista: 'Nguyen Q, Mondelli M, Montufar GF. 2021. Tight bounds on the smallest eigenvalue
of the neural tangent kernel for deep ReLU networks. Proceedings of the 38th International
Conference on Machine Learning. ICML: International Conference on Machine Learning,
Proceedings of Machine Learning Research, vol. 139, 8119–8129.'
mla: Nguyen, Quynh, et al. “Tight Bounds on the Smallest Eigenvalue of the Neural
Tangent Kernel for Deep ReLU Networks.” Proceedings of the 38th International
Conference on Machine Learning, edited by Marina Meila and Tong Zhang, vol.
139, ML Research Press, 2021, pp. 8119–29.
short: Q. Nguyen, M. Mondelli, G.F. Montufar, in:, M. Meila, T. Zhang (Eds.), Proceedings
of the 38th International Conference on Machine Learning, ML Research Press, 2021,
pp. 8119–8129.
conference:
end_date: 2021-07-24
location: Virtual
name: 'ICML: International Conference on Machine Learning'
start_date: 2021-07-18
date_created: 2022-01-03T10:57:49Z
date_published: 2021-01-01T00:00:00Z
date_updated: 2022-01-04T09:59:21Z
department:
- _id: MaMo
editor:
- first_name: Marina
full_name: Meila, Marina
last_name: Meila
- first_name: Tong
full_name: Zhang, Tong
last_name: Zhang
external_id:
arxiv:
- '2012.11654'
intvolume: ' 139'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: http://proceedings.mlr.press/v139/nguyen21g.html
oa: 1
oa_version: Published Version
page: 8119-8129
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 38th International Conference on Machine Learning
publication_status: published
publisher: ML Research Press
quality_controlled: '1'
status: public
title: Tight bounds on the smallest eigenvalue of the neural tangent kernel for deep
ReLU networks
type: conference
user_id: 8b945eb4-e2f2-11eb-945a-df72226e66a9
volume: 139
year: '2021'
...
---
_id: '10599'
abstract:
- lang: eng
text: A two-part successive syndrome-check decoding of polar codes is proposed with
the first part successively refining the received codeword and the second part
checking its syndrome. A new formulation of the successive-cancellation (SC) decoding
algorithm is presented that allows for successively refining the received codeword
by comparing the log-likelihood ratio value of a frozen bit with its predefined
value. The syndrome of the refined received codeword is then checked for possible
errors. In case there are no errors, the decoding process is terminated. Otherwise,
the decoder continues to refine the received codeword. The proposed method is
extended to the case of SC list (SCL) decoding by terminating the decoding process
when the syndrome of the best candidate in the list indicates no errors. Simulation
results show that the proposed method reduces the time-complexity of SC and SCL
decoders and their fast variants, especially at high signal-to-noise ratios.
acknowledgement: This work is supported in part by ONR grant N00014-18-1-2191. S.
A. Hashemi was supported by a Postdoctoral Fellowship from the Natural Sciences
and Engineering Research Council of Canada (NSERC) and by Huawei. M. Mondelli was
partially supported by the 2019 Lopez-Loreta Prize.
article_processing_charge: No
author:
- first_name: Seyyed Ali
full_name: Hashemi, Seyyed Ali
last_name: Hashemi
- first_name: Marco
full_name: Mondelli, Marco
id: 27EB676C-8706-11E9-9510-7717E6697425
last_name: Mondelli
orcid: 0000-0002-3242-7020
- first_name: John
full_name: Cioffi, John
last_name: Cioffi
- first_name: Andrea
full_name: Goldsmith, Andrea
last_name: Goldsmith
citation:
ama: 'Hashemi SA, Mondelli M, Cioffi J, Goldsmith A. Successive syndrome-check decoding
of polar codes. In: Proceedings of the 55th Asilomar Conference on Signals,
Systems, and Computers. Vol 2021-October. Institute of Electrical and Electronics
Engineers; 2021:943-947. doi:10.1109/IEEECONF53345.2021.9723394'
apa: 'Hashemi, S. A., Mondelli, M., Cioffi, J., & Goldsmith, A. (2021). Successive
syndrome-check decoding of polar codes. In Proceedings of the 55th Asilomar
Conference on Signals, Systems, and Computers (Vol. 2021–October, pp. 943–947).
Virtual, Pacific Grove, CA, United States: Institute of Electrical and Electronics
Engineers. https://doi.org/10.1109/IEEECONF53345.2021.9723394'
chicago: Hashemi, Seyyed Ali, Marco Mondelli, John Cioffi, and Andrea Goldsmith.
“Successive Syndrome-Check Decoding of Polar Codes.” In Proceedings of the
55th Asilomar Conference on Signals, Systems, and Computers, 2021–October:943–47.
Institute of Electrical and Electronics Engineers, 2021. https://doi.org/10.1109/IEEECONF53345.2021.9723394.
ieee: S. A. Hashemi, M. Mondelli, J. Cioffi, and A. Goldsmith, “Successive syndrome-check
decoding of polar codes,” in Proceedings of the 55th Asilomar Conference on
Signals, Systems, and Computers, Virtual, Pacific Grove, CA, United States,
2021, vol. 2021–October, pp. 943–947.
ista: 'Hashemi SA, Mondelli M, Cioffi J, Goldsmith A. 2021. Successive syndrome-check
decoding of polar codes. Proceedings of the 55th Asilomar Conference on Signals,
Systems, and Computers. ACSSC: Asilomar Conference on Signals, Systems, and Computers
vol. 2021–October, 943–947.'
mla: Hashemi, Seyyed Ali, et al. “Successive Syndrome-Check Decoding of Polar Codes.”
Proceedings of the 55th Asilomar Conference on Signals, Systems, and Computers,
vol. 2021–October, Institute of Electrical and Electronics Engineers, 2021, pp.
943–47, doi:10.1109/IEEECONF53345.2021.9723394.
short: S.A. Hashemi, M. Mondelli, J. Cioffi, A. Goldsmith, in:, Proceedings of the
55th Asilomar Conference on Signals, Systems, and Computers, Institute of Electrical
and Electronics Engineers, 2021, pp. 943–947.
conference:
end_date: 2021-11-03
location: Virtual, Pacific Grove, CA, United States
name: 'ACSSC: Asilomar Conference on Signals, Systems, and Computers'
start_date: 2021-10-31
date_created: 2022-01-03T11:39:51Z
date_published: 2021-11-01T00:00:00Z
date_updated: 2023-02-13T10:44:16Z
day: '01'
department:
- _id: MaMo
doi: 10.1109/IEEECONF53345.2021.9723394
external_id:
arxiv:
- '2112.00057'
language:
- iso: eng
main_file_link:
- open_access: '1'
url: https://doi.org/10.48550/arXiv.2112.00057
month: '11'
oa: 1
oa_version: Preprint
page: 943-947
project:
- _id: 059876FA-7A3F-11EA-A408-12923DDC885E
name: Prix Lopez-Loretta 2019 - Marco Mondelli
publication: Proceedings of the 55th Asilomar Conference on Signals, Systems, and
Computers
publication_identifier:
isbn:
- '9781665458283'
issn:
- 1058-6393
publication_status: published
publisher: Institute of Electrical and Electronics Engineers
quality_controlled: '1'
scopus_import: '1'
status: public
title: Successive syndrome-check decoding of polar codes
type: conference
user_id: 2DF688A6-F248-11E8-B48F-1D18A9856A87
volume: 2021-October
year: '2021'
...