TY  - JOUR
AB  - We propose a novel approach to concentration for non-independent random variables. The main idea is to “pretend” that the random variables are independent and pay a multiplicative price measuring how far they are from actually being independent. This price is encapsulated in the Hellinger integral between the joint and the product of the marginals, which is then upper bounded leveraging tensorisation properties. Our bounds represent a natural generalisation of concentration inequalities in the presence of dependence: we recover exactly the classical bounds (McDiarmid’s inequality) when the random variables are independent. Furthermore, in a “large deviations” regime, we obtain the same decay in the probability as for the independent case, even when the random variables display non-trivial dependencies. To show this, we consider a number of applications of interest. First, we provide a bound for Markov chains with finite state space. Then, we consider the Simple Symmetric Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting in which the stochastic process depends on its entire past. To conclude, we propose an application to Markov Chain Monte Carlo methods, where our approach leads to an improved lower bound on the minimum burn-in period required to reach a certain accuracy. In all of these settings, we provide a regime of parameters in which our bound fares better than what the state of the art can provide.
AU  - Esposito, Amedeo Roberto
AU  - Mondelli, Marco
ID  - 15172
JF  - IEEE Transactions on Information Theory
SN  - 0018-9448
TI  - Concentration without independence via information measures
ER  - 
TY  - CONF
AB  - In this work we consider the list-decodability and list-recoverability of arbitrary q-ary codes, for all integer values of q ≥ 2. A code is called (p,L)_q-list-decodable if every radius pn Hamming ball contains less than L codewords; (p,𝓁,L)_q-list-recoverability is a generalization where we place radius pn Hamming balls on every point of a combinatorial rectangle with side length 𝓁 and again stipulate that there be less than L codewords.
Our main contribution is to precisely calculate the maximum value of p for which there exist infinite families of positive rate (p,𝓁,L)_q-list-recoverable codes, the quantity we call the zero-rate threshold. Denoting this value by p_*, we in fact show that codes correcting a p_*+ε fraction of errors must have size O_ε(1), i.e., independent of n. Such a result is typically referred to as a "Plotkin bound." To complement this, a standard random code with expurgation construction shows that there exist positive rate codes correcting a p_*-ε fraction of errors. We also follow a classical proof template (typically attributed to Elias and Bassalygo) to derive from the zero-rate threshold other tradeoffs between rate and decoding radius for list-decoding and list-recovery.
Technically, proving the Plotkin bound boils down to demonstrating the Schur convexity of a certain function defined on the q-simplex as well as the convexity of a univariate function derived from it. We remark that an earlier argument claimed similar results for q-ary list-decoding; however, we point out that this earlier proof is flawed.
AU  - Resch, Nicolas
AU  - Yuan, Chen
AU  - Zhang, Yihan
ID  - 14083
SN  - 1868-8969
T2  - 50th International Colloquium on Automata, Languages, and Programming
TI  - Zero-rate thresholds and new capacity bounds for list-decoding and list-recovery
VL  - 261
ER  - 
TY  - JOUR
AB  - How do statistical dependencies in measurement noise influence high-dimensional inference? To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise. We go beyond the usual independence assumption on the noise entries, by drawing the noise from a low-order polynomial orthogonal matrix ensemble. The resulting noise correlations make the setting relevant for applications but analytically challenging. We provide characterization of the Bayes optimal limits of inference in this model. If the spike is rotation invariant, we show that standard spectral PCA is optimal. However, for more general priors, both PCA and the existing approximate message-passing algorithm (AMP) fall short of achieving the information-theoretic limits, which we compute using the replica method from statistical physics. We thus propose an AMP, inspired by the theory of adaptive Thouless–Anderson–Palmer equations, which is empirically observed to saturate the conjectured theoretical limit. This AMP comes with a rigorous state evolution analysis tracking its performance. Although we focus on specific noise distributions, our methodology can be generalized to a wide class of trace matrix ensembles at the cost of more involved expressions. Finally, despite the seemingly strong assumption of rotation-invariant noise, our theory empirically predicts algorithmic performance on real data, pointing at strong universality properties.
AU  - Barbier, Jean
AU  - Camilli, Francesco
AU  - Mondelli, Marco
AU  - Sáenz, Manuel
ID  - 13315
IS  - 30
JF  - Proceedings of the National Academy of Sciences of the United States of America
TI  - Fundamental limits in structured principal component analysis and how to reach them
VL  - 120
ER  - 
TY  - CONF
AB  - Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
AU  - Shevchenko, Aleksandr
AU  - Kögler, Kevin
AU  - Hassani, Hamed
AU  - Mondelli, Marco
ID  - 14459
T2  - Proceedings of the 40th International Conference on Machine Learning
TI  - Fundamental limits of two-layer autoencoders, and achieving them with gradient methods
VL  - 202
ER  - 
TY  - JOUR
AB  - We study the problem of high-dimensional multiple packing in Euclidean space. Multiple packing is a natural generalization of sphere packing and is defined as follows. Let N > 0 and L ∈ Z ≽2 . A multiple packing is a set C of points in R n such that any point in R n lies in the intersection of at most L – 1 balls of radius √ nN around points in C . Given a well-known connection with coding theory, multiple packings can be viewed as the Euclidean analog of list-decodable codes, which are well-studied for finite fields. In this paper, we derive the best known lower bounds on the optimal density of list-decodable infinite constellations for constant L under a stronger notion called average-radius multiple packing. To this end, we apply tools from high-dimensional geometry and large deviation theory.
AU  - Zhang, Yihan
AU  - Vatedka, Shashank
ID  - 12838
IS  - 7
JF  - IEEE Transactions on Information Theory
SN  - 0018-9448
TI  - Multiple packing: Lower bounds via infinite constellations
VL  - 69
ER  - 
TY  - CONF
AB  - We consider the problem of reconstructing the signal and the hidden variables from observations coming from a multi-layer network with rotationally invariant weight matrices. The multi-layer structure models inference from deep generative priors, and the rotational invariance imposed on the weights generalizes the i.i.d. Gaussian assumption by allowing for a complex correlation structure, which is typical in applications. In this work, we present a new class of approximate message passing (AMP) algorithms and give a state evolution recursion which precisely characterizes their performance in the large system limit. In contrast with the existing multi-layer VAMP (ML-VAMP) approach, our proposed AMP – dubbed multilayer rotationally invariant generalized AMP (ML-RI-GAMP) – provides a natural generalization beyond Gaussian designs, in the sense that it recovers the existing Gaussian AMP as a special case. Furthermore, ML-RI-GAMP exhibits a significantly lower complexity than ML-VAMP, as the computationally intensive singular value decomposition is replaced by an estimation of the moments of the design matrices. Finally, our numerical results show that this complexity gain comes at little to no cost in the performance of the algorithm.
AU  - Xu, Yizhou
AU  - Hou, Tian Qi
AU  - Liang, Shan Suo
AU  - Mondelli, Marco
ID  - 13321
SN  - 9798350301496
T2  - 2023 IEEE Information Theory Workshop
TI  - Approximate message passing for multi-layer estimation in rotationally invariant models
ER  - 
TY  - JOUR
AB  - We derive lower bounds on the maximal rates for multiple packings in high-dimensional Euclidean spaces. For any N > 0 and L ∈ Z ≥2 , a multiple packing is a set C of points in R n such that any point in R n lies in the intersection of at most L - 1 balls of radius √ nN around points in C . This is a natural generalization of the sphere packing problem. We study the multiple packing problem for both bounded point sets whose points have norm at most √ nP for some constant P > 0, and unbounded point sets whose points are allowed to be anywhere in R n . Given a well-known connection with coding theory, multiple packings can be viewed as the Euclidean analog of list-decodable codes, which are well-studied over finite fields. We derive the best known lower bounds on the optimal multiple packing density. This is accomplished by establishing an inequality which relates the list-decoding error exponent for additive white Gaussian noise channels, a quantity of average-case nature, to the list-decoding radius, a quantity of worst-case nature. We also derive novel bounds on the list-decoding error exponent for infinite constellations and closed-form expressions for the list-decoding error exponents for the power-constrained AWGN channel, which may be of independent interest beyond multiple packing.
AU  - Zhang, Yihan
AU  - Vatedka, Shashank
ID  - 14665
JF  - IEEE Transactions on Information Theory
SN  - 0018-9448
TI  - Multiple packing: Lower bounds via error exponents
ER  - 
TY  - JOUR
AB  - We consider zero-error communication over a two-transmitter deterministic adversarial multiple access channel (MAC) governed by an adversary who has access to the transmissions of both senders (hence called omniscient ) and aims to maliciously corrupt the communication. None of the encoders, jammer and decoder is allowed to randomize using private or public randomness. This enforces a combinatorial nature of the problem. Our model covers a large family of channels studied in the literature, including all deterministic discrete memoryless noisy or noiseless MACs. In this work, given an arbitrary two-transmitter deterministic omniscient adversarial MAC, we characterize when the capacity region: 1) has nonempty interior (in particular, is two-dimensional); 2) consists of two line segments (in particular, has empty interior); 3) consists of one line segment (in particular, is one-dimensional); 4) or only contains (0,0) (in particular, is zero-dimensional). This extends a recent result by Wang et al. (201 9) from the point-to-point setting to the multiple access setting. Indeed, our converse arguments build upon their generalized Plotkin bound and involve delicate case analysis. One of the technical challenges is to take care of both “joint confusability” and “marginal confusability”. In particular, the treatment of marginal confusability does not follow from the point-to-point results by Wang et al. Our achievability results follow from random coding with expurgation.
AU  - Zhang, Yihan
ID  - 14751
IS  - 7
JF  - IEEE Transactions on Information Theory
KW  - Computer Science Applications
KW  - Information Systems
SN  - 0018-9448
TI  - Zero-error communication over adversarial MACs
VL  - 69
ER  - 
TY  - JOUR
AB  - This paper is a collection of results on combinatorial properties of codes for the Z-channel . A Z-channel with error fraction τ takes as input a length- n binary codeword and injects in an adversarial manner up to n τ asymmetric errors, i.e., errors that only zero out bits but do not flip 0’s to 1’s. It is known that the largest ( L - 1)-list-decodable code for the Z-channel with error fraction τ has exponential size (in n ) if τ is less than a critical value that we call the ( L - 1)- list-decoding Plotkin point and has constant size if τ is larger than the threshold. The ( L -1)-list-decoding Plotkin point is known to be L -1/L-1 – L -L/ L-1 , which equals 1/4 for unique-decoding with L -1 = 1. In this paper, we derive various results for the size of the largest codes above and below the list-decoding Plotkin point. In particular, we show that the largest ( L -1)-list-decodable code ε-above the Plotkin point, for any given sufficiently small positive constant ε > 0, has size Θ L (ε -3/2 ) for any L - 1 ≥ 1. We also devise upper and lower bounds on the exponential size of codes below the list-decoding Plotkin point.
AU  - Polyanskii, Nikita
AU  - Zhang, Yihan
ID  - 13269
IS  - 10
JF  - IEEE Transactions on Information Theory
SN  - 0018-9448
TI  - Codes for the Z-channel
VL  - 69
ER  - 
TY  - CONF
AB  - Machine learning models are vulnerable to adversarial perturbations, and a thought-provoking paper by Bubeck and Sellke has analyzed this phenomenon through the lens of over-parameterization: interpolating smoothly the data requires significantly more parameters than simply memorizing it. However, this "universal" law provides only a necessary condition for robustness, and it is unable to discriminate between models. In this paper, we address these gaps by focusing on empirical risk minimization in two prototypical settings, namely, random features and the neural tangent kernel (NTK). We prove that, for random features, the model is not robust for any degree of over-parameterization, even when the necessary condition coming from the universal law of robustness is satisfied. In contrast, for even activations, the NTK model meets the universal lower bound, and it is robust as soon as the necessary condition on over-parameterization is fulfilled. This also addresses a conjecture in prior work by Bubeck, Li and Nagaraj. Our analysis decouples the effect of the kernel of the model from an "interaction matrix", which describes the interaction with the test data and captures the effect of the activation. Our theoretical results are corroborated by numerical evidence on both synthetic and standard datasets (MNIST, CIFAR-10).
AU  - Bombari, Simone
AU  - Kiyani, Shayan
AU  - Mondelli, Marco
ID  - 12859
T2  - Proceedings of the 40th International Conference on Machine Learning
TI  - Beyond the universal law of robustness: Sharper laws for random features and neural tangent kernels
VL  - 202
ER  - 
TY  - CONF
AB  - Neural collapse (NC) refers to the surprising structure of the last layer of deep neural networks in the terminal phase of gradient descent training. Recently, an increasing amount of experimental evidence has pointed to the propagation of NC to earlier layers of neural networks. However, while the NC in the last layer is well studied theoretically, much less is known about its multi-layered counterpart - deep neural collapse (DNC). In particular, existing work focuses either on linear layers or only on the last two layers at the price of an extra assumption. Our paper fills this gap by generalizing the established analytical framework for NC - the unconstrained features model - to multiple non-linear layers. Our key technical contribution is to show that, in a deep unconstrained features model, the unique global optimum for binary classification exhibits all the properties typical of DNC. This explains the existing experimental evidence of DNC. We also empirically show that (i) by optimizing deep unconstrained features models via gradient descent, the resulting solution agrees well with our theory, and (ii) trained networks recover the unconstrained features suitable for the occurrence of DNC, thus supporting the validity of this modeling principle.
AU  - Súkeník, Peter
AU  - Mondelli, Marco
AU  - Lampert, Christoph
ID  - 14921
T2  - 37th Annual Conference on Neural Information Processing Systems
TI  - Deep neural collapse is provably optimal for the deep unconstrained features model
ER  - 
TY  - CONF
AB  - The stochastic heavy ball method (SHB), also known as stochastic gradient descent (SGD) with Polyak's momentum, is widely used in training neural networks. However, despite the remarkable success of such algorithm in practice, its theoretical characterization remains limited. In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by SHB: \emph{(i)} stability after dropping out part of the neurons, \emph{(ii)} connectivity along a low-loss path, and \emph{(iii)} convergence to the global optimum.
To achieve this goal, we take a mean-field view and relate the SHB dynamics to a certain partial differential equation in the limit of large network widths. This mean-field perspective has inspired a recent line of work focusing on SGD while, in contrast, our paper considers an algorithm with momentum. More specifically, after proving existence and uniqueness of the limit differential equations, we show convergence to the global optimum and give a quantitative bound between the mean-field limit and the SHB dynamics of a finite-width network. Armed with this last bound, we are able to establish the dropout-stability and connectivity of SHB solutions.
AU  - Wu, Diyuan
AU  - Kungurtsev, Vyacheslav
AU  - Mondelli, Marco
ID  - 14924
T2  - Transactions on Machine Learning Research
TI  - Mean-field analysis for heavy ball methods: Dropout-stability, connectivity, and global convergence
ER  - 
TY  - CONF
AB  - We study the performance of a Bayesian statistician who estimates a rank-one signal corrupted by non-symmetric rotationally invariant noise with a generic distribution of singular values. As the signal-to-noise ratio and the noise structure are unknown, a Gaussian setup is incorrectly assumed. We derive the exact analytic expression for the error of the mismatched Bayes estimator and also provide the analysis of an approximate message passing (AMP) algorithm. The first result exploits the asymptotic behavior of spherical integrals for rectangular matrices and of low-rank matrix perturbations; the second one relies on the design and analysis of an auxiliary AMP. The numerical experiments show that there is a performance gap between the AMP and Bayes estimators, which is due to the incorrect estimation of the signal norm.
AU  - Fu, Teng
AU  - Liu, YuHao
AU  - Barbier, Jean
AU  - Mondelli, Marco
AU  - Liang, ShanSuo
AU  - Hou, TianQi
ID  - 14923
T2  - Proceedings of 2023 IEEE International Symposium on Information Theory
TI  - Mismatched estimation of non-symmetric rank-one matrices corrupted by structured noise
ER  - 
TY  - CONF
AB  - We propose a novel approach to concentration for non-independent random variables. The main idea is to ``pretend'' that the random variables are independent and pay a multiplicative price measuring how far they are from actually being independent. This price is encapsulated in the Hellinger integral between the joint and the product of the marginals, which is then upper bounded leveraging tensorisation properties. Our bounds represent a natural generalisation of concentration inequalities in the presence of dependence: we recover exactly the classical bounds (McDiarmid's inequality) when the random variables are independent. Furthermore, in a ``large deviations'' regime, we obtain the same decay in the probability as for the independent case, even when the random variables display non-trivial dependencies. To show this, we consider a number of applications of interest. First, we provide a bound for Markov chains with finite state space. Then, we consider the Simple Symmetric Random Walk, which is a non-contracting Markov chain, and a non-Markovian setting in which the stochastic process depends on its entire past. To conclude, we propose an application to Markov Chain Monte Carlo methods, where our approach leads to an improved lower bound on the minimum burn-in period required to reach a certain accuracy. In all of these settings, we provide a regime of parameters in which our bound fares better than what the state of the art can provide.
AU  - Esposito, Amedeo Roberto
AU  - Mondelli, Marco
ID  - 14922
T2  - Proceedings of 2023 IEEE International Symposium on Information Theory
TI  - Concentration without independence via information measures
ER  - 
TY  - JOUR
AB  - Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning. In this work, we take a mean-field view, and consider a two-layer ReLU network trained via noisy-SGD for a univariate regularized regression problem. Our main result is that SGD with vanishingly small noise injected in the gradients is biased towards a simple solution: at convergence, the ReLU network implements a piecewise linear map of the inputs, and the number of “knot” points -- i.e., points where the tangent of the ReLU network estimator changes -- between two consecutive training inputs is at most three. In particular, as the number of neurons of the network grows, the SGD dynamics is captured by the solution of a gradient flow and, at convergence, the distribution of the weights approaches the unique minimizer of a related free energy, which has a Gibbs form. Our key technical contribution consists in the analysis of the estimator resulting from this minimizer: we show that its second derivative vanishes everywhere, except at some specific locations which represent the “knot” points. We also provide empirical evidence that knots at locations distinct from the data points might occur, as predicted by our theory.
AU  - Shevchenko, Aleksandr
AU  - Kungurtsev, Vyacheslav
AU  - Mondelli, Marco
ID  - 11420
IS  - 130
JF  - Journal of Machine Learning Research
SN  - 1532-4435
TI  - Mean-field analysis of piecewise linear solutions for wide ReLU networks
VL  - 23
ER  - 
TY  - CONF
AB  - We characterize the capacity for the discrete-time arbitrarily varying channel with discrete inputs, outputs, and states when (a) the encoder and decoder do not share common randomness, (b) the input and state are subject to cost constraints, (c) the transition matrix of the channel is deterministic given the state, and (d) at each time step the adversary can only observe the current and past channel inputs when choosing the state at that time. The achievable strategy involves stochastic encoding together with list decoding and a disambiguation step. The converse uses a two-phase "babble-and-push" strategy where the adversary chooses the state randomly in the first phase, list decodes the output, and then chooses state inputs to symmetrize the channel in the second phase. These results generalize prior work on specific channels models (additive, erasure) to general discrete alphabets and models.
AU  - Zhang, Yihan
AU  - Jaggi, Sidharth
AU  - Langberg, Michael
AU  - Sarwate, Anand D.
ID  - 12011
SN  - 2157-8095
T2  - 2022 IEEE International Symposium on Information Theory
TI  - The capacity of causal adversarial channels
VL  - 2022
ER  - 
TY  - CONF
AB  - In the classic adversarial communication problem, two parties communicate over a noisy channel in the presence of a malicious jamming adversary. The arbitrarily varying channels (AVCs) offer an elegant framework to study a wide range of interesting adversary models. The optimal throughput or capacity over such AVCs is intimately tied to the underlying adversary model; in some cases, capacity is unknown and the problem is known to be notoriously hard. The omniscient adversary, one which knows the sender’s entire channel transmission a priori, is one of such classic models of interest; the capacity under such an adversary remains an exciting open problem. The myopic adversary is a generalization of that model where the adversary’s observation may be corrupted over a noisy discrete memoryless channel. Through the adversary’s myopicity, one can unify the slew of different adversary models, ranging from the omniscient adversary to one that is completely blind to the transmission (the latter is the well known oblivious model where the capacity is fully characterized).In this work, we present new results on the capacity under both the omniscient and myopic adversary models. We completely characterize the positive capacity threshold over general AVCs with omniscient adversaries. The characterization is in terms of two key combinatorial objects: the set of completely positive distributions and the CP-confusability set. For omniscient AVCs with positive capacity, we present non-trivial lower and upper bounds on the capacity; unlike some of the previous bounds, our bounds hold under fairly general input and jamming constraints. Our lower bound improves upon the generalized Gilbert-Varshamov bound for general AVCs while the upper bound generalizes the well known Elias-Bassalygo bound (known for binary and q-ary alphabets). For the myopic AVCs, we build on prior results known for the so-called sufficiently myopic model, and present new results on the positive rate communication threshold over the so-called insufficiently myopic regime (a completely insufficient myopic adversary specializes to an omniscient adversary). We present interesting examples for the widely studied models of adversarial bit-flip and bit-erasure channels. In fact, for the bit-flip AVC with additive adversarial noise as well as random noise, we completely characterize the omniscient model capacity when the random noise is sufficiently large vis-a-vis the adversary’s budget.
AU  - Yadav, Anuj Kumar
AU  - Alimohammadi, Mohammadreza
AU  - Zhang, Yihan
AU  - Budkuley, Amitalok J.
AU  - Jaggi, Sidharth
ID  - 12017
SN  - 2157-8095
T2  - 2022 IEEE International Symposium on Information Theory
TI  - New results on AVCs with omniscient and myopic adversaries
VL  - 2022
ER  - 
TY  - CONF
AB  - We consider the problem of communication over adversarial channels with feedback. Two parties comprising sender Alice and receiver Bob seek to communicate reliably. An adversary James observes Alice's channel transmission entirely and chooses, maliciously, its additive channel input or jamming state thereby corrupting Bob's observation. Bob can communicate over a one-way reverse link with Alice; we assume that transmissions over this feedback link cannot be corrupted by James. Our goal in this work is to study the optimum throughput or capacity over such channels with feedback. We first present results for the quadratically-constrained additive channel where communication is known to be impossible when the noise-to-signal (power) ratio (NSR) is at least 1. We present a novel achievability scheme to establish that positive rate communication is possible even when the NSR is as high as 8/9. We also present new converse upper bounds on the capacity of this channel under potentially stochastic encoders and decoders. We also study feedback communication over the more widely studied q-ary alphabet channel under additive noise. For the q -ary channel, where q > 2, it is well known that capacity is positive under full feedback if and only if the adversary can corrupt strictly less than half the transmitted symbols. We generalize this result and show that the same threshold holds for positive rate communication when the noiseless feedback may only be partial; our scheme employs a stochastic decoder. We extend this characterization, albeit partially, to fully deterministic schemes under partial noiseless feedback. We also present new converse upper bounds for q-ary channels under full feedback, where the encoder and/or decoder may privately randomize. Our converse results bring to the fore an interesting alternate expression for the well known converse bound for the q—ary channel under full feedback which, when specialized to the binary channel, also equals its known capacity.
AU  - Joshi, Pranav
AU  - Purkayastha, Amritakshya
AU  - Zhang, Yihan
AU  - Budkuley, Amitalok J.
AU  - Jaggi, Sidharth
ID  - 12013
SN  - 2157-8095
T2  - 2022 IEEE International Symposium on Information Theory
TI  - On the capacity of additive AVCs with feedback
VL  - 2022
ER  - 
TY  - CONF
AB  - We consider the problem of coded distributed computing using polar codes. The average execution time of a coded computing system is related to the error probability for transmission over the binary erasure channel in recent work by Soleymani, Jamali and Mahdavifar, where the performance of binary linear codes is investigated. In this paper, we focus on polar codes and unveil a connection between the average execution time and the scaling exponent μ of the family of codes. In the finite-length characterization of polar codes, the scaling exponent is a key object capturing the speed of convergence to capacity. In particular, we show that (i) the gap between the normalized average execution time of polar codes and that of optimal MDS codes is O(n –1/μ ), and (ii) this upper bound can be improved to roughly O(n –1/2 ) by considering polar codes with large kernels. We conjecture that these bounds could be improved to O(n –2/μ ) and O(n –1 ), respectively, and provide a heuristic argument as well as numerical evidence supporting this view.
AU  - Fathollahi, Dorsa
AU  - Mondelli, Marco
ID  - 12016
SN  - 2157-8095
T2  - 2022 IEEE International Symposium on Information Theory
TI  - Polar coded computing: The role of the scaling exponent
VL  - 2022
ER  - 
TY  - CONF
AB  - This paper is eligible for the Jack Keil Wolf ISIT Student Paper Award. We generalize a previous framework for designing utility-optimal differentially private (DP) mechanisms via graphs, where datasets are vertices in the graph and edges represent dataset neighborhood. The boundary set contains datasets where an individual’s response changes the binary-valued query compared to its neighbors. Previous work was limited to the homogeneous case where the privacy parameter ε across all datasets was the same and the mechanism at boundary datasets was identical. In our work, the mechanism can take different distributions at the boundary and the privacy parameter ε is a function of neighboring datasets, which recovers an earlier definition of personalized DP as special case. The problem is how to extend the mechanism, which is only defined at the boundary set, to other datasets in the graph in a computationally efficient and utility optimal manner. Using the concept of strongest induced DP condition we solve this problem efficiently in polynomial time (in the size of the graph).
AU  - Torkamani, Sahel
AU  - Ebrahimi, Javad B.
AU  - Sadeghi, Parastoo
AU  - D'Oliveira, Rafael G.L.
AU  - Médard, Muriel
ID  - 12012
SN  - 2157-8095
T2  - 2022 IEEE International Symposium on Information Theory
TI  - Heterogeneous differential privacy via graphs
VL  - 2022
ER  -