TY - CONF
AB - Autoencoders are a popular model in many branches of machine learning and lossy data compression. However, their fundamental limits, the performance of gradient methods and the features learnt during optimization remain poorly understood, even in the two-layer setting. In fact, earlier work has considered either linear autoencoders or specific training regimes (leading to vanishing or diverging compression rates). Our paper addresses this gap by focusing on non-linear two-layer autoencoders trained in the challenging proportional regime in which the input dimension scales linearly with the size of the representation. Our results characterize the minimizers of the population risk, and show that such minimizers are achieved by gradient methods; their structure is also unveiled, thus leading to a concise description of the features obtained via training. For the special case of a sign activation function, our analysis establishes the fundamental limits for the lossy compression of Gaussian sources via (shallow) autoencoders. Finally, while the results are proved for Gaussian data, numerical simulations on standard datasets display the universality of the theoretical predictions.
AU - Shevchenko, Aleksandr
AU - Kögler, Kevin
AU - Hassani, Hamed
AU - Mondelli, Marco
ID - 14459
T2 - Proceedings of the 40th International Conference on Machine Learning
TI - Fundamental limits of two-layer autoencoders, and achieving them with gradient methods
VL - 202
ER -

TY - CONF
AB - We provide an efficient implementation of the backpropagation algorithm, specialized to the case where the weights of the neural network being trained are sparse. Our algorithm is general, as it applies to arbitrary (unstructured) sparsity and common layer types (e.g., convolutional or linear). We provide a fast vectorized implementation on commodity CPUs, and show that it can yield speedups in end-to-end runtime experiments, both in transfer learning using already-sparsified networks, and in training sparse networks from scratch. Thus, our results provide the first support for sparse training on commodity hardware.
AU - Nikdan, Mahdi
AU - Pegolotti, Tommaso
AU - Iofinova, Eugenia B
AU - Kurtic, Eldar
AU - Alistarh, Dan-Adrian
ID - 14460
T2 - Proceedings of the 40th International Conference on Machine Learning
TI - SparseProp: Efficient sparse backpropagation for faster training of neural networks at the edge
VL - 202
ER -

TY - CONF
AB - Threshold secret sharing allows a dealer to split a secret s into n shares, such that any t shares allow for reconstructing s, but no t-1 shares reveal any information about s. Leakage-resilient secret sharing requires that the secret remains hidden, even when an adversary additionally obtains a limited amount of leakage from every share. Benhamouda et al. (CRYPTO’18) proved that Shamir’s secret sharing scheme is one-bit leakage-resilient for reconstruction threshold t≥0.85n and conjectured that the same holds for t = c·n for any constant 0≤c≤1. Nielsen and Simkin (EUROCRYPT’20) showed that this is the best one can hope for by proving that Shamir’s scheme is not secure against one-bit leakage when t≤c·n/log(n). In this work, we strengthen the lower bound of Nielsen and Simkin. We consider noisy leakage-resilience, where a random subset of leakages is replaced by uniformly random noise. We prove a lower bound for Shamir’s secret sharing, similar to that of Nielsen and Simkin, which holds even when a constant fraction of leakages is replaced by random noise.
To this end, we first prove a lower bound on the share size of any noisy-leakage-resilient sharing scheme. We then use this lower bound to show that there exist universal constants c1, c2, such that for sufficiently large n it holds that Shamir’s secret sharing scheme is not noisy-leakage-resilient for t≤c1·n/log(n), even when a c2 fraction of leakages are replaced by random noise.
AU - Hoffmann, Charlotte
AU - Simkin, Mark
ID - 14457
SN - 0302-9743
T2 - 8th International Conference on Cryptology and Information Security in Latin America
TI - Stronger lower bounds for leakage-resilient secret sharing
VL - 14168
ER -

TY - CONF
AB - We show for the first time that large-scale generative pretrained transformer (GPT) family models can be pruned to at least 50% sparsity in one-shot, without any retraining, at minimal loss of accuracy. This is achieved via a new pruning method called SparseGPT, specifically designed to work efficiently and accurately on massive GPT-family models. We can execute SparseGPT on the largest available open-source models, OPT-175B and BLOOM-176B, in under 4.5 hours, and can reach 60% unstructured sparsity with negligible increase in perplexity: remarkably, more than 100 billion weights from these models can be ignored at inference time. SparseGPT generalizes to semi-structured (2:4 and 4:8) patterns, and is compatible with weight quantization approaches. The code is available at: https://github.com/IST-DASLab/sparsegpt.
AU - Frantar, Elias
AU - Alistarh, Dan-Adrian
ID - 14458
T2 - Proceedings of the 40th International Conference on Machine Learning
TI - SparseGPT: Massive language models can be accurately pruned in one-shot
VL - 202
ER -

TY - JOUR
AB - We investigate the potential of Multi-Objective, Deep Reinforcement Learning for stock and cryptocurrency single-asset trading: in particular, we consider a Multi-Objective algorithm which generalizes the reward functions and discount factor (i.e., these components are not specified a priori, but incorporated in the learning process). Firstly, using several important assets (BTCUSD, ETHUSDT, XRPUSDT, AAPL, SPY, NIFTY50), we verify the reward generalization property of the proposed Multi-Objective algorithm, and provide preliminary statistical evidence showing increased predictive stability over the corresponding Single-Objective strategy. Secondly, we show that the Multi-Objective algorithm has a clear edge over the corresponding Single-Objective strategy when the reward mechanism is sparse (i.e., when non-null feedback is infrequent over time). Finally, we discuss the generalization properties with respect to the discount factor. The entirety of our code is provided in open-source format.
AU - Cornalba, Federico
AU - Disselkamp, Constantin
AU - Scassola, Davide
AU - Helf, Christopher
ID - 14451
JF - Neural Computing and Applications
SN - 0941-0643
TI - Multi-objective reward generalization: improving performance of Deep Reinforcement Learning for applications in single-asset trading
ER -