TY - GEN AB - We derive a weak-strong uniqueness principle for BV solutions to multiphase mean curvature flow of triple line clusters in three dimensions. Our proof is based on the explicit construction of a gradient-flow calibration in the sense of the recent work of Fischer et al. [arXiv:2003.05478] for any such cluster. This extends the two-dimensional construction to the three-dimensional case of surfaces meeting along triple junctions. AU - Hensel, Sebastian AU - Laux, Tim ID - 10013 T2 - arXiv TI - Weak-strong uniqueness for the mean curvature flow of double bubbles ER - TY - JOUR AB - There are two elementary superconducting qubit types that derive directly from the quantum harmonic oscillator. In one, the inductor is replaced by a nonlinear Josephson junction to realize the widely used charge qubits with a compact phase variable and a discrete charge wave function. In the other, the junction is added in parallel, which gives rise to an extended phase variable, continuous wave functions, and a rich energy-level structure due to the loop topology. While the corresponding rf superconducting quantum interference device Hamiltonian was introduced as a quadratic quasi-one-dimensional potential approximation to describe the fluxonium qubit implemented with long Josephson-junction arrays, in this work we implement it directly using a linear superinductor formed by a single uninterrupted aluminum wire. We present a large variety of qubits, all stemming from the same circuit but with drastically different characteristic energy scales. This includes flux and fluxonium qubits but also the recently introduced quasicharge qubit with strongly enhanced zero-point phase fluctuations and a heavily suppressed flux dispersion. The use of a geometric inductor results in high reproducibility of the inductive energy as guaranteed by top-down lithography—a key ingredient for intrinsically protected superconducting qubits. AU - Peruzzo, Matilda AU - Hassani, Farid AU - Szep, Gregory AU - Trioni, Andrea AU - Redchenko, Elena AU - Zemlicka, Martin AU - Fink, Johannes M ID - 9928 IS - 4 JF - PRX Quantum KW - quantum physics KW - mesoscale and nanoscale physics TI - Geometric superinductance qubits: Controlling phase delocalization across a single Josephson junction VL - 2 ER - TY - THES AB - This PhD thesis is primarily focused on the study of discrete transport problems, introduced for the first time in the seminal works of Maas [Maa11] and Mielke [Mie11] on finite state Markov chains and reaction-diffusion equations, respectively. More specifically, my research focuses on the study of transport costs on graphs, in particular the convergence and the stability of such problems in the discrete-to-continuum limit. This thesis also includes some results concerning non-commutative optimal transport. The first chapter of this thesis consists of a general introduction to optimal transport problems in the discrete, continuous, and non-commutative settings. Chapters 2 and 3 present the content of two works, obtained in collaboration with Peter Gladbach, Eva Kopfer, and Jan Maas, where we have been able to show the convergence of discrete transport costs on periodic graphs to suitable continuous ones, which can be described by means of a homogenisation result. We first focus on the particular case of quadratic costs on the real line and then extend the result to more general costs in arbitrary dimension.
Our results are the first complete characterisation of limits of transport costs on periodic graphs in arbitrary dimension that does not rely on any additional symmetry. In Chapter 4 we turn our attention to one of the intriguing connections between evolution equations and optimal transport, represented by the theory of gradient flows. We show that the discrete gradient flow structures associated with a finite volume approximation of a certain class of diffusive equations (Fokker–Planck) are stable in the limit of vanishing mesh size, reproving the convergence of the scheme via the method of evolutionary Γ-convergence and exploiting a more variational point of view on the problem. This is based on a collaboration with Dominik Forkert and Jan Maas. Chapter 5 represents a change of perspective, moving away from the discrete world and reaching the non-commutative one. As in the discrete case, we discuss how classical tools coming from commutative optimal transport can be translated into the setting of density matrices. In particular, in this final chapter we present a non-commutative version of the Schrödinger problem (or entropic regularised optimal transport problem) and discuss existence and characterisation of minimisers, a duality result, and present a non-commutative version of the well-known Sinkhorn algorithm to compute the above-mentioned optimisers. This is based on a joint work with Dario Feliciangeli and Augusto Gerolin. Finally, Appendices A and B contain some additional material and discussions, with particular attention to Harnack inequalities and the regularity of flows on discrete spaces. AU - Portinale, Lorenzo ID - 10030 SN - 2663-337X TI - Discrete-to-continuum limits of transport problems and gradient flows in the space of measures ER - TY - THES AB - This work is concerned with two fascinating circuit quantum electrodynamics components, the Josephson junction and the geometric superinductor, and the interesting experiments that can be done by combining the two. The Josephson junction has revolutionized the field of superconducting circuits as a non-linear dissipation-less circuit element and has been used in almost all superconducting qubit implementations since the 90s. On the other hand, the superinductor is a relatively new circuit element introduced as a key component of the fluxonium qubit in 2009. This is an inductor with characteristic impedance larger than the resistance quantum and self-resonance frequency in the GHz regime. The combination of these two elements can occur in two fundamental ways: in parallel and in series. When connected in parallel the two create the fluxonium qubit, a loop with large inductance and a rich energy spectrum reliant on quantum tunneling. On the other hand, placing the two elements in series aids with the measurement of the I-V curve of a single Josephson junction in a high-impedance environment. In this limit, theory predicts that the junction will behave as its dual element: the phase-slip junction. While the Josephson junction acts as a non-linear inductor, the phase-slip junction has the behavior of a non-linear capacitance and can be used to measure new Josephson junction phenomena, namely Coulomb blockade of Cooper pairs and phase-locked Bloch oscillations. The latter experiment allows for a direct link between frequency and current, which is an elusive connection in quantum metrology.
This work introduces the geometric superinductor, a superconducting circuit element whose high inductance is due to the geometry rather than the material properties of the superconductor, realized from a highly miniaturized superconducting planar coil. These structures will be described and characterized as resonators and qubit inductors, and progress towards the measurement of phase-locked Bloch oscillations will be presented. AU - Peruzzo, Matilda ID - 9920 KW - quantum computing KW - superinductor KW - quantum metrology SN - 2663-337X TI - Geometric superinductors and their applications in circuit quantum electrodynamics ER - TY - CONF AB - One key element behind the recent progress of machine learning has been the ability to train machine learning models in large-scale distributed shared-memory and message-passing environments. Most of these models are trained employing variants of stochastic gradient descent (SGD)-based optimization, but most methods involve some type of consistency relaxation relative to sequential SGD in order to mitigate its large communication or synchronization costs at scale. In this paper, we introduce a general consistency condition covering communication-reduced and asynchronous distributed SGD implementations. Our framework, called elastic consistency, decouples the system-specific aspects of the implementation from the SGD convergence requirements, giving a general way to obtain convergence bounds for a wide variety of distributed SGD methods used in practice. Elastic consistency can be used to re-derive or improve several previous convergence bounds in message-passing and shared-memory settings, but also to analyze new models and distribution schemes. As a direct application, we propose and analyze a new synchronization-avoiding scheduling scheme for distributed SGD, and show that it can be used to efficiently train deep convolutional models for image classification. AU - Nadiradze, Giorgi AU - Markov, Ilia AU - Chatterjee, Bapi AU - Kungurtsev, Vyacheslav AU - Alistarh, Dan-Adrian ID - 10432 IS - 10 T2 - Proceedings of the AAAI Conference on Artificial Intelligence TI - Elastic consistency: A practical consistency model for distributed stochastic gradient descent VL - 35 ER - TY - CONF AB - Yao’s garbling scheme is one of the most fundamental cryptographic constructions. Lindell and Pinkas (Journal of Cryptology 2009) gave a formal proof of security in the selective setting, where the adversary chooses the challenge inputs before seeing the garbled circuit, assuming secure symmetric-key encryption (and hence one-way functions). This was followed by results, both positive and negative, concerning its security in the stronger, adaptive setting. Applebaum et al. (Crypto 2013) showed that it cannot satisfy adaptive security as is, due to a simple incompressibility argument. Jafargholi and Wichs (TCC 2017) considered a natural adaptation of Yao’s scheme (where the output mapping is sent in the online phase, together with the garbled input) that circumvents this negative result, and proved that it is adaptively secure, at least for shallow circuits. In particular, they showed that for the class of circuits of depth δ, the loss in security is at most exponential in δ. The above results all concern the simulation-based notion of security. In this work, we show that the upper bound of Jafargholi and Wichs is basically optimal in a strong sense.
As our main result, we show that there exists a family of Boolean circuits, one for each depth δ ∈ ℕ, such that any black-box reduction proving the adaptive indistinguishability of the natural adaptation of Yao’s scheme from any symmetric-key encryption has to lose a factor that is exponential in √δ. Since indistinguishability is a weaker notion than simulation, our bound also applies to adaptive simulation. To establish our results, we build on the recent approach of Kamath et al. (ePrint 2021), which uses pebbling lower bounds in conjunction with oracle separations to prove fine-grained lower bounds on the loss in cryptographic security. AU - Kamath Hosdurg, Chethan AU - Klein, Karen AU - Pietrzak, Krzysztof Z AU - Wichs, Daniel ID - 10041 SN - 0302-9743 T2 - 41st Annual International Cryptology Conference, Part II TI - Limits on the adaptive security of Yao’s garbling VL - 12826 ER - TY - CONF AB - While messaging systems with strong security guarantees are widely used in practice, designing a protocol that scales efficiently to large groups and enjoys similar security guarantees remains largely open. The two existing proposals to date are ART (Cohn-Gordon et al., CCS18) and TreeKEM (IETF, The Messaging Layer Security Protocol, draft). TreeKEM is the candidate currently considered by the IETF MLS working group, but dynamic group operations (i.e. adding and removing users) can cause efficiency issues. In this paper we formalize and analyze a variant of TreeKEM which we term Tainted TreeKEM (TTKEM for short). The basic idea underlying TTKEM was suggested by Millican (MLS mailing list, February 2018). This version is more efficient than TreeKEM for some natural distributions of group operations; we quantify this through simulations. Our second contribution is two security proofs for TTKEM which establish post-compromise and forward secrecy even against adaptive attackers. The security loss (to the underlying PKE) in the Random Oracle Model is a polynomial factor, and a quasipolynomial one in the Standard Model. Our proofs can be adapted to TreeKEM as well. Before our work, no security proof for any TreeKEM-like protocol establishing tight security against an adversary who can adaptively choose the sequence of operations was known. We are also the first to prove (or even formalize) active security, where the server can arbitrarily deviate from the protocol specification. Proving fully active security – where the users can also arbitrarily deviate – remains open. AU - Klein, Karen AU - Pascual Perez, Guillermo AU - Walter, Michael AU - Kamath Hosdurg, Chethan AU - Capretto, Margarita AU - Cueto Noval, Miguel AU - Markov, Ilia AU - Yeo, Michelle X AU - Alwen, Joel F AU - Pietrzak, Krzysztof Z ID - 10049 T2 - 2021 IEEE Symposium on Security and Privacy TI - Keep the dirt: Tainted TreeKEM, adaptively and actively secure continuous group key agreement ER - TY - CONF AB - We show that Yao’s garbling scheme is adaptively indistinguishable for the class of Boolean circuits of size S and treewidth w with only an S^O(w) loss in security. For instance, circuits of constant treewidth are, as a result, adaptively indistinguishable with only a polynomial loss. This (partially) complements a negative result of Applebaum et al. (Crypto 2013), which showed (assuming one-way functions) that Yao’s garbling scheme cannot be adaptively simulatable.
As our main technical contributions, we introduce a new pebble game that abstracts out our security reduction and then present a pebbling strategy for this game where the number of pebbles used is roughly O(d·w·log S), d being the fan-out of the circuit. The design of the strategy relies on separators, a graph-theoretic notion with connections to circuit complexity. AU - Kamath Hosdurg, Chethan AU - Klein, Karen AU - Pietrzak, Krzysztof Z ID - 10044 T2 - 19th Theory of Cryptography Conference 2021 TI - On treewidth, separators and Yao's garbling ER - TY - THES AB - Those who aim to devise new materials with desirable properties usually examine existing methods first. However, they will find that some approaches exist only conceptually, with little chance of becoming practically useful. It seems that a numerical technique called automatic differentiation, together with the increasing supply of computational accelerators, will soon shift many material-design methods from the category “unimaginable” to the category “expensive but possible”. The approach we suggest is no exception. Our overall goal is an efficient and generalizable approach for solving inverse design problems; in this thesis we scratch its surface. We consider jammed systems of identical particles and ask how the shape of those particles (or the parameters codifying it) may affect the mechanical properties of the system. An indispensable part of reaching the answer is an appropriate particle parametrization, and we come up with a simple yet generalizable and purposeful scheme for it. Using our generalizable shape parametrization, we simulate the formation of a solid composed of pentagon-like particles and measure the anisotropy in the resulting elastic response. Through automatic differentiation techniques, we directly connect the shape parameters with the elastic response. Interestingly, for our system we find that less isotropic particles lead to a more isotropic elastic response. Together with other known results about our method, this suggests that it can be successfully generalized to different inverse design problems. AU - Piankov, Anton ID - 10422 SN - 2791-4585 TI - Towards designer materials using customizable particle shape ER - TY - GEN AB - Given the abundance of applications of ranking in recent years, addressing fairness concerns around automated ranking systems becomes necessary for increasing the trust of end-users. Previous work on fair ranking has mostly focused on application-specific fairness notions, often tailored to online advertising, and it rarely considers learning as part of the process. In this work, we show how to transfer numerous fairness notions from binary classification to a learning-to-rank setting. Our formalism allows us to design methods for incorporating fairness objectives with provable generalization guarantees. An extensive experimental evaluation shows that our method can improve ranking fairness substantially with no or only little loss of model quality. AU - Konstantinov, Nikola H AU - Lampert, Christoph ID - 10803 T2 - arXiv TI - Fairness through regularization for learning to rank ER - TY - GEN AB - Methods inspired by machine learning have recently attracted great interest in the computational study of quantum many-particle systems. So far, however, it has proven challenging to deal with microscopic models in which the total number of particles is not conserved.
To address this issue, we propose a new variant of neural network states, which we term neural coherent states. Taking the Fröhlich impurity model as a case study, we show that neural coherent states can learn the ground state of non-additive systems very well. In particular, we observe substantial improvement over the standard coherent state estimates in the most challenging intermediate coupling regime. Our approach is generic and does not assume specific details of the system, suggesting wide applicability. AU - Rzadkowski, Wojciech AU - Lemeshko, Mikhail AU - Mentink, Johan H ID - 10762 T2 - arXiv TI - Artificial neural network states for non-additive systems ER - TY - THES AB - Deep learning is best known for its empirical success across a wide range of applications spanning computer vision, natural language processing and speech. Of equal significance, though perhaps less known, are its ramifications for learning theory: deep networks have been observed to perform surprisingly well in the high-capacity regime, aka the overfitting or underspecified regime. Classically, this regime on the far right of the bias-variance curve is associated with poor generalisation; however, recent experiments with deep networks challenge this view. This thesis is devoted to investigating various aspects of underspecification in deep learning. First, we argue that deep learning models are underspecified on two levels: a) any given training dataset can be fit by many different functions, and b) any given function can be expressed by many different parameter configurations. We refer to the second kind of underspecification as parameterisation redundancy and we precisely characterise its extent. Second, we characterise the implicit criteria (the inductive bias) that guide learning in the underspecified regime. Specifically, we consider a nonlinear but tractable classification setting, and show that given the choice, neural networks learn classifiers with a large margin. Third, we consider learning scenarios where the inductive bias is not by itself sufficient to deal with underspecification. We then study different ways of ‘tightening the specification’: i) In the setting of representation learning with variational autoencoders, we propose a hand-crafted regulariser based on mutual information. ii) In the setting of binary classification, we consider soft-label (real-valued) supervision. We derive a generalisation bound for linear networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we explore an application of soft-label supervision to the training of multi-exit models. AU - Bui Thi Mai, Phuong ID - 9418 SN - 2663-337X TI - Underspecification in deep learning ER - TY - CONF AB - The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness.
We also demonstrate how to resolve these latent correlations, either by using weak supervision during training or by correcting a pre-trained model post hoc with a small number of labels. AU - Träuble, Frederik AU - Creager, Elliot AU - Kilbertus, Niki AU - Locatello, Francesco AU - Dittadi, Andrea AU - Goyal, Anirudh AU - Schölkopf, Bernhard AU - Bauer, Stefan ID - 14177 T2 - Proceedings of the 38th International Conference on Machine Learning TI - On disentangled representations learned from correlated data VL - 139 ER - TY - CONF AB - Intensive care units (ICUs) are increasingly looking towards machine learning for methods to provide online monitoring of critically ill patients. In machine learning, online monitoring is often formulated as a supervised learning problem. Recently, contrastive learning approaches have demonstrated promising improvements over competitive supervised benchmarks. These methods rely on well-understood data augmentation techniques developed for image data, which do not apply to online monitoring. In this work, we overcome this limitation by supplementing time-series data augmentation techniques with a novel contrastive learning objective which we call neighborhood contrastive learning (NCL). Our objective explicitly groups together contiguous time segments from each patient while maintaining state-specific information. Our experiments demonstrate a marked improvement over existing work applying contrastive methods to medical time series. AU - Yèche, Hugo AU - Dresdner, Gideon AU - Locatello, Francesco AU - Hüser, Matthias AU - Rätsch, Gunnar ID - 14176 T2 - Proceedings of 38th International Conference on Machine Learning TI - Neighborhood contrastive learning applied to online patient monitoring VL - 139 ER - TY - CONF AB - When machine learning systems meet real-world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML model becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model? and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our aim is to avoid negative flips, i.e., changing correct predictions to incorrect ones. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as an answer to the above questions. In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates.
AU - Träuble, Frederik AU - Kügelgen, Julius von AU - Kleindessner, Matthäus AU - Locatello, Francesco AU - Schölkopf, Bernhard AU - Gehler, Peter ID - 14182 SN - 9781713845393 T2 - 35th Conference on Neural Information Processing Systems TI - Backward-compatible prediction updates: A probabilistic approach VL - 34 ER - TY - CONF AB - Variational Inference makes a trade-off between the capacity of the variational family and the tractability of finding an approximate posterior distribution. Instead, Boosting Variational Inference allows practitioners to obtain increasingly good posterior approximations by spending more compute. The main obstacle to widespread adoption of Boosting Variational Inference is the amount of resources necessary to improve over a strong Variational Inference baseline. In our work, we trace this limitation back to the global curvature of the KL-divergence. We characterize how the global curvature impacts time and memory consumption, address the problem with the notion of local curvature, and provide a novel approximate backtracking algorithm for estimating local curvature. We give new theoretical convergence rates for our algorithms and provide experimental validation on synthetic and real-world datasets. AU - Dresdner, Gideon AU - Shekhar, Saurav AU - Pedregosa, Fabian AU - Locatello, Francesco AU - Rätsch, Gunnar ID - 14181 T2 - Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence TI - Boosting variational inference with locally adaptive step-sizes ER - TY - CONF AB - Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find that numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice. AU - Kügelgen, Julius von AU - Sharma, Yash AU - Gresele, Luigi AU - Brendel, Wieland AU - Schölkopf, Bernhard AU - Besserve, Michel AU - Locatello, Francesco ID - 14179 SN - 9781713845393 T2 - Advances in Neural Information Processing Systems TI - Self-supervised learning with data augmentations provably isolates content from style VL - 34 ER - TY - CONF AB - Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge.
In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call functions. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferable to a new task in a sample-efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization. AU - Rahaman, Nasim AU - Gondal, Muhammad Waleed AU - Joshi, Shruti AU - Gehler, Peter AU - Bengio, Yoshua AU - Locatello, Francesco AU - Schölkopf, Bernhard ID - 14180 SN - 9781713845393 T2 - Advances in Neural Information Processing Systems TI - Dynamic inference with neural interpreters VL - 34 ER - TY - JOUR AB - The two fields of machine learning and graphical causality arose and have developed separately. However, there is now cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities. AU - Schölkopf, Bernhard AU - Locatello, Francesco AU - Bauer, Stefan AU - Ke, Nan Rosemary AU - Kalchbrenner, Nal AU - Goyal, Anirudh AU - Bengio, Yoshua ID - 14117 IS - 5 JF - Proceedings of the IEEE KW - Electrical and Electronic Engineering SN - 0018-9219 TI - Toward causal representation learning VL - 109 ER - TY - CONF AB - Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations, a complex underlying structure, and allows us to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution. We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance.
AU - Dittadi, Andrea AU - Träuble, Frederik AU - Locatello, Francesco AU - Wüthrich, Manuel AU - Agrawal, Vaibhav AU - Winther, Ole AU - Bauer, Stefan AU - Schölkopf, Bernhard ID - 14178 T2 - The Ninth International Conference on Learning Representations TI - On the transfer of disentangled representations in realistic settings ER -