TY - GEN AB - Methods inspired from machine learning have recently attracted great interest in the computational study of quantum many-particle systems. So far, however, it has proven challenging to deal with microscopic models in which the total number of particles is not conserved. To address this issue, we propose a new variant of neural network states, which we term neural coherent states. Taking the Fröhlich impurity model as a case study, we show that neural coherent states can learn the ground state of non-additive systems very well. In particular, we observe substantial improvement over the standard coherent state estimates in the most challenging intermediate coupling regime. Our approach is generic and does not assume specific details of the system, suggesting wide applications. AU - Rzadkowski, Wojciech AU - Lemeshko, Mikhail AU - Mentink, Johan H. ID - 10762 T2 - arXiv TI - Artificial neural network states for non-additive systems ER - TY - THES AB - Deep learning is best known for its empirical success across a wide range of applications spanning computer vision, natural language processing and speech. Of equal significance, though perhaps less known, are its ramifications for learning theory: deep networks have been observed to perform surprisingly well in the high-capacity regime, aka the overfitting or underspecified regime. Classically, this regime on the far right of the bias-variance curve is associated with poor generalisation; however, recent experiments with deep networks challenge this view. This thesis is devoted to investigating various aspects of underspecification in deep learning. First, we argue that deep learning models are underspecified on two levels: a) any given training dataset can be fit by many different functions, and b) any given function can be expressed by many different parameter configurations. We refer to the second kind of underspecification as parameterisation redundancy and we precisely characterise its extent. Second, we characterise the implicit criteria (the inductive bias) that guide learning in the underspecified regime. Specifically, we consider a nonlinear but tractable classification setting, and show that given the choice, neural networks learn classifiers with a large margin. Third, we consider learning scenarios where the inductive bias is not by itself sufficient to deal with underspecification. We then study different ways of ‘tightening the specification’: i) In the setting of representation learning with variational autoencoders, we propose a hand- crafted regulariser based on mutual information. ii) In the setting of binary classification, we consider soft-label (real-valued) supervision. We derive a generalisation bound for linear networks supervised in this way and verify that soft labels facilitate fast learning. Finally, we explore an application of soft-label supervision to the training of multi-exit models. AU - Bui Thi Mai, Phuong ID - 9418 SN - 2663-337X TI - Underspecification in deep learning ER - TY - CONF AB - The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels. AU - Träuble, Frederik AU - Creager, Elliot AU - Kilbertus, Niki AU - Locatello, Francesco AU - Dittadi, Andrea AU - Goyal, Anirudh AU - Schölkopf, Bernhard AU - Bauer, Stefan ID - 14177 T2 - Proceedings of the 38th International Conference on Machine Learning TI - On disentangled representations learned from correlated data VL - 139 ER - TY - CONF AB - Intensive care units (ICU) are increasingly looking towards machine learning for methods to provide online monitoring of critically ill patients. In machine learning, online monitoring is often formulated as a supervised learning problem. Recently, contrastive learning approaches have demonstrated promising improvements over competitive supervised benchmarks. These methods rely on well-understood data augmentation techniques developed for image data which do not apply to online monitoring. In this work, we overcome this limitation by supplementing time-series data augmentation techniques with a novel contrastive learning objective which we call neighborhood contrastive learning (NCL). Our objective explicitly groups together contiguous time segments from each patient while maintaining state-specific information. Our experiments demonstrate a marked improvement over existing work applying contrastive methods to medical time-series. AU - Yèche, Hugo AU - Dresdner, Gideon AU - Locatello, Francesco AU - Hüser, Matthias AU - Rätsch, Gunnar ID - 14176 T2 - Proceedings of 38th International Conference on Machine Learning TI - Neighborhood contrastive learning applied to online patient monitoring VL - 139 ER - TY - CONF AB - When machine learning systems meet real world applications, accuracy is only one of several requirements. In this paper, we assay a complementary perspective originating from the increasing availability of pre-trained and regularly improving state-of-the-art models. While new improved models develop at a fast pace, downstream tasks vary more slowly or stay constant. Assume that we have a large unlabelled data set for which we want to maintain accurate predictions. Whenever a new and presumably better ML models becomes available, we encounter two problems: (i) given a limited budget, which data points should be re-evaluated using the new model?; and (ii) if the new predictions differ from the current ones, should we update? Problem (i) is about compute cost, which matters for very large data sets and models. Problem (ii) is about maintaining consistency of the predictions, which can be highly relevant for downstream applications; our demand is to avoid negative flips, i.e., changing correct to incorrect predictions. In this paper, we formalize the Prediction Update Problem and present an efficient probabilistic approach as answer to the above questions. In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies along key metrics for backward-compatible prediction updates. AU - Träuble, Frederik AU - Kügelgen, Julius von AU - Kleindessner, Matthäus AU - Locatello, Francesco AU - Schölkopf, Bernhard AU - Gehler, Peter ID - 14182 SN - 9781713845393 T2 - 35th Conference on Neural Information Processing Systems TI - Backward-compatible prediction updates: A probabilistic approach VL - 34 ER - TY - CONF AB - Variational Inference makes a trade-off between the capacity of the variational family and the tractability of finding an approximate posterior distribution. Instead, Boosting Variational Inference allows practitioners to obtain increasingly good posterior approximations by spending more compute. The main obstacle to widespread adoption of Boosting Variational Inference is the amount of resources necessary to improve over a strong Variational Inference baseline. In our work, we trace this limitation back to the global curvature of the KL-divergence. We characterize how the global curvature impacts time and memory consumption, address the problem with the notion of local curvature, and provide a novel approximate backtracking algorithm for estimating local curvature. We give new theoretical convergence rates for our algorithms and provide experimental validation on synthetic and real-world datasets. AU - Dresdner, Gideon AU - Shekhar, Saurav AU - Pedregosa, Fabian AU - Locatello, Francesco AU - Rätsch, Gunnar ID - 14181 T2 - Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence TI - Boosting variational inference with locally adaptive step-sizes ER - TY - CONF AB - Self-supervised representation learning has shown remarkable success in a number of domains. A common practice is to perform data augmentation via hand-crafted transformations intended to leave the semantics of the data invariant. We seek to understand the empirical success of this approach from a theoretical perspective. We formulate the augmentation process as a latent variable model by postulating a partition of the latent representation into a content component, which is assumed invariant to augmentation, and a style component, which is allowed to change. Unlike prior work on disentanglement and independent component analysis, we allow for both nontrivial statistical and causal dependencies in the latent space. We study the identifiability of the latent representation based on pairs of views of the observations and prove sufficient conditions that allow us to identify the invariant content partition up to an invertible mapping in both generative and discriminative settings. We find numerical simulations with dependent latent variables are consistent with our theory. Lastly, we introduce Causal3DIdent, a dataset of high-dimensional, visually complex images with rich causal dependencies, which we use to study the effect of data augmentations performed in practice. AU - Kügelgen, Julius von AU - Sharma, Yash AU - Gresele, Luigi AU - Brendel, Wieland AU - Schölkopf, Bernhard AU - Besserve, Michel AU - Locatello, Francesco ID - 14179 SN - 9781713845393 T2 - Advances in Neural Information Processing Systems TI - Self-supervised learning with data augmentations provably isolates content from style VL - 34 ER - TY - CONF AB - Modern neural network architectures can leverage large amounts of data to generalize well within the training distribution. However, they are less capable of systematic generalization to data drawn from unseen but related distributions, a feat that is hypothesized to require compositional reasoning and reuse of knowledge. In this work, we present Neural Interpreters, an architecture that factorizes inference in a self-attention network as a system of modules, which we call \emph{functions}. Inputs to the model are routed through a sequence of functions in a way that is end-to-end learned. The proposed architecture can flexibly compose computation along width and depth, and lends itself well to capacity extension after training. To demonstrate the versatility of Neural Interpreters, we evaluate it in two distinct settings: image classification and visual abstract reasoning on Raven Progressive Matrices. In the former, we show that Neural Interpreters perform on par with the vision transformer using fewer parameters, while being transferrable to a new task in a sample efficient manner. In the latter, we find that Neural Interpreters are competitive with respect to the state-of-the-art in terms of systematic generalization. AU - Rahaman, Nasim AU - Gondal, Muhammad Waleed AU - Joshi, Shruti AU - Gehler, Peter AU - Bengio, Yoshua AU - Locatello, Francesco AU - Schölkopf, Bernhard ID - 14180 SN - 9781713845393 T2 - Advances in Neural Information Processing Systems TI - Dynamic inference with neural interpreters VL - 34 ER - TY - JOUR AB - The two fields of machine learning and graphical causality arose and are developed separately. However, there is, now, cross-pollination and increasing interest in both fields to benefit from the advances of the other. In this article, we review fundamental concepts of causal inference and relate them to crucial open problems of machine learning, including transfer and generalization, thereby assaying how causality can contribute to modern machine learning research. This also applies in the opposite direction: we note that most work in causality starts from the premise that the causal variables are given. A central problem for AI and causality is, thus, causal representation learning, that is, the discovery of high-level causal variables from low-level observations. Finally, we delineate some implications of causality for machine learning and propose key research areas at the intersection of both communities. AU - Scholkopf, Bernhard AU - Locatello, Francesco AU - Bauer, Stefan AU - Ke, Nan Rosemary AU - Kalchbrenner, Nal AU - Goyal, Anirudh AU - Bengio, Yoshua ID - 14117 IS - 5 JF - Proceedings of the IEEE KW - Electrical and Electronic Engineering SN - 0018-9219 TI - Toward causal representation learning VL - 109 ER - TY - CONF AB - Learning meaningful representations that disentangle the underlying structure of the data generating process is considered to be of key importance in machine learning. While disentangled representations were found to be useful for diverse tasks such as abstract reasoning and fair classification, their scalability and real-world impact remain questionable. We introduce a new high-resolution dataset with 1M simulated images and over 1,800 annotated real-world images of the same setup. In contrast to previous work, this new dataset exhibits correlations, a complex underlying structure, and allows to evaluate transfer to unseen simulated and real-world settings where the encoder i) remains in distribution or ii) is out of distribution. We propose new architectures in order to scale disentangled representation learning to realistic high-resolution settings and conduct a large-scale empirical study of disentangled representations on this dataset. We observe that disentanglement is a good predictor for out-of-distribution (OOD) task performance. AU - Dittadi, Andrea AU - Träuble, Frederik AU - Locatello, Francesco AU - Wüthrich, Manuel AU - Agrawal, Vaibhav AU - Winther, Ole AU - Bauer, Stefan AU - Schölkopf, Bernhard ID - 14178 T2 - The Ninth International Conference on Learning Representations TI - On the transfer of disentangled representations in realistic settings ER - TY - GEN AB - The world is structured in countless ways. It may be prudent to enforce corresponding structural properties to a learning algorithm's solution, such as incorporating prior beliefs, natural constraints, or causal structures. Doing so may translate to faster, more accurate, and more flexible models, which may directly relate to real-world impact. In this dissertation, we consider two different research areas that concern structuring a learning algorithm's solution: when the structure is known and when it has to be discovered. AU - Locatello, Francesco ID - 14221 T2 - arXiv TI - Enforcing and discovering structure in machine learning ER - TY - GEN AB - The Birkhoff conjecture says that the boundary of a strictly convex integrable billiard table is necessarily an ellipse. In this article, we consider a stronger notion of integrability, namely, integrability close to the boundary, and prove a local version of this conjecture: a small perturbation of almost every ellipse that preserves integrability near the boundary, is itself an ellipse. We apply this result to study local spectral rigidity of ellipses using the connection between the wave trace of the Laplacian and the dynamics near the boundary and establish rigidity for almost all of them. AU - Koval, Illya ID - 14278 T2 - arXiv TI - Local strong Birkhoff conjecture and local spectral rigidity of almost every ellipse ER - TY - THES AB - The design and verification of concurrent systems remains an open challenge due to the non-determinism that arises from the inter-process communication. In particular, concurrent programs are notoriously difficult both to be written correctly and to be analyzed formally, as complex thread interaction has to be accounted for. The difficulties are further exacerbated when concurrent programs get executed on modern-day hardware, which contains various buffering and caching mechanisms for efficiency reasons. This causes further subtle non-determinism, which can often produce very unintuitive behavior of the concurrent programs. Model checking is at the forefront of tackling the verification problem, where the task is to decide, given as input a concurrent system and a desired property, whether the system satisfies the property. The inherent state-space explosion problem in model checking of concurrent systems causes naïve explicit methods not to scale, thus more inventive methods are required. One such method is stateless model checking (SMC), which explores in memory-efficient manner the program executions rather than the states of the program. State-of-the-art SMC is typically coupled with partial order reduction (POR) techniques, which argue that certain executions provably produce identical system behavior, thus limiting the amount of executions one needs to explore in order to cover all possible behaviors. Another method to tackle the state-space explosion is symbolic model checking, where the considered techniques operate on a succinct implicit representation of the input system rather than explicitly accessing the system. In this thesis we present new techniques for verification of concurrent systems. We present several novel POR methods for SMC of concurrent programs under various models of semantics, some of which account for write-buffering mechanisms. Additionally, we present novel algorithms for symbolic model checking of finite-state concurrent systems, where the desired property of the systems is to ensure a formally defined notion of fairness. AU - Toman, Viktor ID - 10199 KW - concurrency KW - verification KW - model checking SN - 2663-337X TI - Improved verification techniques for concurrent systems ER - TY - JOUR AB - We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32–44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data. AU - Patxot, Marion AU - Trejo Banos, Daniel AU - Kousathanas, Athanasios AU - Orliac, Etienne J AU - Ojavee, Sven E AU - Moser, Gerhard AU - Sidorenko, Julia AU - Kutalik, Zoltan AU - Magi, Reedik AU - Visscher, Peter M AU - Ronnegard, Lars AU - Robinson, Matthew Richard ID - 8429 IS - 1 JF - Nature Communications TI - Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits VL - 12 ER - TY - CONF AB - Consider a distributed task where the communication network is fixed but the local inputs given to the nodes of the distributed system may change over time. In this work, we explore the following question: if some of the local inputs change, can an existing solution be updated efficiently, in a dynamic and distributed manner? To address this question, we define the batch dynamic CONGEST model in which we are given a bandwidth-limited communication network and a dynamic edge labelling defines the problem input. The task is to maintain a solution to a graph problem on the labelled graph under batch changes. We investigate, when a batch of alpha edge label changes arrive, - how much time as a function of alpha we need to update an existing solution, and - how much information the nodes have to keep in local memory between batches in order to update the solution quickly. Our work lays the foundations for the theory of input-dynamic distributed network algorithms. We give a general picture of the complexity landscape in this model, design both universal algorithms and algorithms for concrete problems, and present a general framework for lower bounds. The diverse time complexity of our model spans from constant time, through time polynomial in alpha, and to alpha time, which we show to be enough for any task. AU - Foerster, Klaus-Tycho AU - Korhonen, Janne AU - Paz, Ami AU - Rybicki, Joel AU - Schmid, Stefan ID - 10854 SN - 9781450380720 T2 - Abstract Proceedings of the 2021 ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems TI - Input-dynamic distributed algorithms for communication networks ER - TY - JOUR AB - Consider a distributed task where the communication network is fixed but the local inputs given to the nodes of the distributed system may change over time. In this work, we explore the following question: if some of the local inputs change, can an existing solution be updated efficiently, in a dynamic and distributed manner? To address this question, we define the batch dynamic \congest model in which we are given a bandwidth-limited communication network and a dynamic edge labelling defines the problem input. The task is to maintain a solution to a graph problem on the labeled graph under batch changes. We investigate, when a batch of α edge label changes arrive, \beginitemize \item how much time as a function of α we need to update an existing solution, and \item how much information the nodes have to keep in local memory between batches in order to update the solution quickly. \enditemize Our work lays the foundations for the theory of input-dynamic distributed network algorithms. We give a general picture of the complexity landscape in this model, design both universal algorithms and algorithms for concrete problems, and present a general framework for lower bounds. In particular, we derive non-trivial upper bounds for two selected, contrasting problems: maintaining a minimum spanning tree and detecting cliques. AU - Foerster, Klaus-Tycho AU - Korhonen, Janne AU - Paz, Ami AU - Rybicki, Joel AU - Schmid, Stefan ID - 10855 IS - 1 JF - Proceedings of the ACM on Measurement and Analysis of Computing Systems KW - Computer Networks and Communications KW - Hardware and Architecture KW - Safety KW - Risk KW - Reliability and Quality KW - Computer Science (miscellaneous) SN - 2476-1249 TI - Input-dynamic distributed algorithms for communication networks VL - 5 ER - TY - JOUR AB - We consider planning problems for graphs, Markov Decision Processes (MDPs), and games on graphs in an explicit state space. While graphs represent the most basic planning model, MDPs represent interaction with nature and games on graphs represent interaction with an adversarial environment. We consider two planning problems with k different target sets: (a) the coverage problem asks whether there is a plan for each individual target set; and (b) the sequential target reachability problem asks whether the targets can be reached in a given sequence. For the coverage problem, we present a linear-time algorithm for graphs, and quadratic conditional lower bound for MDPs and games on graphs. For the sequential target problem, we present a linear-time algorithm for graphs, a sub-quadratic algorithm for MDPs, and a quadratic conditional lower bound for games on graphs. Our results with conditional lower bounds, based on the boolean matrix multiplication (BMM) conjecture and strong exponential time hypothesis (SETH), establish (i) model-separation results showing that for the coverage problem MDPs and games on graphs are harder than graphs, and for the sequential reachability problem games on graphs are harder than MDPs and graphs; and (ii) problem-separation results showing that for MDPs the coverage problem is harder than the sequential target problem. AU - Chatterjee, Krishnendu AU - Dvořák, Wolfgang AU - Henzinger, Monika H AU - Svozil, Alexander ID - 9293 IS - 8 JF - Artificial Intelligence SN - 0004-3702 TI - Algorithms and conditional lower bounds for planning problems VL - 297 ER - TY - GEN AB - We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only $\leq$ 10\% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having >95% probability of contributing >0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data. AU - Robinson, Matthew Richard ID - 13063 TI - Probabilistic inference of the genetic architecture of functional enrichment of complex traits ER - TY - JOUR AB - The high processing cost, poor mechanical properties and moderate performance of Bi2Te3–based alloys used in thermoelectric devices limit the cost-effectiveness of this energy conversion technology. Towards solving these current challenges, in the present work, we detail a low temperature solution-based approach to produce Bi2Te3-Cu2-xTe nanocomposites with improved thermoelectric performance. Our approach consists in combining proper ratios of colloidal nanoparticles and to consolidate the resulting mixture into nanocomposites using a hot press. The transport properties of the nanocomposites are characterized and compared with those of pure Bi2Te3 nanomaterials obtained following the same procedure. In contrast with most previous works, the presence of Cu2-xTe nanodomains does not result in a significant reduction of the lattice thermal conductivity of the reference Bi2Te3 nanomaterial, which is already very low. However, the introduction of Cu2-xTe yields a nearly threefold increase of the power factor associated to a simultaneous increase of the Seebeck coefficient and electrical conductivity at temperatures above 400 K. Taking into account the band alignment of the two materials, we rationalize this increase by considering that Cu2-xTe nanostructures, with a relatively low electron affinity, are able to inject electrons into Bi2Te3, enhancing in this way its electrical conductivity. The simultaneous increase of the Seebeck coefficient is related to the energy filtering of charge carriers at energy barriers within Bi2Te3 domains associated with the accumulation of electrons in regions nearby a Cu2-xTe/Bi2Te3 heterojunction. Overall, with the incorporation of a proper amount of Cu2-xTe nanoparticles, we demonstrate a 250% improvement of the thermoelectric figure of merit of Bi2Te3. AU - Zhang, Yu AU - Xing, Congcong AU - Liu, Yu AU - Li, Mengyao AU - Xiao, Ke AU - Guardia, Pablo AU - Lee, Seungho AU - Han, Xu AU - Moghaddam, Ahmad AU - Roa, Joan J AU - Arbiol, Jordi AU - Ibáñez, Maria AU - Pan, Kai AU - Prato, Mirko AU - Xie, Ying AU - Cabot, Andreu ID - 9304 IS - 8 JF - Chemical Engineering Journal SN - 1385-8947 TI - Influence of copper telluride nanodomains on the transport properties of n-type bismuth telluride VL - 418 ER - TY - JOUR AB - Astrocytes extensively infiltrate the neuropil to regulate critical aspects of synaptic development and function. This process is regulated by transcellular interactions between astrocytes and neurons via cell adhesion molecules. How astrocytes coordinate developmental processes among one another to parse out the synaptic neuropil and form non-overlapping territories is unknown. Here we identify a molecular mechanism regulating astrocyte-astrocyte interactions during development to coordinate astrocyte morphogenesis and gap junction coupling. We show that hepaCAM, a disease-linked, astrocyte-enriched cell adhesion molecule, regulates astrocyte competition for territory and morphological complexity in the developing mouse cortex. Furthermore, conditional deletion of Hepacam from developing astrocytes significantly impairs gap junction coupling between astrocytes and disrupts the balance between synaptic excitation and inhibition. Mutations in HEPACAM cause megalencephalic leukoencephalopathy with subcortical cysts in humans. Therefore, our findings suggest that disruption of astrocyte self-organization mechanisms could be an underlying cause of neural pathology. AU - Baldwin, Katherine T. AU - Tan, Christabel X. AU - Strader, Samuel T. AU - Jiang, Changyu AU - Savage, Justin T. AU - Elorza-Vidal, Xabier AU - Contreras, Ximena AU - Rülicke, Thomas AU - Hippenmeyer, Simon AU - Estévez, Raúl AU - Ji, Ru-Rong AU - Eroglu, Cagla ID - 9793 IS - 15 JF - Neuron SN - 0896-6273 TI - HepaCAM controls astrocyte self-organization and coupling VL - 109 ER -