TY - CONF AB - Despite their recent success, deep neural networks continue to perform poorly when they encounter distribution shifts at test time. Many recently proposed approaches try to counter this by aligning the model to the new distribution prior to inference. With no labels available, this requires unsupervised objectives to adapt the model on the observed test data. In this paper, we propose Test-Time Self-Training (TeST): a technique that takes as input a model trained on some source data and a novel data distribution at test time, and learns invariant and robust representations using a student-teacher framework. We find that models adapted using TeST significantly improve over baseline test-time adaptation algorithms. TeST achieves competitive performance to modern domain adaptation algorithms [4, 43], while having access to 5-10x less data at time of adaptation. We thoroughly evaluate a variety of baselines on two tasks, object detection and image segmentation, and find that TeST sets the new state of the art for test-time domain adaptation algorithms. AU - Sinha, Samarth AU - Gehler, Peter AU - Locatello, Francesco AU - Schiele, Bernt ID - 14105 SN - 9781665493475 T2 - 2023 IEEE/CVF Winter Conference on Applications of Computer Vision TI - TeST: Test-time Self-Training under distribution shift ER - TY - JOUR AB - Context. Space asteroseismology is revolutionizing our knowledge of the internal structure and dynamics of stars. A breakthrough is ongoing with the recent discoveries of signatures of strong magnetic fields in the core of red giant stars. The key signature for such a detection is the asymmetry these fields induce in the frequency splittings of observed dipolar mixed gravito-acoustic modes. Aims. We investigate the ability of the observed asymmetries of the frequency splittings of dipolar mixed modes to constrain the geometrical properties of deep magnetic fields. Methods. 
We used the powerful analytical Racah-Wigner algebra used in quantum mechanics to characterize the geometrical couplings of dipolar mixed oscillation modes with various realistically plausible topologies of fossil magnetic fields. We also computed the induced perturbation of their frequencies. Results. First, in the case of an oblique magnetic dipole, we provide the exact analytical expression of the asymmetry as a function of the angle between the rotation and magnetic axes. Its value provides a direct measure of this angle. Second, considering a combination of axisymmetric dipolar and quadrupolar fields, we show how the asymmetry is blind to the unraveling of the relative strength and sign of each component. Finally, in the case of a given multipole, we show that a negative asymmetry is a signature of non-axisymmetric topologies. Conclusions. Asymmetries of dipolar mixed modes provide a key bit of information on the geometrical topology of deep fossil magnetic fields, but this is insufficient on its own. Asteroseismic constraints should therefore be combined with spectropolarimetric observations and numerical simulations, which aim to predict the more probable stable large-scale geometries. AU - Mathis, S. AU - Bugnet, Lisa Annabelle ID - 14256 JF - Astronomy and Astrophysics SN - 0004-6361 TI - Asymmetries of frequency splittings of dipolar mixed modes: A window on the topology of deep magnetic fields VL - 676 ER - TY - JOUR AB - In this work, a generalized, adapted Numerov implementation capable of determining band structures of periodic quantum systems is outlined. Based on the input potential, the presented approach numerically solves the Schrödinger equation in position space at each momentum space point. Thus, in addition to the band structure, the method inherently provides information about the state functions and probability densities in position space at each momentum space point considered. 
The generalized, adapted Numerov framework provided reliable estimates for a variety of increasingly complex test suites in one, two, and three dimensions. The accuracy of the proposed methodology was benchmarked against results obtained for the analytically solvable Kronig-Penney model. Furthermore, the presented numerical solver was applied to a model potential representing a 2D optical lattice, a challenging application relevant, for example, to the field of quantum computing. AU - Gamper, Jakob AU - Kluibenschedl, Florian AU - Weiss, Alexander K.H. AU - Hofer, Thomas S. ID - 14261 IS - 33 JF - Journal of Physical Chemistry Letters TI - Accessing position space wave functions in band structure calculations of periodic systems - a generalized, adapted Numerov implementation for one-, two-, and three-dimensional quantum problems VL - 14 ER - TY - CONF AB - This paper focuses on over-parameterized deep neural networks (DNNs) with ReLU activation functions and proves that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification while obtaining (nearly) zero-training error under the lazy training regime. For this purpose, we unify three interrelated concepts of overparameterization, benign overfitting, and the Lipschitz constant of DNNs. Our results indicate that interpolating with smoother functions leads to better generalization. Furthermore, we investigate the special case where interpolating smooth ground-truth functions is performed by DNNs under the Neural Tangent Kernel (NTK) regime for generalization. Our result demonstrates that the generalization error converges to a constant order that only depends on label noise and initialization noise, which theoretically verifies benign overfitting. 
Our analysis provides a tight lower bound on the normalized margin under non-smooth activation functions, as well as the minimum eigenvalue of NTK under high-dimensional settings, which is of independent interest in learning theory. AU - Zhu, Zhenyu AU - Liu, Fanghui AU - Chrysos, Grigorios G AU - Locatello, Francesco AU - Cevher, Volkan ID - 14208 T2 - Proceedings of the 40th International Conference on Machine Learning TI - Benign overfitting in deep neural networks under lazy training VL - 202 ER - TY - GEN AB - Diffusion models excel at generating photorealistic images from text queries. Naturally, many approaches have been proposed to use these generative abilities to augment training datasets for downstream tasks, such as classification. However, diffusion models are themselves trained on large, noisily supervised but nonetheless annotated datasets. It is an open question whether the generalization capabilities of diffusion models beyond using the additional data of the pre-training process for augmentation lead to improved downstream performance. We perform a systematic evaluation of existing methods to generate images from diffusion models and study new extensions to assess their benefit for data augmentation. While we find that personalizing diffusion models towards the target data outperforms simpler prompting strategies, we also show that using the training data of the diffusion model alone, via a simple nearest neighbor retrieval procedure, leads to even stronger downstream performance. Overall, our study probes the limitations of diffusion models for data augmentation but also highlights their potential in generating new training data to improve performance on simple downstream vision tasks. AU - Burg, Max F. 
AU - Wenzel, Florian AU - Zietlow, Dominik AU - Horn, Max AU - Makansi, Osama AU - Locatello, Francesco AU - Russell, Chris ID - 14209 T2 - arXiv TI - A data augmentation perspective on diffusion models and retrieval ER - TY - CONF AB - Causal discovery methods are intrinsically constrained by the set of assumptions needed to ensure structure identifiability. Moreover, additional restrictions are often imposed in order to simplify the inference task: this is the case for the Gaussian noise assumption on additive non-linear models, which is common to many causal discovery approaches. In this paper we show the shortcomings of inference under this hypothesis, analyzing the risk of edge inversion under violation of Gaussianity of the noise terms. Then, we propose a novel method for inferring the topological ordering of the variables in the causal graph, from data generated according to an additive non-linear model with a generic noise distribution. This leads to NoGAM (Not only Gaussian Additive noise Models), a causal discovery algorithm with a minimal set of assumptions and state-of-the-art performance, experimentally benchmarked on synthetic data. AU - Montagna, Francesco AU - Noceti, Nicoletta AU - Rosasco, Lorenzo AU - Zhang, Kun AU - Locatello, Francesco ID - 14211 T2 - 2nd Conference on Causal Learning and Reasoning TI - Causal discovery with score matching on additive models with arbitrary noise ER - TY - CONF AB - This paper demonstrates how to discover the whole causal graph from the second derivative of the log-likelihood in observational non-linear additive Gaussian noise models. Leveraging scalable machine learning approaches to approximate the score function ∇ log p(X), we extend the work of Rolland et al. (2022) that only recovers the topological order from the score and requires an expensive pruning step removing spurious edges among those admitted by the ordering. 
Our analysis leads to DAS (acronym for Discovery At Scale), a practical algorithm that reduces the complexity of the pruning by a factor proportional to the graph size. In practice, DAS achieves competitive accuracy with current state-of-the-art while being over an order of magnitude faster. Overall, our approach enables principled and scalable causal discovery, significantly lowering the compute bar. AU - Montagna, Francesco AU - Noceti, Nicoletta AU - Rosasco, Lorenzo AU - Zhang, Kun AU - Locatello, Francesco ID - 14212 T2 - 2nd Conference on Causal Learning and Reasoning TI - Scalable causal discovery with score matching ER - TY - CONF AB - Recent years have seen a surge of interest in learning high-level causal representations from low-level image pairs under interventions. Yet, existing efforts are largely limited to simple synthetic settings that are far away from real-world problems. In this paper, we present Causal Triplet, a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works: (i) an actionable counterfactual setting, where only certain object-level variables allow for counterfactual observations whereas others do not; (ii) an interventional downstream task with an emphasis on out-of-distribution robustness from the independent causal mechanisms principle. Through extensive experiments, we find that models built with the knowledge of disentangled or object-centric representations significantly outperform their distributed counterparts. However, recent causal representation learning methods still struggle to identify such latent structures, indicating substantial challenges and opportunities for future work. 
AU - Liu, Yuejiang AU - Alahi, Alexandre AU - Russell, Chris AU - Horn, Max AU - Zietlow, Dominik AU - Schölkopf, Bernhard AU - Locatello, Francesco ID - 14214 T2 - 2nd Conference on Causal Learning and Reasoning TI - Causal triplet: An open challenge for intervention-centric causal representation learning ER - TY - CONF AB - Neural networks embed the geometric structure of a data manifold lying in a high-dimensional space into latent representations. Ideally, the distribution of the data points in the latent space should depend only on the task, the data, the loss, and other architecture-specific constraints. However, factors such as the random weights initialization, training hyperparameters, or other sources of randomness in the training phase may induce incoherent latent spaces that hinder any form of reuse. Nevertheless, we empirically observe that, under the same data and modeling choices, the angles between the encodings within distinct latent spaces do not change. In this work, we propose the latent similarity between each sample and a fixed set of anchors as an alternative data representation, demonstrating that it can enforce the desired invariances without any additional training. We show how neural architectures can leverage these relative representations to guarantee, in practice, invariance to latent isometries and rescalings, effectively enabling latent space communication: from zero-shot model stitching to latent space comparison between diverse settings. We extensively validate the generalization capability of our approach on different datasets, spanning various modalities (images, text, graphs), tasks (e.g., classification, reconstruction) and architectures (e.g., CNNs, GCNs, transformers). 
AU - Moschella, Luca AU - Maiorca, Valentino AU - Fumero, Marco AU - Norelli, Antonio AU - Locatello, Francesco AU - Rodolà, Emanuele ID - 14217 T2 - The 11th International Conference on Learning Representations TI - Relative representations enable zero-shot latent space communication ER - TY - CONF AB - Learning generative object models from unlabelled videos is a long-standing problem and is required for causal scene modeling. We decompose this problem into three easier subtasks, and provide candidate solutions for each of them. Inspired by the Common Fate Principle of Gestalt Psychology, we first extract (noisy) masks of moving objects via unsupervised motion segmentation. Second, generative models are trained on the masks of the background and the moving objects, respectively. Third, background and foreground models are combined in a conditional "dead leaves" scene model to sample novel scene configurations where occlusions and depth layering arise naturally. To evaluate the individual stages, we introduce the Fishbowl dataset positioned between complex real-world scenes and common object-centric benchmarks of simplistic objects. We show that our approach allows learning generative models that generalize beyond the occlusions present in the input videos, and represent scenes in a modular fashion that allows sampling plausible scenes outside the training distribution by permitting, for instance, object numbers or densities not observed in the training set. AU - Tangemann, Matthias AU - Schneider, Steffen AU - Kügelgen, Julius von AU - Locatello, Francesco AU - Gehler, Peter AU - Brox, Thomas AU - Kümmerer, Matthias AU - Bethge, Matthias AU - Schölkopf, Bernhard ID - 14222 T2 - 2nd Conference on Causal Learning and Reasoning TI - Unsupervised object learning via common fate ER - TY - CONF AB - Humans naturally decompose their environment into entities at the appropriate level of abstraction to act in the world. 
Allowing machine learning algorithms to derive this decomposition in an unsupervised way has become an important line of research. However, current methods are restricted to simulated data or require additional information in the form of motion or depth in order to successfully discover objects. In this work, we overcome this limitation by showing that reconstructing features from models trained in a self-supervised manner is a sufficient training signal for object-centric representations to arise in a fully unsupervised way. Our approach, DINOSAUR, significantly outperforms existing image-based object-centric learning models on simulated data and is the first unsupervised object-centric model that scales to real-world datasets such as COCO and PASCAL VOC. DINOSAUR is conceptually simple and shows competitive performance compared to more involved pipelines from the computer vision literature. AU - Seitzer, Maximilian AU - Horn, Max AU - Zadaianchuk, Andrii AU - Zietlow, Dominik AU - Xiao, Tianjun AU - Simon-Gabriel, Carl-Johann AU - He, Tong AU - Zhang, Zheng AU - Schölkopf, Bernhard AU - Brox, Thomas AU - Locatello, Francesco ID - 14218 T2 - The 11th International Conference on Learning Representations TI - Bridging the gap to real-world object-centric learning ER - TY - CONF AB - In this paper, we show that recent advances in self-supervised feature learning enable unsupervised object discovery and semantic segmentation with a performance that matches the state of the field on supervised semantic segmentation 10 years ago. We propose a methodology based on unsupervised saliency masks and self-supervised feature clustering to kickstart object discovery followed by training a semantic segmentation network on pseudo-labels to bootstrap the system on images with multiple objects. 
We present results on PASCAL VOC that go far beyond the current state of the art (50.0 mIoU), and we report for the first time results on MS COCO for the whole set of 81 classes: our method discovers 34 categories with more than $20\%$ IoU, while obtaining an average IoU of 19.6 for all 81 categories. AU - Zadaianchuk, Andrii AU - Kleindessner, Matthaeus AU - Zhu, Yi AU - Locatello, Francesco AU - Brox, Thomas ID - 14219 T2 - The 11th International Conference on Learning Representations TI - Unsupervised semantic segmentation with self-supervised object-centric representations ER - TY - GEN AB - As causal ground truth is incredibly rare, causal discovery algorithms are commonly only evaluated on simulated data. This is concerning, given that simulations reflect common preconceptions about generating processes regarding noise distributions, model classes, and more. In this work, we propose a novel method for falsifying the output of a causal discovery algorithm in the absence of ground truth. Our key insight is that while statistical learning seeks stability across subsets of data points, causal learning should seek stability across subsets of variables. Motivated by this insight, our method relies on a notion of compatibility between causal graphs learned on different subsets of variables. We prove that detecting incompatibilities can falsify wrongly inferred causal relations due to violation of assumptions or errors from finite sample effects. Although passing such compatibility tests is only a necessary criterion for good performance, we argue that it provides strong evidence for the causal models whenever compatibility entails strong implications for the joint distribution. We also demonstrate experimentally that detection of incompatibilities can aid in causal model selection. AU - Faller, Philipp M. AU - Vankadara, Leena Chennuru AU - Mastakouri, Atalanti A. 
AU - Locatello, Francesco AU - Janzing, Dominik ID - 14333 T2 - arXiv TI - Self-compatibility: Evaluating causal discovery without ground truth ER - TY - JOUR AB - Living tissues are characterized by an intrinsically mechanochemical interplay of active physical forces and complex biochemical signaling pathways. Either feature alone can give rise to complex emergent phenomena, for example, mechanically driven glassy dynamics and rigidity transitions, or chemically driven reaction-diffusion instabilities. An important question is how to quantitatively assess the contribution of these different cues to the large-scale dynamics of biological materials. We address this in Madin-Darby canine kidney (MDCK) monolayers, considering both mechanochemical feedback between extracellular signal-regulated kinase (ERK) signaling activity and cellular density as well as a mechanically active tissue rheology via a self-propelled vertex model. We show that the relative strength of active migration forces to mechanochemical couplings controls a transition from a uniform active glass to periodic spatiotemporal waves. We parametrize the model from published experimental data sets on MDCK monolayers and use it to make new predictions on the correlation functions of cellular dynamics and the dynamics of topological defects associated with the oscillatory phase of cells. Interestingly, MDCK monolayers are best described by an intermediary parameter region in which both mechanochemical couplings and noisy active propulsion have a strong influence on the dynamics. Finally, we study how tissue rheology and ERK waves produce feedback on one another and uncover a mechanism via which tissue fluidity can be controlled by mechanochemical waves at both the local and global levels. 
AU - Boocock, Daniel R AU - Hirashima, Tsuyoshi AU - Hannezo, Edouard B ID - 14277 IS - 1 JF - PRX Life SN - 2835-8279 TI - Interplay between mechanochemical patterning and glassy dynamics in cellular monolayers VL - 1 ER - TY - JOUR AB - The execution of cognitive functions requires coordinated circuit activity across different brain areas that involves the associated firing of neuronal assemblies. Here, we tested the circuit mechanism behind assembly interactions between the hippocampus and the medial prefrontal cortex (mPFC) of adult rats by recording neuronal populations during a rule-switching task. We identified functionally coupled CA1-mPFC cells that synchronized their activity beyond that expected from common spatial coding or oscillatory firing. When such cell pairs fired together, the mPFC cell strongly phase locked to CA1 theta oscillations and maintained consistent theta firing phases, independent of the theta timing of their CA1 counterpart. These functionally connected CA1-mPFC cells formed interconnected assemblies. While firing together with their CA1 assembly partners, mPFC cells fired along specific theta sequences. Our results suggest that upregulated theta oscillatory firing of mPFC cells can signal transient interactions with specific CA1 assemblies, thus enabling distributed computations. AU - Nardin, Michele AU - Käfer, Karola AU - Stella, Federico AU - Csicsvari, Jozsef L ID - 14314 IS - 9 JF - Cell Reports TI - Theta oscillations as a substrate for medial prefrontal-hippocampal assembly interactions VL - 42 ER - TY - JOUR AB - During apoptosis, caspases degrade 8 out of ~30 nucleoporins to irreversibly demolish the nuclear pore complex. However, for poorly understood reasons, caspases are also activated during cell differentiation. Here, we show that sublethal activation of caspases during myogenesis results in the transient proteolysis of four peripheral Nups and one transmembrane Nup. 
‘Trimmed’ NPCs become nuclear export-defective, and we identified in an unbiased manner several classes of cytoplasmic, plasma membrane, and mitochondrial proteins that rapidly accumulate in the nucleus. NPC trimming by non-apoptotic caspases was also observed in neurogenesis and endoplasmic reticulum stress. Our results suggest that caspases can reversibly modulate nuclear transport activity, which allows them to function as agents of cell differentiation and adaptation at sublethal levels. AU - Cho, Ukrae H. AU - Hetzer, Martin W ID - 14315 JF - eLife TI - Caspase-mediated nuclear pore complex trimming in cell differentiation and endoplasmic reticulum stress VL - 12 ER - TY - JOUR AB - We study multigraphs whose edge-sets are the union of three perfect matchings, $M_1$, $M_2$, and $M_3$. Given such a graph $G$ and any $a_1, a_2, a_3 \in \mathbb{N}$ with $a_1 + a_2 + a_3 \leq n - 2$, we show there exists a matching $M$ of $G$ with $|M \cap M_i| = a_i$ for each $i \in \{1, 2, 3\}$. The bound $n - 2$ in the theorem is best possible in general. We conjecture however that if $G$ is bipartite, the same result holds with $n - 2$ replaced by $n - 1$. We give a construction that shows such a result would be tight. We also make a conjecture generalising the Ryser-Brualdi-Stein conjecture with colour multiplicities. AU - Anastos, Michael AU - Fabian, David AU - Müyesser, Alp AU - Szabó, Tibor ID - 14319 IS - 3 JF - Electronic Journal of Combinatorics TI - Splitting matchings and the Ryser-Brualdi-Stein conjecture for multisets VL - 30 ER - TY - CONF AB - Probabilistic recurrence relations (PRRs) are a standard formalism for describing the runtime of a randomized algorithm. Given a PRR and a time limit κ, we consider the tail probability Pr[T≥κ], i.e., the probability that the randomized runtime T of the PRR exceeds κ. Our focus is the formal analysis of tail bounds that aims at finding a tight asymptotic upper bound u≥Pr[T≥κ]. 
To address this problem, the classical and most well-known approach is the cookbook method by Karp (JACM 1994), while other approaches are mostly limited to deriving tail bounds of specific PRRs via involved custom analysis. In this work, we propose a novel approach for deriving the common exponentially-decreasing tail bounds for PRRs whose preprocessing time and random passed sizes follow discrete or (piecewise) uniform distributions and whose recursive call is either a single procedure call or a divide-and-conquer. We first establish a theoretical approach via Markov’s inequality, and then instantiate the theoretical approach with a template-based algorithmic approach via a refined treatment of exponentiation. Experimental evaluation shows that our algorithmic approach is capable of deriving tail bounds that (i) are asymptotically tighter than those of Karp’s method, (ii) match the best-known manually-derived asymptotic tail bound for QuickSelect, and (iii) are only slightly worse (by a log log n factor) than the manually-proven optimal asymptotic tail bound for QuickSort. Moreover, our algorithmic approach handles all examples (including realistic PRRs such as QuickSort, QuickSelect, DiameterComputation, etc.) in less than 0.1 s, showing that our approach is efficient in practice. AU - Sun, Yican AU - Fu, Hongfei AU - Chatterjee, Krishnendu AU - Goharshady, Amir Kafshdar ID - 14318 SN - 0302-9743 T2 - Computer Aided Verification TI - Automated tail bound analysis for probabilistic recurrence relations VL - 13966 ER - TY - CONF AB - Markov decision processes can be viewed as transformers of probability distributions. While this view is useful from a practical standpoint to reason about trajectories of distributions, basic reachability and safety problems are known to be computationally intractable (i.e., Skolem-hard) to solve in such models. 
Further, we show that even for simple examples of MDPs, strategies for safety objectives over distributions can require infinite memory and randomization. In light of this, we present a novel overapproximation approach to synthesize strategies in an MDP, such that a safety objective over the distributions is met. More precisely, we develop a new framework for template-based synthesis of certificates as affine distributional and inductive invariants for safety objectives in MDPs. We provide two algorithms within this framework. One can only synthesize memoryless strategies, but has relative completeness guarantees, while the other can synthesize general strategies. The runtime complexity of both algorithms is in PSPACE. We implement these algorithms and show that they can solve several non-trivial examples. AU - Akshay, S. AU - Chatterjee, Krishnendu AU - Meggendorfer, Tobias AU - Zikelic, Dorde ID - 14317 SN - 0302-9743 T2 - International Conference on Computer Aided Verification TI - MDPs as distribution transformers: Affine invariant synthesis for safety objectives VL - 13966 ER - TY - JOUR AB - Clathrin-mediated vesicle trafficking plays central roles in post-Golgi transport. In yeast (Saccharomyces cerevisiae), the AP-1 complex and GGA adaptors are predicted to generate distinct transport vesicles at the trans-Golgi network (TGN), and the epsin-related proteins Ent3p and Ent5p (collectively Ent3p/5p) act as accessories for these adaptors. Recently, we showed that vesicle transport from the TGN is crucial for yeast Rab5 (Vps21p)-mediated endosome formation, and that Ent3p/5p are crucial for this process, whereas AP-1 and GGA adaptors are dispensable. However, these observations were incompatible with previous studies showing that these adaptors are required for Ent3p/5p recruitment to the TGN, and thus the overall mechanism responsible for regulation of Vps21p activity remains ambiguous. 
Here, we investigated the functional relationships between clathrin adaptors in post-Golgi-mediated Vps21p activation. We show that AP-1 disruption in the ent3Δ5Δ mutant impaired transport of the Vps21p guanine nucleotide exchange factor Vps9p to the Vps21p compartment and severely reduced Vps21p activity. Additionally, GGA adaptors, the phosphatidylinositol-4-kinase Pik1p and Rab11 GTPases Ypt31p and Ypt32p were found to have partially overlapping functions for recruitment of AP-1 and Ent3p/5p to the TGN. These findings suggest a distinct role of clathrin adaptors for Vps21p activation in the TGN–endosome trafficking pathway. AU - Nagano, Makoto AU - Aoshima, Kaito AU - Shimamura, Hiroki AU - Siekhaus, Daria E AU - Toshima, Junko Y. AU - Toshima, Jiro ID - 14316 IS - 17 JF - Journal of Cell Science SN - 0021-9533 TI - Distinct role of TGN-resident clathrin adaptors for Vps21p activation in the TGN-endosome trafficking pathway VL - 136 ER -