TY - CONF
AB - We present LS-CRF, a new method for training cyclic Conditional Random Fields (CRFs) from large datasets that is inspired by classical closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology. Training a CRF with LS-CRF requires only solving a set of independent regression problems, each of which can be solved efficiently in closed form or by an iterative solver. This makes LS-CRF orders of magnitude faster than classical CRF training based on probabilistic inference, and at the same time more flexible and easier to implement than other approximate techniques, such as pseudolikelihood or piecewise training. We apply LS-CRF to the task of semantic image segmentation, showing that it achieves accuracy on par with other training techniques at higher speed, thereby allowing efficient CRF training from very large training sets. For example, training a linearly parameterized pairwise CRF on 150,000 images requires less than one hour on a modern workstation.
AU - Kolesnikov, Alexander
AU - Guillaumin, Matthieu
AU - Ferrari, Vittorio
AU - Lampert, Christoph
ED - Fleet, David
ED - Pajdla, Tomas
ED - Schiele, Bernt
ED - Tuytelaars, Tinne
ID - 2171
IS - PART 3
T2 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
TI - Closed-form approximate CRF training for scalable image segmentation
VL - 8691
ER -

TY - CONF
AB - In this work we introduce a new approach to co-classification, i.e. the task of jointly classifying multiple, otherwise independent, data samples. The method we present, named CoConut, is based on the idea of adding a regularizer in the label space to encode certain priors on the resulting labelings. A regularizer that encourages labelings that are smooth across the test set, for instance, can be seen as a test-time variant of the cluster assumption, which has proven useful at training time in semi-supervised learning. A regularizer that introduces a preference for certain class proportions can be regarded as a prior distribution on the class labels. CoConut can build on existing classifiers without making any assumptions on how they were obtained and without the need to re-train them. The use of a regularizer adds a new level of flexibility. It allows the integration of potentially new information at test time, even in other modalities than what the classifiers were trained on. We evaluate our framework on six datasets, reporting a clear performance gain in classification accuracy compared to the standard classification setup that predicts labels for each test sample separately.
AU - Khamis, Sameh
AU - Lampert, Christoph
ID - 2173
T2 - Proceedings of the British Machine Vision Conference 2014
TI - CoConut: Co-classification with output space regularization
ER -

TY - CONF
AB - Fisher Kernels and Deep Learning were two developments with significant impact on large-scale object categorization in recent years. Both approaches were shown to achieve state-of-the-art results on large-scale object categorization datasets, such as ImageNet. Conceptually, however, they are perceived as very different and it is not uncommon for heated debates to spring up when advocates of both paradigms meet at conferences or workshops. In this work, we emphasize the similarities between both architectures rather than their differences and we argue that such a unified view allows us to transfer ideas from one domain to the other.
As a concrete example, we introduce a method for learning a support vector machine classifier with a Fisher kernel at the same time as a task-specific data representation. We reinterpret the setting as a multi-layer feed-forward network. Its final layer is the classifier, parameterized by a weight vector, and the two previous layers compute Fisher vectors, parameterized by the coefficients of a Gaussian mixture model. We introduce a gradient-descent-based learning algorithm that, in contrast to other feature learning techniques, is not just derived from intuition or biological analogy, but has a theoretical justification in the framework of statistical learning theory. Our experiments show that the new training procedure leads to significant improvements in classification accuracy while preserving the modularity and geometric interpretability of a support vector machine setup.
AU - Sydorov, Vladyslav
AU - Sakurada, Mayu
AU - Lampert, Christoph
ID - 2172
T2 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
TI - Deep Fisher Kernels – End to end learning of the Fisher Kernel GMM parameters
ER -

TY - JOUR
AB - When polygenic traits are under stabilizing selection, many different combinations of alleles allow close adaptation to the optimum. If alleles have equal effects, all combinations that result in the same deviation from the optimum are equivalent. Furthermore, the genetic variance that is maintained by mutation-selection balance is 2μ/S per locus, where μ is the mutation rate and S the strength of stabilizing selection. In reality, alleles vary in their effects, making the fitness landscape asymmetric and complicating analysis of the equilibria. We show that the resulting genetic variance depends on the fraction of alleles near fixation, which contribute by 2μ/S, and on the total mutational effects of alleles that are at intermediate frequency. The interplay between stabilizing selection and mutation leads to a sharp transition: alleles with effects smaller than a threshold value of 2√(μ/S) remain polymorphic, whereas those with larger effects are fixed. The genetic load in equilibrium is less than for traits of equal effects, and the fitness equilibria are more similar. We find that if the optimum is displaced, alleles with effects close to the threshold value sweep first, and their rate of increase is bounded. Long-term response leads, in general, to well-adapted traits, unlike the case of equal effects, which often ends up at a suboptimal fitness peak. However, the particular peaks to which the populations converge are extremely sensitive to the initial states and to the speed of the shift of the optimum trait value.
AU - De Vladar, Harold
AU - Barton, Nicholas H
ID - 2174
IS - 2
JF - Genetics
TI - Stability and response of polygenic traits to stabilizing selection and mutation
VL - 197
ER -

TY - JOUR
AB - We extend the proof of the local semicircle law for generalized Wigner matrices given in MR3068390 to the case when the matrix of variances has an eigenvalue -1. In particular, this result provides a short proof of the optimal local Marchenko-Pastur law at the hard edge (i.e. around zero) for sample covariance matrices X*X, where the variances of the entries of X may vary.
AU - Ajanki, Oskari H
AU - Erdös, László
AU - Krüger, Torben H
ID - 2179
JF - Electronic Communications in Probability
TI - Local semicircle law with imprimitive variance matrix
VL - 19
ER -