TY - CONF
AB - We present LS-CRF, a new method for training cyclic Conditional Random Fields (CRFs) from large datasets that is inspired by classical closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology. Training a CRF with LS-CRF requires only solving a set of independent regression problems, each of which can be solved efficiently in closed form or by an iterative solver. This makes LS-CRF orders of magnitude faster than classical CRF training based on probabilistic inference, and at the same time more flexible and easier to implement than other approximate techniques, such as pseudolikelihood or piecewise training. We apply LS-CRF to the task of semantic image segmentation, showing that it achieves accuracy on par with other training techniques at higher speed, thereby allowing efficient CRF training from very large training sets. For example, training a linearly parameterized pairwise CRF on 150,000 images requires less than one hour on a modern workstation.
AU - Kolesnikov, Alexander
AU - Guillaumin, Matthieu
AU - Ferrari, Vittorio
AU - Lampert, Christoph
ED - Fleet, David
ED - Pajdla, Tomas
ED - Schiele, Bernt
ED - Tuytelaars, Tinne
ID - 2171
IS - PART 3
T2 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
TI - Closed-form approximate CRF training for scalable image segmentation
VL - 8691
ER -

TY - CONF
AB - In this work we introduce a new approach to co-classification, i.e. the task of jointly classifying multiple, otherwise independent, data samples. The method we present, named CoConut, is based on the idea of adding a regularizer in the label space to encode certain priors on the resulting labelings. A regularizer that encourages labelings that are smooth across the test set, for instance, can be seen as a test-time variant of the cluster assumption, which has proven useful at training time in semi-supervised learning. A regularizer that introduces a preference for certain class proportions can be regarded as a prior distribution on the class labels. CoConut can build on existing classifiers without making any assumptions about how they were obtained and without the need to re-train them. The use of a regularizer adds a new level of flexibility: it allows the integration of potentially new information at test time, even in modalities other than those the classifiers were trained on. We evaluate our framework on six datasets, reporting a clear gain in classification accuracy compared to the standard classification setup that predicts labels for each test sample separately.
AU - Khamis, Sameh
AU - Lampert, Christoph
ID - 2173
T2 - Proceedings of the British Machine Vision Conference 2014
TI - CoConut: Co-classification with output space regularization
ER -

TY - CONF
AB - Fisher Kernels and Deep Learning were two developments with significant impact on large-scale object categorization in recent years. Both approaches were shown to achieve state-of-the-art results on large-scale object categorization datasets, such as ImageNet. Conceptually, however, they are perceived as very different and it is not uncommon for heated debates to spring up when advocates of both paradigms meet at conferences or workshops. In this work, we emphasize the similarities between both architectures rather than their differences and we argue that such a unified view allows us to transfer ideas from one domain to the other.
As a concrete example, we introduce a method for learning a support vector machine classifier with a Fisher kernel at the same time as a task-specific data representation. We reinterpret the setting as a multi-layer feed-forward network. Its final layer is the classifier, parameterized by a weight vector, and the two previous layers compute Fisher vectors, parameterized by the coefficients of a Gaussian mixture model. We introduce a gradient-descent-based learning algorithm that, in contrast to other feature learning techniques, is not just derived from intuition or biological analogy, but has a theoretical justification in the framework of statistical learning theory. Our experiments show that the new training procedure leads to significant improvements in classification accuracy while preserving the modularity and geometric interpretability of a support vector machine setup.
AU - Sydorov, Vladyslav
AU - Sakurada, Mayu
AU - Lampert, Christoph
ID - 2172
T2 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
TI - Deep Fisher Kernels – End to end learning of the Fisher Kernel GMM parameters
ER -

TY - JOUR
AB - When polygenic traits are under stabilizing selection, many different combinations of alleles allow close adaptation to the optimum. If alleles have equal effects, all combinations that result in the same deviation from the optimum are equivalent. Furthermore, the genetic variance that is maintained by mutation-selection balance is 2μ/S per locus, where μ is the mutation rate and S the strength of stabilizing selection. In reality, alleles vary in their effects, making the fitness landscape asymmetric and complicating analysis of the equilibria. We show that the resulting genetic variance depends on the fraction of alleles near fixation, which contribute 2μ/S each, and on the total mutational effects of the alleles that are at intermediate frequency. The interplay between stabilizing selection and mutation leads to a sharp transition: alleles with effects smaller than a threshold value of 2√(μ/S) remain polymorphic, whereas those with larger effects are fixed. The genetic load in equilibrium is less than for traits of equal effects, and the fitness equilibria are more similar. We find that when the optimum is displaced, alleles with effects close to the threshold value sweep first, and their rate of increase is bounded. Long-term response leads in general to well-adapted traits, unlike the case of equal effects, which often ends up at a suboptimal fitness peak. However, the particular peaks to which the populations converge are extremely sensitive to the initial states and to the speed of the shift of the optimum trait value.
AU - De Vladar, Harold
AU - Barton, Nicholas H
ID - 2174
IS - 2
JF - Genetics
TI - Stability and response of polygenic traits to stabilizing selection and mutation
VL - 197
ER -

TY - JOUR
AB - We extend the proof of the local semicircle law for generalized Wigner matrices given in MR3068390 to the case when the matrix of variances has an eigenvalue -1. In particular, this result provides a short proof of the optimal local Marchenko-Pastur law at the hard edge (i.e. around zero) for sample covariance matrices X*X, where the variances of the entries of X may vary.
AU - Ajanki, Oskari H
AU - Erdös, László
AU - Krüger, Torben H
ID - 2179
JF - Electronic Communications in Probability
TI - Local semicircle law with imprimitive variance matrix
VL - 19
ER -

TY - JOUR
AB - Electron microscopy (EM) allows for the simultaneous visualization of all tissue components at high resolution. However, the extent to which conventional aldehyde fixation and ethanol dehydration of the tissue alter the fine structure of cells and organelles, thereby preventing detection of subtle structural changes induced by an experiment, has remained an issue. Attempts have been made to rapidly freeze tissue to preserve native ultrastructure. Shock-freezing of living tissue under high pressure (high-pressure freezing, HPF) followed by cryosubstitution of the tissue water avoids aldehyde fixation and dehydration in ethanol; the tissue water is immobilized in ~50 ms, and a close-to-native fine structure of cells, organelles and molecules is preserved. Here we describe a protocol for HPF that is useful to monitor ultrastructural changes associated with functional changes at synapses in the brain but can be applied to many other tissues as well. The procedure requires a high-pressure freezer and takes a minimum of 7 d but can be paused at several points.
AU - Studer, Daniel
AU - Zhao, Shanting
AU - Chai, Xuejun
AU - Jonas, Peter M
AU - Graber, Werner
AU - Nestel, Sigrun
AU - Frotscher, Michael
ID - 2176
IS - 6
JF - Nature Protocols
TI - Capture of activity-induced ultrastructural changes at synapses by high-pressure freezing of brain tissue
VL - 9
ER -

TY - JOUR
AB - We consider the three-state toric homogeneous Markov chain model (THMC) without loops and initial parameters. At time T, the size of the design matrix is 6 × 3 · 2^(T-1) and the convex hull of its columns is the model polytope. We study the behavior of this polytope for T ≥ 3 and we show that it is defined by 24 facets for all T ≥ 5. Moreover, we give a complete description of these facets. From this, we deduce that the toric ideal associated with the design matrix is generated by binomials of degree at most 6. Our proof is based on a result due to Sturmfels, who gave a bound on the degree of the generators of a toric ideal, provided the normality of the corresponding toric variety. In our setting, we establish the normality of the toric variety associated to the THMC model by studying the geometric properties of the model polytope.
AU - Haws, David
AU - Martin Del Campo Sanchez, Abraham
AU - Takemura, Akimichi
AU - Yoshida, Ruriko
ID - 2178
IS - 1
JF - Beiträge zur Algebra und Geometrie
TI - Markov degree of the three-state toric homogeneous Markov chain model
VL - 55
ER -

TY - CONF
AB - We give evidence for the difficulty of computing Betti numbers of simplicial complexes over a finite field. We do this by reducing the rank computation for sparse matrices with t0 non-zero entries to computing Betti numbers of simplicial complexes consisting of at most a constant times t0 simplices. Together with the known reduction in the other direction, this implies that the two problems have the same computational complexity.
AU - Edelsbrunner, Herbert
AU - Parsa, Salman
ID - 2177
T2 - Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms
TI - On the computational complexity of Betti numbers: reductions from matrix rank
ER -

TY - CONF
AB - We revisit the classical problem of converting an imperfect source of randomness into a usable cryptographic key.
Assume that we have some cryptographic application P that expects a uniformly random m-bit key R and ensures that the best attack (in some complexity class) against P(R) has success probability at most δ. Our goal is to design a key-derivation function (KDF) h that converts any random source X of min-entropy k into a sufficiently "good" key h(X), guaranteeing that P(h(X)) has comparable security δ′ which is 'close' to δ. Seeded randomness extractors provide a generic way to solve this problem for all applications P, with resulting security δ′ = O(δ), provided that we start with entropy k ≥ m + 2 log(1/δ) - O(1). By a result of Radhakrishnan and Ta-Shma, this bound on k (called the "RT-bound") is also known to be tight in general. Unfortunately, in many situations the loss of 2 log(1/δ) bits of entropy is unacceptable. This motivates the study of KDFs with less entropy waste by placing some restrictions on the source X or the application P. In this work we obtain the following new positive and negative results in this regard:
- Efficient samplability of the source X does not help beat the RT-bound for general applications. This resolves the SRT (samplable RT) conjecture of Dachman-Soled et al. [DGKM12] in the affirmative, and also shows that the existence of computationally-secure extractors beating the RT-bound implies the existence of one-way functions.
- We continue in the line of work initiated by Barak et al. [BDK+11] and construct new information-theoretic KDFs which beat the RT-bound for large but restricted classes of applications. Specifically, we design efficient KDFs that work for all unpredictability applications P (e.g., signatures, MACs, one-way functions, etc.) and can either: (1) extract all of the entropy k = m with a very modest security loss δ′ = O(δ·log(1/δ)), or alternatively, (2) achieve essentially optimal security δ′ = O(δ) with a very modest entropy loss k ≥ m + log log(1/δ). In comparison, the best prior results from [BDK+11] for this class of applications would only guarantee δ′ = O(√δ) when k = m, and would need k ≥ m + log(1/δ) to get δ′ = O(δ).
- The weaker bounds of [BDK+11] hold for a larger class of so-called "square-friendly" applications (which includes all unpredictability, but also some important indistinguishability, applications). Unfortunately, we show that these weaker bounds are tight for the larger class of applications.
- We abstract out a clean, information-theoretic notion of (k, δ, δ′)-unpredictability extractors, which guarantee "induced" security δ′ for any δ-secure unpredictability application P, and characterize the parameters achievable for such unpredictability extractors. Of independent interest, we also relate this notion to the previously-known notion of (min-entropy) condensers, and improve the state-of-the-art parameters for such condensers.
AU - Dodis, Yevgeniy
AU - Pietrzak, Krzysztof Z
AU - Wichs, Daniel
ED - Nguyen, Phong
ED - Oswald, Elisabeth
ID - 2185
TI - Key derivation without entropy waste
VL - 8441
ER -

TY - JOUR
AB - Weighted majority votes allow one to combine the output of several classifiers or voters. MinCq is a recent algorithm for optimizing the weight of each voter based on the minimization of a theoretical bound over the risk of the vote with elegant PAC-Bayesian generalization guarantees. However, while it has demonstrated good performance when combining weak classifiers, MinCq cannot make use of the useful a priori knowledge that one may have when using a mixture of weak and strong voters.
In this paper, we propose P-MinCq, an extension of MinCq that can incorporate such knowledge in the form of a constraint over the distribution of the weights, along with general proofs of convergence that stand in the sample compression setting for data-dependent voters. The approach is applied to a vote of k-NN classifiers with a specific modeling of the voters' performance. P-MinCq significantly outperforms the classic k-NN classifier, a symmetric NN and MinCq using the same voters. We show that it is also competitive with LMNN, a popular metric learning algorithm, and that combining both approaches further reduces the error.
AU - Bellet, Aurélien
AU - Habrard, Amaury
AU - Morvant, Emilie
AU - Sebban, Marc
ID - 2180
IS - 1-2
JF - Machine Learning
TI - Learning a priori constrained weighted majority votes
VL - 97
ER -