TY - CONF
AB - Deep neural networks (DNNs) have become increasingly important due to their excellent empirical performance on a wide range of problems. However, regularization is generally achieved by indirect means, largely due to the complex set of functions defined by a network and the difficulty in measuring function complexity. There exists no method in the literature for additive regularization based on a norm of the function, as is classically considered in statistical learning theory. In this work, we study the tractability of function norms for deep neural networks with ReLU activations. We provide, to the best of our knowledge, the first proof in the literature of the NP-hardness of computing function norms of DNNs with 3 or more layers. We also highlight a fundamental difference between shallow and deep networks. In light of these results, we propose a new regularization strategy based on approximate function norms, and demonstrate its effectiveness on a segmentation task with a DNN.
AU - Rannen-Triki, Amal
AU - Berman, Maxim
AU - Kolmogorov, Vladimir
AU - Blaschko, Matthew B.
ID - 7639
SN - 9781728150239
T2 - Proceedings of the 2019 International Conference on Computer Vision Workshop
TI - Function norms for neural networks
ER -

TY - JOUR
AB - We consider the NP-hard problem of MAP-inference for undirected discrete graphical models. We propose a polynomial-time and practically efficient algorithm for finding a part of its optimal solution. Specifically, our algorithm marks some labels of the considered graphical model either as (i) optimal, meaning that they belong to all optimal solutions of the inference problem, or (ii) non-optimal, meaning that they provably do not belong to any optimal solution. With access to an exact solver for a linear programming relaxation of the MAP-inference problem, our algorithm marks the maximal possible (in a specified sense) number of labels. We also present a version of the algorithm that has access only to a suboptimal dual solver and can still ensure the (non-)optimality of the marked labels, although the overall number of marked labels may decrease. We propose an efficient implementation, which runs in time comparable to a single run of a suboptimal dual solver. Our method scales well and shows state-of-the-art results on computational benchmarks from machine learning and computer vision.
AU - Shekhovtsov, Alexander
AU - Swoboda, Paul
AU - Savchynskyy, Bogdan
ID - 703
IS - 7
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 01628828
TI - Maximum persistency via iterative relaxed inference with graphical models
VL - 40
ER -

TY - CHAP
AB - We prove that every congruence distributive variety has directed Jónsson terms, and every congruence modular variety has directed Gumm terms. The directed terms we construct witness every case of absorption witnessed by the original Jónsson or Gumm terms. This result is equivalent to a pair of claims about absorption for admissible preorders in congruence distributive and congruence modular varieties, respectively. For finite algebras, these absorption theorems have already seen significant applications, but until now, it was not clear whether the theorems hold for general algebras as well. Our method also yields a novel proof of a result by P. Lipparini about the existence of a chain of terms (which we call Pixley terms) in varieties that are at the same time congruence distributive and k-permutable for some k.
AU - Kazda, Alexandr
AU - Kozik, Marcin
AU - McKenzie, Ralph
AU - Moore, Matthew
ED - Czelakowski, J
ID - 10864
SN - 2211-2758
T2 - Don Pigozzi on Abstract Algebraic Logic, Universal Algebra, and Computer Science
TI - Absorption and directed Jónsson terms
VL - 16
ER -

TY - CONF
AB - The accuracy of information retrieval systems is often measured using complex loss functions such as the average precision (AP) or the normalized discounted cumulative gain (NDCG). Given a set of positive and negative samples, the parameters of a retrieval system can be estimated by minimizing these loss functions. However, the non-differentiability and non-decomposability of these loss functions do not allow for simple gradient-based optimization algorithms. This issue is generally circumvented by either optimizing a structured hinge-loss upper bound to the loss function or using asymptotic methods like the direct-loss minimization framework. Yet, the high computational complexity of loss-augmented inference, which is necessary for both frameworks, prohibits their use on large training data sets. To alleviate this deficiency, we present a novel quicksort-flavored algorithm for a large class of non-decomposable loss functions. We provide a complete characterization of the loss functions that are amenable to our algorithm, and show that it includes both AP- and NDCG-based loss functions. Furthermore, we prove that no comparison-based algorithm can asymptotically improve upon the computational complexity of our approach. We demonstrate the effectiveness of our approach in the context of optimizing the structured hinge-loss upper bound of the AP and NDCG losses for learning models for a variety of vision tasks. We show that our approach provides significantly better results than simpler decomposable loss functions, while requiring comparable training time.
AU - Mohapatra, Pritish
AU - Rolinek, Michal
AU - Jawahar, C V
AU - Kolmogorov, Vladimir
AU - Kumar, M Pawan
ID - 273
SN - 9781538664209
T2 - 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
TI - Efficient optimization for rank-based loss functions
ER -

TY - CONF
AB - We show attacks on five data-independent memory-hard functions (iMHF) that were submitted to the password hashing competition (PHC). Informally, an MHF is a function which cannot be evaluated on dedicated hardware, like ASICs, at significantly lower hardware and/or energy cost than evaluating a single instance on a standard single-core architecture. Data-independent means the memory access pattern of the function is independent of the input; this makes iMHFs harder to construct than data-dependent ones, but the latter can be attacked by various side-channel attacks. Following [Alwen-Blocki'16], we capture the evaluation of an iMHF as a directed acyclic graph (DAG). The cumulative parallel pebbling complexity of this DAG is a measure for the hardware cost of evaluating the iMHF on an ASIC. Ideally, one would like the complexity of a DAG underlying an iMHF to be as close to quadratic in the number of nodes of the graph as possible. Instead, we show that (the DAGs underlying) the following iMHFs are far from this bound: Rig.v2, TwoCats, and Gambit each have an exponent of at most 1.75. Moreover, we show that the complexities of the iMHF modes of the PHC finalists Pomelo and Lyra2 have exponents of at most 1.83 and 1.67, respectively. To show this we investigate a combinatorial property of each underlying DAG (called its depth-robustness).
By establishing upper bounds on this property we are then able to apply the general technique of [Alwen-Blocki'16] for analyzing the hardware costs of an iMHF.
AU - Alwen, Joel F
AU - Gazi, Peter
AU - Kamath Hosdurg, Chethan
AU - Klein, Karen
AU - Osang, Georg F
AU - Pietrzak, Krzysztof Z
AU - Reyzin, Leonid
AU - Rolinek, Michal
AU - Rybar, Michal
ID - 193
T2 - Proceedings of the 2018 on Asia Conference on Computer and Communications Security
TI - On the memory hardness of data independent password hashing functions
ER -

TY - JOUR
AB - We consider the recent formulation of the algorithmic Lovász Local Lemma [N. Harvey and J. Vondrák, in Proceedings of FOCS, 2015, pp. 1327–1345; D. Achlioptas and F. Iliopoulos, in Proceedings of SODA, 2016, pp. 2024–2038; D. Achlioptas, F. Iliopoulos, and V. Kolmogorov, A Local Lemma for Focused Stochastic Algorithms, arXiv preprint, 2018] for finding objects that avoid "bad features," or "flaws." It extends the Moser–Tardos resampling algorithm [R. A. Moser and G. Tardos, J. ACM, 57 (2010), 11] to more general discrete spaces. At each step the method picks a flaw present in the current state and goes to a new state according to some prespecified probability distribution (which depends on the current state and the selected flaw). However, the recent formulation is less flexible than the Moser–Tardos method since it requires a specific flaw selection rule, whereas the algorithm of Moser and Tardos allows an arbitrary rule (and thus can potentially be implemented more efficiently). We formulate a new "commutativity" condition and prove that it is sufficient for an arbitrary rule to work. It also enables an efficient parallelization under an additional assumption. We then show that existing resampling oracles for perfect matchings and permutations do satisfy this condition.
AU - Kolmogorov, Vladimir
ID - 5975
IS - 6
JF - SIAM Journal on Computing
SN - 0097-5397
TI - Commutativity in the algorithmic Lovász local lemma
VL - 47
ER -

TY - CONF
AB - We consider the MAP-inference problem for graphical models, which is a valued constraint satisfaction problem defined on real numbers with a natural summation operation. We propose a family of relaxations (different from the famous Sherali-Adams hierarchy), which naturally define lower bounds for its optimum. This family always contains a tight relaxation, and we give an algorithm able to find it and, therefore, solve the initial non-relaxed NP-hard problem. The relaxations we consider decompose the original problem into two non-overlapping parts: an easy LP-tight part and a difficult one. For the latter part a combinatorial solver must be used. As we show in our experiments, in a number of applications the second, difficult part constitutes only a small fraction of the whole problem. This property allows us to significantly reduce the computational time of the combinatorial solver and therefore solve problems which were out of reach before.
AU - Haller, Stefan
AU - Swoboda, Paul
AU - Savchynskyy, Bogdan
ID - 5978
T2 - Proceedings of the 32nd AAAI Conference on Artificial Intelligence
TI - Exact MAP-inference by confining combinatorial search with LP relaxation
ER -

TY - JOUR
AB - An N-superconcentrator is a directed acyclic graph with N input nodes and N output nodes such that every subset of the inputs and every subset of the outputs of the same cardinality can be connected by node-disjoint paths. It is known that linear-size and bounded-degree superconcentrators exist.
We prove the existence of such superconcentrators with asymptotic density 25.3 (where the density is the number of edges divided by N). The best previously known densities were 28 [12] and 27.4136 [17].
AU - Kolmogorov, Vladimir
AU - Rolinek, Michal
ID - 18
IS - 10
JF - Ars Combinatoria
SN - 0381-7032
TI - Superconcentrators of density 25.3
VL - 141
ER -

TY - JOUR
AB - The main result of this article is a generalization of the classical blossom algorithm for finding perfect matchings. Our algorithm can efficiently solve Boolean CSPs where each variable appears in exactly two constraints (we call this edge CSP) and all constraints are even Δ-matroid relations (represented by lists of tuples). As a consequence of this, we settle the complexity classification of planar Boolean CSPs started by Dvořák and Kupec. Using a reduction to even Δ-matroids, we then extend the tractability result to larger classes of Δ-matroids that we call efficiently coverable. It properly includes classes that were known to be tractable before, namely, co-independent, compact, local, linear, and binary, with the following caveat: we represent Δ-matroids by lists of tuples, while the last two use a representation by matrices. Since an n × n matrix can represent exponentially many tuples, our tractability result is not strictly stronger than the known algorithm for linear and binary Δ-matroids.
AU - Kazda, Alexandr
AU - Kolmogorov, Vladimir
AU - Rolinek, Michal
ID - 6032
IS - 2
JF - ACM Transactions on Algorithms
TI - Even delta-matroids and the complexity of planar boolean CSPs
VL - 15
ER -

TY - DATA
AB - Graph matching problems for large displacement optical flow of RGB-D images.
AU - Alhaija, Hassan
AU - Sellent, Anita
AU - Kondermann, Daniel
AU - Rother, Carsten
ID - 5573
KW - graph matching
KW - quadratic assignment problem
TI - Graph matching problems for GraphFlow – 6D Large Displacement Scene Flow
ER -