TY - JOUR
AB - The concepts of faithfulness and strong-faithfulness are important for statistical learning of graphical models. Graphs are not sufficient for describing the association structure of a discrete distribution. Hypergraphs representing hierarchical log-linear models are considered instead, and the concept of parametric (strong-) faithfulness with respect to a hypergraph is introduced. Strong-faithfulness ensures the existence of uniformly consistent parameter estimators and enables building uniformly consistent procedures for a hypergraph search. The strength of association in a discrete distribution can be quantified with various measures, leading to different concepts of strong-faithfulness. Lower and upper bounds for the proportions of distributions that do not satisfy strong-faithfulness are computed for different parameterizations and measures of association.
AU - Klimova, Anna
AU - Uhler, Caroline
AU - Rudas, Tamás
ID - 2014
IS - 7
JF - Computational Statistics & Data Analysis
TI - Faithfulness and learning hypergraphs from discrete distributions
VL - 87
ER -
TY - JOUR
AB - Let G be a graph on the vertex set V(G) = {x_1,…,x_n} with the edge set E(G), and let R = K[x_1,…,x_n] be the polynomial ring over a field K. Two monomial ideals are associated to G: the edge ideal I(G), generated by all monomials x_i x_j with {x_i,x_j} ∈ E(G), and the vertex cover ideal I_G, generated by the monomials ∏_{x_i∈C} x_i for all minimal vertex covers C of G. A minimal vertex cover of G is a subset C ⊂ V(G) such that each edge has at least one vertex in C and no proper subset of C has the same property. Indeed, the vertex cover ideal of G is the Alexander dual of the edge ideal of G. In this paper, for an unmixed bipartite graph G we consider the lattice of vertex covers L_G and we explicitly describe the minimal free resolution of the ideal associated to L_G, which is exactly the vertex cover ideal of G. Then we compute depth, projective dimension, regularity and extremal Betti numbers of R/I(G) in terms of the associated lattice.
AU - Mohammadi, Fatemeh
AU - Moradi, Somayeh
ID - 1547
IS - 3
JF - Bulletin of the Korean Mathematical Society
TI - Resolution of unmixed bipartite graphs
VL - 52
ER -
TY - JOUR
AB - We show that the Galois group of any Schubert problem involving lines in projective space contains the alternating group. This constitutes the largest family of enumerative problems whose Galois groups have been largely determined. Using a criterion of Vakil and a special position argument due to Schubert, our result follows from a particular inequality among Kostka numbers of two-rowed tableaux. In most cases, a combinatorial injection proves the inequality. For the remaining cases, we use the Weyl integral formulas to obtain an integral formula for these Kostka numbers. This rewrites the inequality as an integral, which we estimate to establish the inequality.
AU - Brooks, Christopher
AU - Martin Del Campo Sanchez, Abraham
AU - Sottile, Frank
ID - 1579
IS - 6
JF - Transactions of the American Mathematical Society
TI - Galois groups of Schubert problems of lines are at least alternating
VL - 367
ER -
TY - JOUR
AB - The topological Tverberg theorem has been generalized in several directions by setting extra restrictions on the Tverberg partitions. Restricted Tverberg partitions, defined by the idea that certain points cannot be in the same part, are encoded with graphs. When two points are adjacent in the graph, they are not in the same part. If the restrictions are too harsh, then the topological Tverberg theorem fails. The colored Tverberg theorem corresponds to graphs constructed as disjoint unions of small complete graphs. Hell studied the case of paths and cycles. In graph theory these partitions are usually viewed as graph colorings. As explored by Aharoni, Haxell, Meshulam and others, there are fundamental connections between several notions of graph colorings and topological combinatorics. For ordinary graph colorings it is enough to require that the number of colors q satisfy q>Δ, where Δ is the maximal degree of the graph. It was proven by the first author using equivariant topology that if q>Δ^2 then the topological Tverberg theorem still works. It is conjectured that q>KΔ is also enough for some constant K, and in this paper we prove a fixed-parameter version of that conjecture. The required topological connectivity results are proven with shellability, which also strengthens some previous partial results where the topological connectivity was proven with the nerve lemma.
AU - Engström, Alexander
AU - Norén, Patrik
ID - 1911
IS - 1
JF - Discrete & Computational Geometry
TI - Tverberg's Theorem and Graph Coloring
VL - 51
ER -
TY - GEN
AU - Klimova, Anna
AU - Rudas, Tamás
ID - 2007
TI - gIPFrm: Generalized iterative proportional fitting for relational models
ER -
TY - JOUR
AB - The protection of privacy of individual-level information in genome-wide association study (GWAS) databases has been a major concern of researchers following the publication of “an attack” on GWAS data by Homer et al. (2008). Traditional statistical methods for confidentiality and privacy protection of statistical databases do not scale well to deal with GWAS data, especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach that provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, Uhler et al. (2013) proposed new methods to release aggregate GWAS data without compromising an individual’s privacy. We extend the methods developed in Uhler et al. (2013) for releasing differentially-private χ²-statistics by allowing for an arbitrary number of cases and controls, and for releasing differentially-private allelic test statistics. We also provide a new interpretation by assuming the controls’ data are known, which is a realistic assumption because some GWAS use publicly available data as controls. We assess the performance of the proposed methods through a risk-utility analysis on a real data set consisting of DNA samples collected by the Wellcome Trust Case Control Consortium and compare the methods with the differentially-private release mechanism proposed by Johnson and Shmatikov (2013).
AU - Yu, Fei
AU - Fienberg, Stephen
AU - Slavković, Aleksandra
AU - Uhler, Caroline
ID - 2011
JF - Journal of Biomedical Informatics
TI - Scalable privacy-preserving data sharing methodology for genome-wide association studies
VL - 50
ER -
TY - CONF
AB - The classical sphere packing problem asks for the best (infinite) arrangement of non-overlapping unit balls which cover as much space as possible. We define a generalized version of the problem, where we allow each ball a limited amount of overlap with other balls. We study two natural choices of overlap measures and obtain the optimal lattice packings in a parameterized family of lattices which contains the FCC, BCC, and integer lattice.
AU - Iglesias Ham, Mabel
AU - Kerber, Michael
AU - Uhler, Caroline
ID - 2012
TI - Sphere packing with limited overlap
ER -
TY - JOUR
AB - An asymptotic theory is developed for computing volumes of regions in the parameter space of a directed Gaussian graphical model that are obtained by bounding partial correlations. We study these volumes using the method of real log canonical thresholds from algebraic geometry. Our analysis involves the computation of the singular loci of correlation hypersurfaces. Statistical applications include the strong-faithfulness assumption for the PC algorithm and the quantification of confounder bias in causal inference. A detailed analysis is presented for trees, bow ties, tripartite graphs, and complete graphs.
AU - Lin, Shaowei
AU - Uhler, Caroline
AU - Sturmfels, Bernd
AU - Bühlmann, Peter
ID - 2013
IS - 5
JF - Foundations of Computational Mathematics
TI - Hypersurfaces and their singularities in partial correlation testing
VL - 14
ER -
TY - CONF
AB - Following the publication of an attack on genome-wide association studies (GWAS) data proposed by Homer et al., considerable attention has been given to developing methods for releasing GWAS data in a privacy-preserving way. Here, we develop an end-to-end differentially private method for solving regression problems with convex penalty functions and selecting the penalty parameters by cross-validation. In particular, we focus on penalized logistic regression with elastic-net regularization, a method widely used in GWAS analyses to identify disease-causing genes. We show how a differentially private procedure for penalized logistic regression with elastic-net regularization can be applied to the analysis of GWAS data and evaluate our method’s performance.
AU - Yu, Fei
AU - Rybar, Michal
AU - Uhler, Caroline
AU - Fienberg, Stephen
ED - Domingo Ferrer, Josep
ID - 2047
T2 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
TI - Differentially-private logistic regression for detecting multiple-SNP association in GWAS databases
VL - 8744
ER -
TY - JOUR
AB - We consider the three-state toric homogeneous Markov chain model (THMC) without loops and initial parameters. At time T, the size of the design matrix is 6 × 3 · 2^(T-1) and the convex hull of its columns is the model polytope. We study the behavior of this polytope for T ≥ 3 and we show that it is defined by 24 facets for all T ≥ 5. Moreover, we give a complete description of these facets. From this, we deduce that the toric ideal associated with the design matrix is generated by binomials of degree at most 6. Our proof is based on a result due to Sturmfels, who gave a bound on the degree of the generators of a toric ideal, provided the normality of the corresponding toric variety. In our setting, we established the normality of the toric variety associated to the THMC model by studying the geometric properties of the model polytope.
AU - Haws, David
AU - Martin Del Campo Sanchez, Abraham
AU - Takemura, Akimichi
AU - Yoshida, Ruriko
ID - 2178
IS - 1
JF - Beiträge zur Algebra und Geometrie
TI - Markov degree of the three-state toric homogeneous Markov chain model
VL - 55
ER -
TY - JOUR
AB - The problem of packing ellipsoids of different sizes and shapes into an ellipsoidal container so as to minimize a measure of overlap between ellipsoids is considered. A bilevel optimization formulation is given, together with an algorithm for the general case and a simpler algorithm for the special case in which all ellipsoids are in fact spheres. Convergence results are proved and computational experience is described and illustrated. The motivating application, chromosome organization in the human cell nucleus, is discussed briefly, and some illustrative results are presented.
AU - Uhler, Caroline
AU - Wright, Stephen
ID - 2280
IS - 4
JF - SIAM Review
TI - Packing ellipsoids with overlap
VL - 55
ER -
TY - JOUR
AB - Traditional statistical methods for confidentiality protection of statistical databases do not scale well to deal with GWAS databases especially in terms of guarantees regarding protection from linkage to external information. The more recent concept of differential privacy, introduced by the cryptographic community, is an approach which provides a rigorous definition of privacy with meaningful privacy guarantees in the presence of arbitrary external information, although the guarantees may come at a serious price in terms of data utility. Building on such notions, we propose new methods to release aggregate GWAS data without compromising an individual’s privacy. We present methods for releasing differentially private minor allele frequencies, chi-square statistics and p-values. We compare these approaches on simulated data and on a GWAS study of canine hair length involving 685 dogs. We also propose a privacy-preserving method for finding genome-wide associations based on a differentially-private approach to penalized logistic regression.
AU - Uhler, Caroline
AU - Slavković, Aleksandra
AU - Fienberg, Stephen
ID - 2009
IS - 1
JF - Journal of Privacy and Confidentiality
TI - Privacy-preserving data sharing for genome-wide association studies
VL - 5
ER -
TY - JOUR
AB - Many algorithms for inferring causality rely heavily on the faithfulness assumption. The main justification for imposing this assumption is that the set of unfaithful distributions has Lebesgue measure zero, since it can be seen as a collection of hypersurfaces in a hypercube. However, due to sampling error the faithfulness condition alone is not sufficient for statistical estimation, and strong-faithfulness has been proposed and assumed to achieve uniform or high-dimensional consistency. In contrast to the plain faithfulness assumption, the set of distributions that is not strong-faithful has nonzero Lebesgue measure and in fact, can be surprisingly large as we show in this paper. We study the strong-faithfulness condition from a geometric and combinatorial point of view and give upper and lower bounds on the Lebesgue measure of strong-faithful distributions for various classes of directed acyclic graphs. Our results imply fundamental limitations for the PC-algorithm and potentially also for other algorithms based on partial correlation testing in the Gaussian case.
AU - Uhler, Caroline
AU - Raskutti, Garvesh
AU - Bühlmann, Peter
AU - Yu, Bin
ID - 2010
IS - 2
JF - The Annals of Statistics
TI - Geometry of the faithfulness assumption in causal inference
VL - 41
ER -
TY - JOUR
AB - We study maximum likelihood estimation in Gaussian graphical models from a geometric point of view. An algebraic elimination criterion allows us to find exact lower bounds on the number of observations needed to ensure that the maximum likelihood estimator (MLE) exists with probability one. This is applied to bipartite graphs, grids and colored graphs. We also study the ML degree, and we present the first instance of a graph for which the MLE exists with probability one, even when the number of observations equals the treewidth.
AU - Uhler, Caroline
ID - 2959
IS - 1
JF - The Annals of Statistics
TI - Geometry of maximum likelihood estimation in Gaussian graphical models
VL - 40
ER -