TY - JOUR AB - The huge antlers of the extinct Irish elk have invited evolutionary speculation since Darwin. In the 1970s, Stephen Jay Gould presented the first extensive data on antler size in the Irish elk and combined these with comparative data from other deer to test the hypothesis that the gigantic antlers were the outcome of a positive allometry that constrained large-bodied deer to have proportionally even larger antlers. He concluded that the Irish elk had antlers as predicted for its size and interpreted this within his emerging framework of developmental constraints as an explanatory factor in evolution. Here we reanalyze antler allometry based on new morphometric data for 57 taxa of the family Cervidae. We also present a new phylogeny for the Cervidae, which we use for comparative analyses. In contrast to Gould, we find that the antlers of Irish elk were larger than predicted from the allometry within the true deer, Cervini, as analyzed by Gould, but follow the allometry across Cervidae as a whole. After dissecting the discrepancy, we reject the allometric-constraint hypothesis because, contrary to Gould, we find no similarity between static and evolutionary allometries, and because we document extensive non-allometric evolution of antler size across the Cervidae. AU - Tsuboi, Masahito AU - Kopperud, Bjørn Tore AU - Matschiner, Michael AU - Grabowski, Mark AU - Syrowatka, Christine AU - Pélabon, Christophe AU - Hansen, Thomas F. ID - 14932 JF - Evolutionary Biology SN - 0071-3260 TI - Antler allometry, the Irish elk and Gould revisited ER - TY - JOUR AB - Background Epigenetic clocks can track both chronological age (cAge) and biological age (bAge). The latter is typically defined by physiological biomarkers and risk of adverse health outcomes, including all-cause mortality. As cohort sample sizes increase, estimates of cAge and bAge become more precise.
Here, we aim to develop accurate epigenetic predictors of cAge and bAge, whilst improving our understanding of their epigenomic architecture. Methods First, we perform large-scale (N = 18,413) epigenome-wide association studies (EWAS) of chronological age and all-cause mortality. Next, to create a cAge predictor, we use methylation data from 24,674 participants from the Generation Scotland study, the Lothian Birth Cohorts (LBC) of 1921 and 1936, and 8 other cohorts with publicly available data. In addition, we train a predictor of time to all-cause mortality as a proxy for bAge using the Generation Scotland cohort (1214 observed deaths). For this purpose, we use epigenetic surrogates (EpiScores) for 109 plasma proteins and the 8 component parts of GrimAge, one of the current best epigenetic predictors of survival. We test this bAge predictor in four external cohorts (LBC1921, LBC1936, the Framingham Heart Study and the Women’s Health Initiative study). Results Through the inclusion of linear and non-linear age-CpG associations from the EWAS, feature pre-selection in advance of elastic net regression, and a leave-one-cohort-out (LOCO) cross-validation framework, we obtain cAge prediction with a median absolute error equal to 2.3 years. Our bAge predictor was found to slightly outperform GrimAge in terms of the strength of its association to survival (HRGrimAge = 1.47 [1.40, 1.54] with p = 1.08 × 10−52, and HRbAge = 1.52 [1.44, 1.59] with p = 2.20 × 10−60). Finally, we introduce MethylBrowsR, an online tool to visualise epigenome-wide CpG-age associations. Conclusions The integration of multiple large datasets, EpiScores, non-linear DNAm effects, and new approaches to feature selection has facilitated improvements to the blood-based epigenetic prediction of biological and chronological age. AU - Bernabeu, Elena AU - McCartney, Daniel L. AU - Gadd, Danni A. AU - Hillary, Robert F. AU - Lu, Ake T.
AU - Murphy, Lee AU - Wrobel, Nicola AU - Campbell, Archie AU - Harris, Sarah E. AU - Liewald, David AU - Hayward, Caroline AU - Sudlow, Cathie AU - Cox, Simon R. AU - Evans, Kathryn L. AU - Horvath, Steve AU - McIntosh, Andrew M. AU - Robinson, Matthew Richard AU - Vallejos, Catalina A. AU - Marioni, Riccardo E. ID - 12719 JF - Genome Medicine TI - Refining epigenetic prediction of chronological and biological age VL - 15 ER - TY - JOUR AB - AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. The astounding success even led to claims that the protein folding problem is “solved”. However, the protein folding problem is more than just structure prediction from sequence. Presently, it is unknown if the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study the question we extracted the pLDDT and metrics from AlphaFold predictions before and after single mutation in a protein and correlated the predicted change with the experimentally known ΔΔG values. Additionally, we correlated the same AlphaFold pLDDT metrics with the impact of a single mutation on structure using a large scale dataset of single mutations in GFP with the experimentally assayed levels of fluorescence. We found a very weak or no correlation between AlphaFold output metrics and change of protein stability or fluorescence. Our results imply that AlphaFold may not be immediately applied to other problems or applications in protein folding. AU - Pak, Marina A. AU - Markhieva, Karina A. AU - Novikova, Mariia S. AU - Petrov, Dmitry S. AU - Vorobyev, Ilya S. AU - Maksimova, Ekaterina AU - Kondrashov, Fyodor AU - Ivankov, Dmitry N.
ID - 12758 IS - 3 JF - PLoS ONE TI - Using AlphaFold to predict the impact of single mutations on protein stability and function VL - 18 ER - TY - JOUR AU - Ing-Simmons, Elizabeth AU - Machnik, Nick N AU - Vaquerizas, Juan M. ID - 14689 IS - 12 JF - Nature Genetics SN - 1061-4036 TI - Reply to: Revisiting the use of structural similarity index in Hi-C VL - 55 ER - TY - JOUR AB - Several fixed-target experiments reported J/ψ and ϒ polarizations, as functions of Feynman x (xF) and transverse momentum (PT), in three different frames, using different combinations of beam particles, target nuclei, and collision energies. Despite the diverse and heterogeneous picture formed by these measurements, a detailed look allows us to discern qualitative physical patterns that inspire a simple empirical model. This data-driven scenario offers a good quantitative description of the J/ψ and ϒ(1S) polarizations measured in proton- and pion-nucleus collisions, in the xF 0.5 domain: more than 80 data points (not statistically independent) are well reproduced with only one free parameter. This study sets the context for future low-PT quarkonium polarization measurements in proton- and pion-nucleus collisions, such as those to be made by the AMBER experiment, and shows that such measurements provide significant constraints on the poorly-known parton distribution functions of the pion. AU - Faccioli, Pietro AU - Krätschmer, Ilse AU - Lourenço, Carlos ID - 14753 JF - Physics Letters B KW - Nuclear and High Energy Physics SN - 0370-2693 TI - Low-pT quarkonium polarization measurements: Challenges and opportunities VL - 840 ER - TY - JOUR AB - There is currently little evidence that the genetic basis of human phenotype varies significantly across the lifespan. However, time-to-event phenotypes are understudied and can be thought of as reflecting an underlying hazard, which is unlikely to be constant through life when values take a broad range. 
Here, we find that 74% of 245 genome-wide significant genetic associations with age at natural menopause (ANM) in the UK Biobank show a form of age-specific effect. Nineteen of these replicated discoveries are identified only by our modeling framework, which determines the time dependency of DNA-variant age-at-onset associations without a significant multiple-testing burden. Across the range of early to late menopause, we find evidence for significantly different underlying biological pathways, changes in the signs of genetic correlations of ANM to health indicators and outcomes, and differences in inferred causal relationships. We find that DNA damage response processes only act to shape ovarian reserve and depletion for women of early ANM. Genetically mediated delays in ANM were associated with increased relative risk of breast cancer and leiomyoma at all ages and with high cholesterol and heart failure for late-ANM women. These findings suggest that a better understanding of the age dependency of genetic risk factor relationships among health indicators and outcomes is achievable through appropriate statistical modeling of large-scale biobank data. AU - Ojavee, Sven E. AU - Darrous, Liza AU - Patxot, Marion AU - Läll, Kristi AU - Fischer, Krista AU - Mägi, Reedik AU - Kutalik, Zoltan AU - Robinson, Matthew Richard ID - 14258 IS - 9 JF - American Journal of Human Genetics SN - 0002-9297 TI - Genetic insights into the age-specific biological mechanisms governing human ovarian aging VL - 110 ER - TY - JOUR AB - Background: Blood-based markers of cognitive functioning might provide an accessible way to track neurodegeneration years prior to clinical manifestation of cognitive impairment and dementia. Results: Using blood-based epigenome-wide analyses of general cognitive function, we show that individual differences in DNA methylation (DNAm) explain 35.0% of the variance in general cognitive function (g). 
A DNAm predictor explains ~4% of the variance, independently of a polygenic score, in two external cohorts. It also associates with circulating levels of neurology- and inflammation-related proteins, global brain imaging metrics, and regional cortical volumes. Conclusions: As sample sizes increase, the ability to assess cognitive function from DNAm data may be informative in settings where cognitive testing is unreliable or unavailable. AU - McCartney, Daniel L. AU - Hillary, Robert F. AU - Conole, Eleanor L.S. AU - Banos, Daniel Trejo AU - Gadd, Danni A. AU - Walker, Rosie M. AU - Nangle, Cliff AU - Flaig, Robin AU - Campbell, Archie AU - Murray, Alison D. AU - Maniega, Susana Muñoz AU - Valdés-Hernández, María Del C. AU - Harris, Mathew A. AU - Bastin, Mark E. AU - Wardlaw, Joanna M. AU - Harris, Sarah E. AU - Porteous, David J. AU - Tucker-Drob, Elliot M. AU - McIntosh, Andrew M. AU - Evans, Kathryn L. AU - Deary, Ian J. AU - Cox, Simon R. AU - Robinson, Matthew Richard AU - Marioni, Riccardo E. ID - 10702 IS - 1 JF - Genome Biology SN - 1474-7596 TI - Blood-based epigenome-wide analyses of cognitive abilities VL - 23 ER - TY - JOUR AB - Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. 
For height, the prediction accuracy R2 was 47% in a UK Biobank holdout sample, which was 76% of the estimated h2SNP. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ2 value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies. AU - Orliac, Etienne J. AU - Trejo Banos, Daniel AU - Ojavee, Sven E. AU - Läll, Kristi AU - Mägi, Reedik AU - Visscher, Peter M. AU - Robinson, Matthew Richard ID - 11733 IS - 31 JF - Proceedings of the National Academy of Sciences of the United States of America TI - Improving GWAS discovery and genomic prediction accuracy in biobank data VL - 119 ER - TY - GEN AB - Genetically informed, deep-phenotyped biobanks are an important research resource and it is imperative that the most powerful, versatile, and efficient analysis approaches are used. Here, we apply our recently developed Bayesian grouped mixture of regressions model (GMRM) in the UK and Estonian Biobanks and obtain the highest genomic prediction accuracy reported to date across 21 heritable traits. 
When compared to other approaches, GMRM accuracy was greater than annotation prediction models run in the LDAK or LDPred-funct software by 15% (SE 7%) and 14% (SE 2%), respectively, and was 18% (SE 3%) greater than a baseline BayesR model without single-nucleotide polymorphism (SNP) markers grouped into minor allele frequency–linkage disequilibrium (MAF-LD) annotation categories. For height, the prediction accuracy R2 was 47% in a UK Biobank holdout sample, which was 76% of the estimated h2SNP. We then extend our GMRM prediction model to provide mixed-linear model association (MLMA) SNP marker estimates for genome-wide association (GWAS) discovery, which increased the independent loci detected to 16,162 in unrelated UK Biobank individuals, compared to 10,550 from BoltLMM and 10,095 from Regenie, a 62 and 65% increase, respectively. The average χ2 value of the leading markers increased by 15.24 (SE 0.41) for every 1% increase in prediction accuracy gained over a baseline BayesR model across the traits. Thus, we show that modeling genetic associations accounting for MAF and LD differences among SNP markers, and incorporating prior knowledge of genomic function, is important for both genomic prediction and discovery in large-scale individual-level studies. AU - Orliac, Etienne AU - Trejo Banos, Daniel AU - Ojavee, Sven AU - Läll, Kristi AU - Mägi, Reedik AU - Visscher, Peter AU - Robinson, Matthew Richard ID - 13064 TI - Improving genome-wide association discovery and genomic prediction accuracy in biobank data ER - TY - JOUR AB - Theory for liability-scale models of the underlying genetic basis of complex disease provides an important way to interpret, compare, and understand results generated from biological studies.
In particular, through estimation of the liability-scale heritability (LSH), liability models facilitate an understanding and comparison of the relative importance of genetic and environmental risk factors that shape different clinically important disease outcomes. Increasingly, large-scale biobank studies that link genetic information to electronic health records, containing hundreds of disease diagnosis indicators that mostly occur infrequently within the sample, are becoming available. Here, we propose an extension of the existing liability-scale model theory suitable for estimating LSH in biobank studies of low-prevalence disease. In a simulation study, we find that our derived expression yields lower mean square error (MSE) and is less sensitive to prevalence misspecification as compared to previous transformations for diseases with ≤ 2% population prevalence and LSH of ≤ 0.45, especially if the biobank sample prevalence is less than that of the wider population. Applying our expression to 13 diagnostic outcomes of ≤ 3% prevalence in the UK Biobank study revealed important differences in LSH obtained from the different theoretical expressions that impact the conclusions made when comparing LSH across disease outcomes. This demonstrates the importance of careful consideration for estimation and prediction of low-prevalence disease outcomes and facilitates improved inference of the underlying genetic basis of ≤ 2% population prevalence diseases, especially where biobank sample ascertainment results in a healthier sample population. AU - Ojavee, Sven E.
AU - Kutalik, Zoltan AU - Robinson, Matthew Richard ID - 12142 IS - 11 JF - The American Journal of Human Genetics KW - Genetics (clinical) KW - Genetics SN - 0002-9297 TI - Liability-scale heritability estimation for biobank studies of low-prevalence disease VL - 109 ER - TY - JOUR AB - Background: About 800 women die every day worldwide from pregnancy-related complications, including excessive blood loss, infections and high blood pressure (World Health Organization, 2019). To improve screening for high-risk pregnancies, we set out to identify patterns of maternal hematological changes associated with future pregnancy complications. Methods: Using mixed effects models, we established changes in 14 complete blood count (CBC) parameters for 1710 healthy pregnancies and compared them to measurements from 98 pregnancy-induced hypertension, 106 gestational diabetes and 339 postpartum hemorrhage cases. Results: Results show interindividual variations, but good individual repeatability in CBC values during physiological pregnancies, allowing the identification of specific alterations in women with obstetric complications. For example, in women with uncomplicated pregnancies, haemoglobin count decreases significantly, by 0.12 g/L (95% CI −0.16, −0.09) per gestation week (p value < .001). Interestingly, this decrease is three times more pronounced in women who will develop pregnancy-induced hypertension, with an additional decrease of 0.39 g/L (95% CI −0.51, −0.26). We also confirm that obstetric complications and white CBC predict the likelihood of giving birth earlier during pregnancy. Conclusion: We provide a comprehensive description of the associations between haematological changes through pregnancy and three major obstetric complications to support strategies for prevention, early diagnosis and maternal care.
AU - Patxot, Marion AU - Stojanov, Miloš AU - Ojavee, Sven Erik AU - Gobert, Rosanna Pescini AU - Kutalik, Zoltán AU - Gavillet, Mathilde AU - Baud, David AU - Robinson, Matthew Richard ID - 12235 IS - 5 JF - European Journal of Haematology KW - Hematology KW - General Medicine SN - 0902-4441 TI - Haematological changes from conception to childbirth: An indicator of major pregnancy complications VL - 109 ER - TY - GEN AB - CpGs and corresponding mean weights for DNAm-based prediction of cognitive abilities (6 traits) AU - McCartney, Daniel L AU - Hillary, Robert F AU - Conole, Eleanor LS AU - Trejo Banos, Daniel AU - Gadd, Danni A AU - Walker, Rosie M AU - Nangle, Cliff AU - Flaig, Robin AU - Campbell, Archie AU - Murray, Alison D AU - Munoz Maniega, Susana AU - del C Valdes-Hernandez, Maria AU - Harris, Mathew A AU - Bastin, Mark E AU - Wardlaw, Joanna M AU - Harris, Sarah E AU - Porteous, David J AU - Tucker-Drob, Elliot M AU - McIntosh, Andrew M AU - Evans, Kathryn L AU - Deary, Ian J AU - Cox, Simon R AU - Robinson, Matthew Richard AU - Marioni, Riccardo E ID - 13072 TI - Blood-based epigenome-wide analyses of cognitive abilities ER - TY - JOUR AB - While recent advancements in computation and modelling have improved the analysis of complex traits, our understanding of the genetic basis of the time at symptom onset remains limited. Here, we develop a Bayesian approach (BayesW) that provides probabilistic inference of the genetic architecture of age-at-onset phenotypes in a sampling scheme that facilitates biobank-scale time-to-event analyses. We show in extensive simulation work the benefits BayesW provides in terms of number of discoveries, model performance and genomic prediction. In the UK Biobank, we find many thousands of common genomic regions underlying the age-at-onset of high blood pressure (HBP), cardiac disease (CAD), and type-2 diabetes (T2D), and for the genetic basis of onset reflecting the underlying genetic liability to disease. 
Age-at-menopause and age-at-menarche are also highly polygenic, but with higher variance contributed by low frequency variants. Genomic prediction into the Estonian Biobank data shows that BayesW gives higher prediction accuracy than other approaches. AU - Ojavee, Sven E AU - Kousathanas, Athanasios AU - Trejo Banos, Daniel AU - Orliac, Etienne J AU - Patxot, Marion AU - Lall, Kristi AU - Magi, Reedik AU - Fischer, Krista AU - Kutalik, Zoltan AU - Robinson, Matthew Richard ID - 8430 IS - 1 JF - Nature Communications TI - Genomic architecture and prediction of censored time-to-event phenotypes with a Bayesian genome-wide analysis VL - 12 ER - TY - JOUR AB - The extent to which women differ in the course of blood cell counts throughout pregnancy, and the importance of these changes to pregnancy outcomes has not been well defined. Here, we develop a series of statistical analyses of repeated measures data to reveal the degree to which women differ in the course of pregnancy, predict the changes that occur, and determine the importance of these changes for post-partum hemorrhage (PPH) which is one of the leading causes of maternal mortality. We present a prospective cohort of 4082 births recorded at the University Hospital, Lausanne, Switzerland between 2009 and 2014 where full labour records could be obtained, along with complete blood count data taken at hospital admission. We find significant differences, at a [Formula: see text] level, among women in how blood count values change through pregnancy for mean corpuscular hemoglobin, mean corpuscular volume, mean platelet volume, platelet count and red cell distribution width. We find evidence that almost all complete blood count values show trimester-specific associations with PPH. 
For example, high platelet count (OR 1.20, 95% CI 1.01-1.53), high mean platelet volume (OR 1.58, 95% CI 1.04-2.08), and high erythrocyte levels (OR 1.36, 95% CI 1.01-1.57) in trimester 1 increased PPH, but high values in trimester 3 decreased PPH risk (OR 0.85, 0.79, 0.67 respectively). We show that differences among women in the course of blood cell counts throughout pregnancy have an important role in shaping pregnancy outcome and tracking blood count value changes through pregnancy improves identification of women at increased risk of postpartum hemorrhage. This study provides greater understanding of the complex changes in blood count values that occur through pregnancy and provides indicators to guide the stratification of patients into risk groups. AU - Robinson, Matthew Richard AU - Patxot, Marion AU - Stojanov, Miloš AU - Blum, Sabine AU - Baud, David ID - 10069 JF - Scientific Reports TI - Postpartum hemorrhage risk is driven by changes in blood composition through pregnancy VL - 11 ER - TY - JOUR AB - We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32–44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. 
Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data. AU - Patxot, Marion AU - Trejo Banos, Daniel AU - Kousathanas, Athanasios AU - Orliac, Etienne J AU - Ojavee, Sven E AU - Moser, Gerhard AU - Sidorenko, Julia AU - Kutalik, Zoltan AU - Magi, Reedik AU - Visscher, Peter M AU - Ronnegard, Lars AU - Robinson, Matthew Richard ID - 8429 IS - 1 JF - Nature Communications TI - Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits VL - 12 ER - TY - GEN AB - We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10kb upstream of genes, while 12-25% is attributed to coding regions, 32-44% to introns, and 22-28% to distal 10-500kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having >95% probability of contributing >0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data. AU - Robinson, Matthew Richard ID - 13063 TI - Probabilistic inference of the genetic architecture of functional enrichment of complex traits ER - TY - JOUR AB - Linking epigenetic marks to clinical outcomes improves insight into molecular processes, disease prediction, and therapeutic target identification.
Here, a statistical approach is presented to infer the epigenetic architecture of complex disease, determine the variation captured by epigenetic effects, and estimate phenotype-epigenetic probe associations jointly. Implicitly adjusting for probe correlations, data structure (cell-count or relatedness), and single-nucleotide polymorphism (SNP) marker effects, improves association estimates and in 9,448 individuals, 75.7% (95% CI 71.70–79.3) of body mass index (BMI) variation and 45.6% (95% CI 37.3–51.9) of cigarette consumption variation was captured by whole blood methylation array data. Pathway-linked probes of blood cholesterol, lipid transport and sterol metabolism for BMI, and xenobiotic stimuli response for smoking, showed >1.5 times larger associations with >95% posterior inclusion probability. Prediction accuracy improved by 28.7% for BMI and 10.2% for smoking over a LASSO model, with age-, and tissue-specificity, implying associations are a phenotypic consequence rather than causal. AU - Trejo Banos, D AU - McCartney, DL AU - Patxot, M AU - Anchieri, L AU - Battram, T AU - Christiansen, C AU - Costeira, R AU - Walker, RM AU - Morris, SW AU - Campbell, A AU - Zhang, Q AU - Porteous, DJ AU - McRae, AF AU - Wray, NR AU - Visscher, PM AU - Haley, CS AU - Evans, KL AU - Deary, IJ AU - McIntosh, AM AU - Hemani, G AU - Bell, JT AU - Marioni, RE AU - Robinson, Matthew Richard ID - 7999 JF - Nature Communications SN - 2041-1723 TI - Bayesian reassessment of the epigenetic architecture of complex traits VL - 11 ER - TY - JOUR AB - The molecular factors which control circulating levels of inflammatory proteins are not well understood. 
Furthermore, association studies between molecular probes and human traits are often performed by linear model-based methods which may fail to account for complex structure and interrelationships within molecular datasets. In this study, we perform genome- and epigenome-wide association studies (GWAS/EWAS) on the levels of 70 plasma-derived inflammatory protein biomarkers in healthy older adults (Lothian Birth Cohort 1936; n = 876; Olink® inflammation panel). We employ a Bayesian framework (BayesR+) which can account for issues pertaining to data structure and unknown confounding variables (with sensitivity analyses using ordinary least squares- (OLS) and mixed model-based approaches). We identified 13 SNPs associated with 13 proteins (n = 1 SNP each) concordant across OLS and Bayesian methods. We identified 3 CpG sites spread across 3 proteins (n = 1 CpG each) that were concordant across OLS, mixed-model and Bayesian analyses. Tagged genetic variants accounted for up to 45% of variance in protein levels (for MCP2, 36% of variance alone attributable to 1 polymorphism). Methylation data accounted for up to 46% of variation in protein levels (for CXCL10). Up to 66% of variation in protein levels (for VEGFA) was explained using genetic and epigenetic data combined. We demonstrated putative causal relationships between CD6 and IL18R1 with inflammatory bowel disease and between IL12B and Crohn’s disease. Our data may aid understanding of the molecular regulation of the circulating inflammatory proteome as well as causal relationships between inflammatory mediators and disease. AU - Hillary, Robert F. AU - Trejo-Banos, Daniel AU - Kousathanas, Athanasios AU - McCartney, Daniel L. AU - Harris, Sarah E. AU - Stevenson, Anna J. AU - Patxot, Marion AU - Ojavee, Sven Erik AU - Zhang, Qian AU - Liewald, David C. AU - Ritchie, Craig W. AU - Evans, Kathryn L. AU - Tucker-Drob, Elliot M. AU - Wray, Naomi R. AU - McRae, Allan F. AU - Visscher, Peter M. AU - Deary, Ian J.
AU - Robinson, Matthew Richard AU - Marioni, Riccardo E. ID - 8133 IS - 1 JF - Genome Medicine TI - Multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults VL - 12 ER - TY - GEN AB - Additional file 2: Supplementary Tables. The association of pre-adjusted protein levels with biological and technical covariates. Protein levels were adjusted for age, sex, array plate and four genetic principal components (population structure) prior to analyses. Significant associations are emboldened. (Table S1). pQTLs associated with inflammatory biomarker levels from Bayesian penalised regression model (Posterior Inclusion Probability > 95%). (Table S2). All pQTLs associated with inflammatory biomarker levels from ordinary least squares regression model (P < 7.14 × 10−10). (Table S3). Summary of lambda values relating to ordinary least squares GWAS and EWAS performed on inflammatory protein levels (n = 70) in Lothian Birth Cohort 1936 study. (Table S4). Conditionally significant pQTLs associated with inflammatory biomarker levels from ordinary least squares regression model (P < 7.14 × 10−10). (Table S5). Comparison of variance explained by ordinary least squares and Bayesian penalised regression models for concordantly identified SNPs. (Table S6). Estimate of heritability for blood protein levels as well as proportion of variance explained attributable to different prior mixtures. (Table S7). Comparison of heritability estimates from Ahsan et al. (maximum likelihood) and Hillary et al. (Bayesian penalised regression). (Table S8). List of concordant SNPs identified by linear model and Bayesian penalised regression and whether they have been previously identified as eQTLs. (Table S9). Bayesian tests of colocalisation for cis pQTLs and cis eQTLs. (Table S10). Sherlock algorithm: Genes whose expression are putatively associated with circulating inflammatory proteins that harbour pQTLs. (Table S11).
CpGs associated with inflammatory protein biomarkers as identified by Bayesian model (Bayesian model; Posterior Inclusion Probability > 95%). (Table S12). CpGs associated with inflammatory protein biomarkers as identified by linear model (limma) at P < 5.14 × 10−10. (Table S13). CpGs associated with inflammatory protein biomarkers as identified by mixed linear model (OSCA) at P < 5.14 × 10−10. (Table S14). Estimate of variance explained for blood protein levels by DNA methylation as well as proportion of variance explained attributable to different prior mixtures - BayesR+. (Table S15). Comparison of variance in protein levels explained by genome-wide DNA methylation data by mixed linear model (OSCA) and Bayesian penalised regression model (BayesR+). (Table S16). Variance in circulating inflammatory protein biomarker levels explained by common genetic and methylation data (joint and conditional estimates from BayesR+). Ordered by combined variance explained by genetic and epigenetic data - smallest to largest. Significant results from t-tests comparing distributions for variance explained by methylation or genetics alone versus combined estimate are emboldened. (Table S17). Genetic and epigenetic factors identified by BayesR+ when conditioning on all SNPs and CpGs together. (Table S18). Mendelian Randomisation analyses to assess whether proteins with concordantly identified genetic signals are causally associated with Alzheimer’s disease risk. (Table S19). AU - Hillary, Robert F. AU - Trejo-Banos, Daniel AU - Kousathanas, Athanasios AU - McCartney, Daniel L. AU - Harris, Sarah E. AU - Stevenson, Anna J. AU - Patxot, Marion AU - Ojavee, Sven Erik AU - Zhang, Qian AU - Liewald, David C. AU - Ritchie, Craig W. AU - Evans, Kathryn L. AU - Tucker-Drob, Elliot M. AU - Wray, Naomi R. AU - McRae, Allan F. AU - Visscher, Peter M. AU - Deary, Ian J. AU - Robinson, Matthew Richard AU - Marioni, Riccardo E.
ID - 9706 TI - Additional file 2 of multi-method genome- and epigenome-wide studies of inflammatory protein levels in healthy older adults ER -