TY - JOUR AB - The fungal bioluminescence pathway can be reconstituted in other organisms allowing luminescence imaging without exogenously supplied substrate. The pathway starts from hispidin biosynthesis—a step catalyzed by a large fungal polyketide synthase that requires a posttranslational modification for activity. Here, we report identification of alternative compact hispidin synthases encoded by a phylogenetically diverse group of plants. A hybrid bioluminescence pathway that combines plant and fungal genes is more compact, not dependent on availability of machinery for posttranslational modifications, and confers autonomous bioluminescence in yeast, mammalian, and plant hosts. The compact size of plant hispidin synthases enables additional modes of delivery of autoluminescence, such as delivery with viral vectors. AU - Palkina, Kseniia A. AU - Karataeva, Tatiana A. AU - Perfilov, Maxim M. AU - Fakhranurova, Liliia I. AU - Markina, Nadezhda M. AU - Gonzalez Somermeyer, Louisa AU - Garcia-Perez, Elena AU - Vazquez-Vilar, Marta AU - Rodriguez-Rodriguez, Marta AU - Vazquez-Vilriales, Victor AU - Shakhova, Ekaterina S. AU - Mitiouchkina, Tatiana AU - Belozerova, Olga A. AU - Kovalchuk, Sergey I. AU - Alekberova, Anna AU - Malyshevskaia, Alena K. AU - Bugaeva, Evgenia N. AU - Guglya, Elena B. AU - Balakireva, Anastasia AU - Sytov, Nikita AU - Bezlikhotnova, Anastasia AU - Boldyreva, Daria I. AU - Babenko, Vladislav V. AU - Kondrashov, Fyodor AU - Choob, Vladimir V. AU - Orzaez, Diego AU - Yampolsky, Ilia V. AU - Mishin, Alexander S. AU - Sarkisyan, Karen S. ID - 15179 IS - 10 JF - Science Advances SN - 2375-2548 TI - A hybrid pathway for self-sustained luminescence VL - 10 ER - TY - JOUR AB - AlphaFold changed the field of structural biology by achieving three-dimensional (3D) structure prediction from protein sequence at experimental quality. The astounding success even led to claims that the protein folding problem is “solved”. 
However, the protein folding problem is more than just structure prediction from sequence. Presently, it is unknown whether the AlphaFold-triggered revolution could help to solve other problems related to protein folding. Here we assay the ability of AlphaFold to predict the impact of single mutations on protein stability (ΔΔG) and function. To study the question, we extracted the pLDDT and metrics from AlphaFold predictions before and after a single mutation in a protein and correlated the predicted change with the experimentally known ΔΔG values. Additionally, we correlated the same AlphaFold pLDDT metrics with the impact of a single mutation on structure using a large-scale dataset of single mutations in GFP with experimentally assayed levels of fluorescence. We found very weak or no correlation between AlphaFold output metrics and the change of protein stability or fluorescence. Our results imply that AlphaFold may not be immediately applicable to other problems or applications in protein folding. AU - Pak, Marina A. AU - Markhieva, Karina A. AU - Novikova, Mariia S. AU - Petrov, Dmitry S. AU - Vorobyev, Ilya S. AU - Maksimova, Ekaterina AU - Kondrashov, Fyodor AU - Ivankov, Dmitry N. ID - 12758 IS - 3 JF - PLoS ONE TI - Using AlphaFold to predict the impact of single mutations on protein stability and function VL - 18 ER - TY - JOUR AB - Molecular compatibility between gametes is a prerequisite for successful fertilization. As long as a sperm and egg can recognize and bind each other via their surface proteins, gamete fusion may occur even between members of separate species, resulting in hybrids that can impact speciation. The egg membrane protein Bouncer confers species specificity to gamete interactions between medaka and zebrafish, preventing their cross-fertilization. 
Here, we leverage this specificity to uncover distinct amino acid residues and N-glycosylation patterns that differentially influence the function of medaka and zebrafish Bouncer and contribute to cross-species incompatibility. Curiously, in contrast to the specificity observed for medaka and zebrafish Bouncer, seahorse and fugu Bouncer are compatible with both zebrafish and medaka sperm, in line with the pervasive purifying selection that dominates Bouncer’s evolution. The Bouncer-sperm interaction is therefore the product of seemingly opposing evolutionary forces that, for some species, restrict fertilization to closely related fish, and for others, allow broad gamete compatibility that enables hybridization. AU - Gert, Krista R.B. AU - Panser, Karin AU - Surm, Joachim AU - Steinmetz, Benjamin S. AU - Schleiffer, Alexander AU - Jovine, Luca AU - Moran, Yehu AU - Kondrashov, Fyodor AU - Pauli, Andrea ID - 13164 JF - Nature Communications TI - Divergent molecular signatures in fish Bouncer proteins define cross-fertilization boundaries VL - 14 ER - TY - JOUR AB - Conflicts and natural disasters affect entire populations of the countries involved and, in addition to the thousands of lives destroyed, have a substantial negative impact on the scientific advances these countries provide. The unprovoked invasion of Ukraine by Russia, the devastating earthquake in Turkey and Syria, and the ongoing conflicts in the Middle East are just a few examples. Millions of people have been killed or displaced, their futures uncertain. These events have resulted in extensive infrastructure collapse, with loss of electricity, transportation, and access to services. Schools, universities, and research centers have been destroyed along with decades’ worth of data, samples, and findings. Scholars in disaster areas face short-term problems in what they can accomplish now toward obtaining grants, and long-term problems with employment. 
In our interconnected world, conflicts and disasters are no longer a local problem but have wide-ranging impacts on the entire world, both now and in the future. Here, we focus on the current and ongoing impact of war on the scientific community within Ukraine and from this draw lessons that can be applied to all affected countries where scientists at risk are facing hardship. We present and classify examples of effective and feasible mechanisms used to support researchers in countries facing hardship and discuss how these can be implemented with help from the international scientific community and what more is desperately needed. Reaching out, providing accessible training opportunities, and developing collaborations should increase inclusion and connectivity, support scientific advancements within affected communities, and expedite postwar and disaster recovery. AU - Wolfsberger, Walter AU - Chhugani, Karishma AU - Shchubelka, Khrystyna AU - Frolova, Alina AU - Salyha, Yuriy AU - Zlenko, Oksana AU - Arych, Mykhailo AU - Dziuba, Dmytro AU - Parkhomenko, Andrii AU - Smolanka, Volodymyr AU - Gümüş, Zeynep H. AU - Sezgin, Efe AU - Diaz-Lameiro, Alondra AU - Toth, Viktor R. AU - Maci, Megi AU - Bortz, Eric AU - Kondrashov, Fyodor AU - Morton, Patricia M. AU - Łabaj, Paweł P. AU - Romero, Veronika AU - Hlávka, Jakub AU - Mangul, Serghei AU - Oleksyk, Taras K. ID - 13976 JF - GigaScience TI - Scientists without borders: Lessons from Ukraine VL - 12 ER - TY - GEN AU - Rella, Simon AU - Kulikova, Y AU - Minnegalieva, Aygul AU - Kondrashov, Fyodor ID - 14862 IS - Supplement_2 KW - Public Health KW - Environmental and Occupational Health SN - 1101-1262 T2 - European Journal of Public Health TI - Complex vaccination strategies prevent the emergence of vaccine resistance VL - 33 ER - TY - JOUR AB - During the COVID-19 pandemic, genomics and bioinformatics have emerged as essential public health tools. 
The genomic data acquired using these methods have supported the global health response, facilitated the development of testing methods and allowed the timely tracking of novel SARS-CoV-2 variants. Yet the virtually unlimited potential for rapid generation and analysis of genomic data is also coupled with unique technical, scientific and organizational challenges. Here, we discuss the application of genomic and computational methods for an efficient data-driven COVID-19 response, the advantages of the democratization of viral sequencing around the world and the challenges associated with viral genome data collection and processing. AU - Knyazev, Sergey AU - Chhugani, Karishma AU - Sarwal, Varuni AU - Ayyala, Ram AU - Singh, Harman AU - Karthikeyan, Smruthi AU - Deshpande, Dhrithi AU - Baykal, Pelin Icer AU - Comarova, Zoia AU - Lu, Angela AU - Porozov, Yuri AU - Vasylyeva, Tetyana I. AU - Wertheim, Joel O. AU - Tierney, Braden T. AU - Chiu, Charles Y. AU - Sun, Ren AU - Wu, Aiping AU - Abedalthagafi, Malak S. AU - Pak, Victoria M. AU - Nagaraj, Shivashankar H. AU - Smith, Adam L. AU - Skums, Pavel AU - Pasaniuc, Bogdan AU - Komissarov, Andrey AU - Mason, Christopher E. AU - Bortz, Eric AU - Lemey, Philippe AU - Kondrashov, Fyodor AU - Beerenwinkel, Niko AU - Lam, Tommy Tsan Yuk AU - Wu, Nicholas C. AU - Zelikovsky, Alex AU - Knight, Rob AU - Crandall, Keith A. AU - Mangul, Serghei ID - 11187 IS - 4 JF - Nature Methods SN - 1548-7091 TI - Unlocking capacities of genomics for the COVID-19 response and future pandemics VL - 19 ER - TY - JOUR AB - Until recently, Shigella and enteroinvasive Escherichia coli were thought to be primate-restricted pathogens. The basis of their pathogenicity is the type 3 secretion system (T3SS) encoded by the pINV virulence plasmid, which facilitates host cell invasion and subsequent proliferation. 
A large family of T3SS effectors, E3 ubiquitin-ligases encoded by the ipaH genes, plays a key role in Shigella pathogenicity through the modulation of cellular ubiquitination that degrades host proteins. However, recent genomic studies identified ipaH genes in the genomes of Escherichia marmotae, a potential marmot pathogen, and in an E. coli strain isolated from fecal samples of bovine calves, suggesting that non-human hosts may also be infected by these strains, which are potentially pathogenic to humans. We performed a comparative genomic study of the functional repertoires in the ipaH gene family in Shigella and enteroinvasive Escherichia from human and predicted non-human hosts. We found that fewer than half of Shigella genomes had a complete set of ipaH genes, with frequent gene losses and duplications that were not consistent with the species tree and nomenclature. Non-human host IpaH proteins had a diverse set of substrate-binding domains and, in contrast to the Shigella proteins, two variants of the NEL C-terminal domain. Inconsistencies between strain phylogeny and the composition of effectors indicate horizontal gene transfer between E. coli adapted to different hosts. These results provide a framework for understanding ipaH-mediated host-pathogen interactions and suggest a need for a genomic study of fecal samples from diseased animals. AU - Dranenko, NO AU - Tutukina, MN AU - Gelfand, MS AU - Kondrashov, Fyodor AU - Bochkareva, Olga ID - 11344 JF - Scientific Reports SN - 2045-2322 TI - Chromosome-encoded IpaH ubiquitin ligases indicate non-human enteroinvasive Escherichia VL - 12 ER - TY - JOUR AB - Studies of protein fitness landscapes reveal biophysical constraints guiding protein evolution and empower prediction of functional proteins. However, generalisation of these findings is limited by the scarcity of systematic data on fitness landscapes of proteins with a defined evolutionary relationship. 
We characterized the fitness peaks of four orthologous fluorescent proteins with a broad range of sequence divergence. While two of the four studied fitness peaks were sharp, the other two were considerably flatter, being almost entirely free of epistatic interactions. Mutationally robust proteins, characterized by a flat fitness peak, were not optimal templates for machine-learning-driven protein design; instead, predictions were more accurate for fragile proteins with epistatic landscapes. Our work offers insights for the practical application of fitness landscape heterogeneity in protein engineering. AU - Gonzalez Somermeyer, Louisa AU - Fleiss, Aubin AU - Mishin, Alexander S AU - Bozhanova, Nina G AU - Igolkina, Anna A AU - Meiler, Jens AU - Alaball Pujol, Maria-Elisenda AU - Putintseva, Ekaterina V AU - Sarkisyan, Karen S AU - Kondrashov, Fyodor ID - 11448 JF - eLife KW - General Immunology and Microbiology KW - General Biochemistry, Genetics and Molecular Biology KW - General Medicine KW - General Neuroscience SN - 2050-084X TI - Heterogeneity of the GFP fitness landscape and data-driven protein design VL - 11 ER - TY - JOUR AB - Empirical assays of fitness landscapes suggest that they may be rugged, that is, having multiple fitness peaks. Such fitness landscapes, those that have multiple peaks, necessarily have special local structures, called reciprocal sign epistasis (Poelwijk et al. in J Theor Biol 272:141–144, 2011). Here, we investigate the quantitative relationship between the number of fitness peaks and the number of reciprocal sign epistatic interactions. Previously, it has been shown (Poelwijk et al. in J Theor Biol 272:141–144, 2011) that pairwise reciprocal sign epistasis is a necessary but not sufficient condition for the existence of multiple peaks. 
Applying discrete Morse theory, which to our knowledge has never been used in this context, we extend this result by giving the minimal number of reciprocal sign epistatic interactions required to create a given number of peaks. AU - Saona Urmeneta, Raimundo J AU - Kondrashov, Fyodor AU - Khudiakova, Kseniia ID - 11447 IS - 8 JF - Bulletin of Mathematical Biology KW - Computational Theory and Mathematics KW - General Agricultural and Biological Sciences KW - Pharmacology KW - General Environmental Science KW - General Biochemistry, Genetics and Molecular Biology KW - General Mathematics KW - Immunology KW - General Neuroscience SN - 0092-8240 TI - Relation between the number of peaks and the number of reciprocal sign epistatic interactions VL - 84 ER - TY - JOUR AB - Russia’s unprovoked attack on Ukraine has destroyed civilian infrastructure, including universities, research centers, and other academic infrastructure (1). Many Ukrainian scholars and researchers remain in Ukraine, and their work has suffered from major setbacks (2–4). We call on international scientists and institutions to support them. AU - Chhugani, Karishma AU - Frolova, Alina AU - Salyha, Yuriy AU - Fiscutean, Andrada AU - Zlenko, Oksana AU - Reinsone, Sanita AU - Wolfsberger, Walter W. AU - Ivashchenko, Oleksandra V. AU - Maci, Megi AU - Dziuba, Dmytro AU - Parkhomenko, Andrii AU - Bortz, Eric AU - Kondrashov, Fyodor AU - Łabaj, Paweł P. AU - Romero, Veronika AU - Hlávka, Jakub AU - Oleksyk, Taras K. AU - Mangul, Serghei ID - 12116 IS - 6626 JF - Science SN - 0036-8075 TI - Remote opportunities for scholars in Ukraine VL - 378 ER - TY - JOUR AB - Adult height inspired the first biometrical and quantitative genetic studies and is a test-case trait for understanding heritability. Studies of height led to the formulation of the classical polygenic model, which has a profound influence on the way we view and analyse complex traits. 
An essential part of the classical model is the assumption of additive effects and normally distributed residuals. However, it may be expected that the normal approximation will become insufficient in bigger studies. Here, we demonstrate that when the height of hundreds of thousands of individuals is analysed, the model complexity needs to be increased to include non-additive interactions between sex, environment and genes. Alternatively, using a log-normal approximation allowed us to retain the additive effects model. These findings are important for future genetic and methodological studies that make use of adult height as an exemplar trait. AU - Slavskii, Sergei A. AU - Kuznetsov, Ivan A. AU - Shashkova, Tatiana I. AU - Bazykin, Georgii A. AU - Axenovich, Tatiana I. AU - Kondrashov, Fyodor AU - Aulchenko, Yurii S. ID - 9910 IS - 7 JF - European Journal of Human Genetics SN - 1018-4813 TI - The limits of normal approximation for adult height VL - 29 ER - TY - JOUR AB - Vaccines are thought to be the best available solution for controlling the ongoing SARS-CoV-2 pandemic. However, the emergence of vaccine-resistant strains may come too rapidly for current vaccine development to alleviate the health, economic and social consequences of the pandemic. To quantify and characterize the risk of such a scenario, we created a SIR-derived model with initial stochastic dynamics of the vaccine-resistant strain to study the probability of its emergence and establishment. Using parameters realistically resembling SARS-CoV-2 transmission, we model a wave-like pattern of the pandemic and consider the impact of the rate of vaccination and the strength of non-pharmaceutical intervention measures on the probability of emergence of a resistant strain. As expected, we found that a fast rate of vaccination decreases the probability of emergence of a resistant strain. 
Counterintuitively, when non-pharmaceutical interventions were relaxed at a time when most individuals in the population had already been vaccinated, the probability of emergence of a resistant strain was greatly increased. Consequently, we show that a period of transmission reduction close to the end of the vaccination campaign can substantially reduce the probability of resistant strain establishment. Our results suggest that policymakers and individuals should consider maintaining non-pharmaceutical interventions and transmission-reducing behaviours throughout the entire vaccination period. AU - Rella, Simon AU - Kulikova, Yuliya A. AU - Dermitzakis, Emmanouil T. AU - Kondrashov, Fyodor ID - 9905 IS - 1 JF - Scientific Reports TI - Rates of SARS-CoV-2 transmission and vaccination impact the fate of vaccine-resistant strains VL - 11 ER - TY - GEN AB - The main idea behind the Core Project is to teach first-year students at IST scientific communication skills and let them practice by presenting their research within an interdisciplinary environment. Over the course of the first semester, students participated in seminars, where they shared their results with colleagues from other fields and took part in discussions on relevant subjects. The main focus during these sessions was on delivering the information in a simplified and comprehensible way, going into the very basics of a subject if necessary. At the end, the students were asked to present their research in written form to exercise their writing skills. The reports were gathered in this document. All of them were reviewed by the teaching assistants, and write-ups illustrating unique stylistic features and, in general, an outstanding level of writing skills were honorably mentioned in the section "Selected Reports". 
AU - Maslov, Mikhail AU - Kondrashov, Fyodor AU - Artner, Christina AU - Hennessey-Wesen, Mike AU - Kavcic, Bor AU - Machnik, Nick N AU - Satapathy, Roshan K AU - Tomanek, Isabella ID - 8151 TI - Core Project Proceedings ER - TY - JOUR AB - In the course of sample preparation for Next Generation Sequencing (NGS), DNA is fragmented by various methods. Fragmentation shows a persistent bias with regard to the cleavage rates of various dinucleotides. With the exception of CpG dinucleotides, the previously described biases were consistent with results of DNA cleavage in solution. Here we computed cleavage rates of all dinucleotides, including methylated and unmethylated CpG dinucleotides, using the Whole Genome Sequencing datasets of the 1000 Genomes Project. We found that the cleavage rate of CpG is significantly higher for methylated CpG dinucleotides. Using this information, we developed a classifier for distinguishing cancer and healthy tissues based on the fragmentation-derived status of their CpG islands. A simple Support Vector Machine classifier based on this algorithm shows an accuracy of 84%. The proposed method allows the detection of epigenetic markers purely based on mechanochemical DNA fragmentation, which can be assessed by a simple analysis of NGS data. AU - Uroshlev, Leonid A. AU - Abdullaev, Eldar T. AU - Umarova, Iren R. AU - Il’icheva, Irina A. AU - Panchenko, Larisa A. AU - Polozov, Robert V. AU - Kondrashov, Fyodor AU - Nechipurenko, Yury D. AU - Grokhovsky, Sergei L. ID - 7931 JF - Scientific Reports TI - A method for identification of the methylation level of CpG islands from NGS data VL - 10 ER - TY - JOUR AB - Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. 
For higher-order epistasis of the order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a ‘combinatorially complete dataset’. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199 847 053 unique combinatorially complete genotype combinations of dimensionality ranging from 2 to 12. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data. AU - Esteban, Laura A AU - Lonishin, Lyubov R AU - Bobrovskiy, Daniil M AU - Leleytner, Gregory AU - Bogatyreva, Natalya S AU - Kondrashov, Fyodor AU - Ivankov, Dmitry N ID - 8645 IS - 6 JF - Bioinformatics SN - 1367-4803 TI - HypercubeME: Two hundred million combinatorially complete datasets from a single experiment VL - 36 ER - TY - JOUR AB - Autoluminescent plants engineered to express a bacterial bioluminescence gene cluster in plastids have not been widely adopted because of low light output. We engineered tobacco plants with a fungal bioluminescence system that converts caffeic acid (present in all plants) into luciferin and report self-sustained luminescence that is visible to the naked eye. Our findings could underpin development of a suite of imaging tools for plants. AU - Mitiouchkina, Tatiana AU - Mishin, Alexander S. AU - Gonzalez Somermeyer, Louisa AU - Markina, Nadezhda M. AU - Chepurnyh, Tatiana V. AU - Guglya, Elena B. AU - Karataeva, Tatiana A. AU - Palkina, Kseniia A. AU - Shakhova, Ekaterina S. AU - Fakhranurova, Liliia I. AU - Chekova, Sofia V. 
AU - Tsarkova, Aleksandra S. AU - Golubev, Yaroslav V. AU - Negrebetsky, Vadim V. AU - Dolgushin, Sergey A. AU - Shalaev, Pavel V. AU - Shlykov, Dmitry AU - Melnik, Olesya A. AU - Shipunova, Victoria O. AU - Deyev, Sergey M. AU - Bubyrev, Andrey I. AU - Pushin, Alexander S. AU - Choob, Vladimir V. AU - Dolgov, Sergey V. AU - Kondrashov, Fyodor AU - Yampolsky, Ilia V. AU - Sarkisyan, Karen S. ID - 7889 JF - Nature Biotechnology SN - 1087-0156 TI - Plants with genetically encoded autoluminescence VL - 38 ER - TY - JOUR AB - Characterizing the fitness landscape, a representation of fitness for a large set of genotypes, is key to understanding how genetic information is interpreted to create functional organisms. Here we determined the evolutionarily relevant segment of the fitness landscape of His3, a gene coding for an enzyme in the histidine synthesis pathway, focusing on combinations of amino acid states found at orthologous sites of extant species. Just 15% of amino acids found in yeast His3 orthologues were always neutral, while the impact on fitness of the remaining 85% depended on the genetic background. Furthermore, at 67% of sites, amino acid replacements were under sign epistasis, having both strongly positive and negative effects in different genetic backgrounds. 46% of sites were under reciprocal sign epistasis. The fitness impact of amino acid replacements was influenced by only a few genetic backgrounds but involved interactions among multiple sites, shaping a rugged fitness landscape in which many of the shortest paths between highly fit genotypes are inaccessible. AU - Pokusaeva, Victoria AU - Usmanova, Dinara R. AU - Putintseva, Ekaterina V. AU - Espinar, Lorena AU - Sarkisyan, Karen AU - Mishin, Alexander S. AU - Bogatyreva, Natalya S. AU - Ivankov, Dmitry AU - Akopyan, Arseniy AU - Avvakumov, Sergey AU - Povolotskaya, Inna S. AU - Filion, Guillaume J. AU - Carey, Lucas B. 
AU - Kondrashov, Fyodor ID - 6419 IS - 4 JF - PLoS Genetics TI - An experimental assay of the interactions of amino acids from orthologous sequences shaping a complex fitness landscape VL - 15 ER - TY - GEN AU - Pokusaeva, Victoria AU - Usmanova, Dinara R. AU - Putintseva, Ekaterina V. AU - Espinar, Lorena AU - Sarkisyan, Karen AU - Mishin, Alexander S. AU - Bogatyreva, Natalya S. AU - Ivankov, Dmitry AU - Akopyan, Arseniy AU - Avvakumov, Sergey AU - Povolotskaya, Inna S. AU - Filion, Guillaume J. AU - Carey, Lucas B. AU - Kondrashov, Fyodor ID - 9790 TI - A statistical summary of segment libraries and sequencing results ER - TY - GEN AU - Pokusaeva, Victoria AU - Usmanova, Dinara R. AU - Putintseva, Ekaterina V. AU - Espinar, Lorena AU - Sarkisyan, Karen AU - Mishin, Alexander S. AU - Bogatyreva, Natalya S. AU - Ivankov, Dmitry AU - Akopyan, Arseniy AU - Povolotskaya, Inna S. AU - Filion, Guillaume J. AU - Carey, Lucas B. AU - Kondrashov, Fyodor ID - 9797 TI - A statistical summary of segment libraries and sequencing results ER - TY - GEN AU - Pokusaeva, Victoria AU - Usmanova, Dinara R. AU - Putintseva, Ekaterina V. AU - Espinar, Lorena AU - Sarkisyan, Karen AU - Mishin, Alexander S. AU - Bogatyreva, Natalya S. AU - Ivankov, Dmitry AU - Akopyan, Arseniy AU - Avvakumov, Sergey AU - Povolotskaya, Inna S. AU - Filion, Guillaume J. AU - Carey, Lucas B. AU - Kondrashov, Fyodor ID - 9789 TI - Multiple alignment of His3 orthologues ER - TY - JOUR AB - Multiple sequence alignments (MSAs) are used for structural [1,2] and evolutionary predictions [1,2], but the complexity of aligning large datasets requires the use of approximate solutions [3], including the progressive algorithm [4]. Progressive MSA methods start by aligning the most similar sequences and subsequently incorporate the remaining sequences, from leaf-to-root, based on a guide-tree. Their accuracy declines substantially as the number of sequences is scaled up [5]. 
We introduce a regressive algorithm that enables MSA of up to 1.4 million sequences on a standard workstation and substantially improves accuracy on datasets larger than 10,000 sequences. Our regressive algorithm works in the opposite direction to the progressive algorithm, beginning by aligning the most dissimilar sequences. It uses an efficient divide-and-conquer strategy to run third-party alignment methods in linear time, regardless of their original complexity. Our approach will enable analyses of extremely large genomic datasets such as the recently announced Earth BioGenome Project, which comprises 1.5 million eukaryotic genomes [6]. AU - Garriga, Edgar AU - Di Tommaso, Paolo AU - Magis, Cedrik AU - Erb, Ionas AU - Mansouri, Leila AU - Baltzis, Athanasios AU - Laayouni, Hafid AU - Kondrashov, Fyodor AU - Floden, Evan AU - Notredame, Cedric ID - 7181 IS - 12 JF - Nature Biotechnology SN - 1087-0156 TI - Large multiple sequence alignments with a root-to-leaf regressive method VL - 37 ER - TY - GEN AB - This dataset contains a GitHub repository with all the data, analyses, Nextflow workflows and Jupyter notebooks needed to replicate the manuscript titled "Fast and accurate large multiple sequence alignments with a root-to-leaf regressive method". It also contains the generated Multiple Sequence Alignments (MSAs), as well as the main figures and tables from the manuscript. The repository is also available at GitHub (https://github.com/cbcrg/dpa-analysis) release `v1.2`. For details on how to use the regressive alignment algorithm, see the T-Coffee software suite (https://github.com/cbcrg/tcoffee). 
AU - Garriga, Edgar AU - di Tommaso, Paolo AU - Magis, Cedrik AU - Erb, Ionas AU - Mansouri, Leila AU - Baltzis, Athanasios AU - Laayouni, Hafid AU - Kondrashov, Fyodor AU - Floden, Evan AU - Notredame, Cedric ID - 13059 TI - Fast and accurate large multiple sequence alignments with a root-to-leaf regressive method ER - TY - JOUR AB - Bioluminescence is found across the entire tree of life, conferring a spectacular set of visually oriented functions from attracting mates to scaring off predators. Half a dozen different luciferins, molecules that emit light when enzymatically oxidized, are known. However, just one biochemical pathway for luciferin biosynthesis has been described in full, which is found only in bacteria. Here, we report identification of the fungal luciferase and three other key enzymes that together form the biosynthetic cycle of the fungal luciferin from caffeic acid, a simple and widespread metabolite. Introduction of the identified genes into the genome of the yeast Pichia pastoris along with caffeic acid biosynthesis genes resulted in a strain that is autoluminescent in standard media. We analyzed evolution of the enzymes of the luciferin biosynthesis cycle and found that fungal bioluminescence emerged through a series of events that included two independent gene duplications. The retention of the duplicated enzymes of the luciferin pathway in nonluminescent fungi shows that the gene duplication was followed by functional sequence divergence of enzymes of at least one gene in the biosynthetic pathway and suggests that the evolution of fungal bioluminescence proceeded through several closely related stepping stone nonluminescent biochemical reactions with adaptive roles. The availability of a complete eukaryotic luciferin biosynthesis pathway provides several applications in biomedicine and bioengineering. AU - Kotlobay, Alexey A. AU - Sarkisyan, Karen AU - Mokrushina, Yuliana A. AU - Marcet-Houben, Marina AU - Serebrovskaya, Ekaterina O. 
AU - Markina, Nadezhda M. AU - Gonzalez Somermeyer, Louisa AU - Gorokhovatsky, Andrey Y. AU - Vvedensky, Andrey AU - Purtov, Konstantin V. AU - Petushkov, Valentin N. AU - Rodionova, Natalja S. AU - Chepurnyh, Tatiana V. AU - Fakhranurova, Liliia AU - Guglya, Elena B. AU - Ziganshin, Rustam AU - Tsarkova, Aleksandra S. AU - Kaskova, Zinaida M. AU - Shender, Victoria AU - Abakumov, Maxim AU - Abakumova, Tatiana O. AU - Povolotskaya, Inna S. AU - Eroshkin, Fedor M. AU - Zaraisky, Andrey G. AU - Mishin, Alexander S. AU - Dolgov, Sergey V. AU - Mitiouchkina, Tatiana Y. AU - Kopantzev, Eugene P. AU - Waldenmaier, Hans E. AU - Oliveira, Anderson G. AU - Oba, Yuichi AU - Barsova, Ekaterina AU - Bogdanova, Ekaterina A. AU - Gabaldón, Toni AU - Stevani, Cassius V. AU - Lukyanov, Sergey AU - Smirnov, Ivan V. AU - Gitelson, Josef I. AU - Kondrashov, Fyodor AU - Yampolsky, Ilia V. ID - 5780 IS - 50 JF - Proceedings of the National Academy of Sciences of the United States of America SN - 0027-8424 TI - Genetically encodable bioluminescent system from fungi VL - 115 ER - TY - JOUR AB - Background: Natural selection shapes cancer genomes. Previous studies used signatures of positive selection to identify genes driving malignant transformation. However, the contribution of negative selection against somatic mutations that affect essential tumor functions or specific domains remains a controversial topic. Results: Here, we analyze 7546 individual exomes from 26 tumor types from TCGA data to explore the portion of the cancer exome under negative selection. Although we find that most genes evolve neutrally in a pan-cancer framework, we identify essential cancer genes and immune-exposed protein regions under significant negative selection. Moreover, our simulations suggest that the amount of negative selection is underestimated. We therefore choose an empirical approach to identify genes, functions, and protein regions under negative selection. 
We find that the expression and mutation status of negatively selected genes are indicative of patient survival. Processes that are most strongly conserved are those that play fundamental cellular roles such as protein synthesis, glucose metabolism, and molecular transport. Intriguingly, we observe strong signals of selection in the immunopeptidome and proteins controlling peptide exposition, highlighting the importance of immune surveillance evasion. Additionally, tumor type-specific immune activity correlates with the strength of negative selection on human epitopes. Conclusions: In summary, our results show that negative selection is a hallmark of cell essentiality and immune response in cancer. The functional domains identified could be exploited therapeutically, ultimately allowing for the development of novel cancer treatments. AU - Zapata, Luis AU - Pich, Oriol AU - Serrano, Luis AU - Kondrashov, Fyodor AU - Ossowski, Stephan AU - Schaefer, Martin ID - 279 JF - Genome Biology TI - Negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidome VL - 19 ER - TY - GEN AB - This document contains the full list of genes with their respective significance and dN/dS values. (TXT 4499 kb) AU - Zapata, Luis AU - Pich, Oriol AU - Serrano, Luis AU - Kondrashov, Fyodor AU - Ossowski, Stephan AU - Schaefer, Martin ID - 9812 TI - Additional file 2: Of negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidome ER - TY - GEN AB - This document contains additional supporting evidence presented as supplemental tables.
(XLSX 50 kb) AU - Zapata, Luis AU - Pich, Oriol AU - Serrano, Luis AU - Kondrashov, Fyodor AU - Ossowski, Stephan AU - Schaefer, Martin ID - 9811 TI - Additional file 1: Of negative selection in tumor genome evolution acts on essential cellular functions and the immunopeptidome ER - TY - JOUR AB - Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results, especially when exploring the effects of combinations of different mutations. Results: Here we use a protocol to measure the bias as a function of the number of introduced mutations. It is based on a self-consistency test of the reciprocity of the effect of a mutation. An advantage of this approach is that it relies solely on crystal structures, without experimentally measured stability values. We applied the protocol to four popular algorithms for predicting the change of protein stability upon mutation (FoldX, Eris, Rosetta and I-Mutant) and found an inherent bias. For one program, FoldX, we managed to substantially reduce the bias using additional relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here.
AU - Usmanova, Dinara R AU - Bogatyreva, Natalya S AU - Ariño Bernad, Joan AU - Eremina, Aleksandra A AU - Gorshkova, Anastasiya A AU - Kanevskiy, German M AU - Lonishin, Lyubov R AU - Meister, Alexander V AU - Yakupova, Alisa G AU - Kondrashov, Fyodor AU - Ivankov, Dmitry ID - 5995 IS - 21 JF - Bioinformatics SN - 1367-4803 TI - Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation VL - 34 ER - TY - JOUR AB - Fitness landscapes depict how genotypes manifest at the phenotypic level and form the basis of our understanding of many areas of biology, yet their properties remain elusive. Previous studies have analysed specific genes, often using their function as a proxy for fitness, experimentally assessing the effect on function of single mutations and their combinations in a specific sequence or in different sequences. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here we visualize an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function (fluorescence) of tens of thousands of derivative genotypes of avGFP. We show that the fitness landscape of avGFP is narrow, with 3/4 of the derivatives with a single mutation showing reduced fluorescence and half of the derivatives with four mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations and mostly occurred through the cumulative effect of slightly deleterious mutations causing a threshold-like decrease in protein stability and a concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales.
The characterization of the local fitness landscape of avGFP has important implications for several fields including molecular evolution, population genetics and protein design. AU - Karen Sarkisyan AU - Bolotin, Dmitry A AU - Meer, Margarita V AU - Usmanova, Dinara R AU - Mishin, Alexander S AU - Sharonov, George V AU - Ivankov, Dmitry N AU - Bozhanova, Nina G AU - Baranov, Mikhail S AU - Soylemez, Onuralp AU - Bogatyreva, Natalya S AU - Vlasov, Peter K AU - Egorov, Evgeny S AU - Logacheva, Maria D AU - Kondrashov, Alexey S AU - Chudakov, Dmitriy M AU - Putintseva, Ekaterina V AU - Mamedov, Ilgar Z AU - Tawfik, Dan S AU - Lukyanov, Konstantin A AU - Fyodor Kondrashov ID - 850 JF - Nature TI - Local fitness landscape of the green fluorescent protein VL - 533 ER - TY - JOUR AB - Multicellular eukaryotes have evolved a range of mechanisms for immune recognition. A widespread family involved in innate immunity are the NACHT-domain and leucine-rich-repeat-containing (NLR) proteins. Mammals have small numbers of NLR proteins, whereas in some species, mostly those without adaptive immune systems, NLRs have expanded into very large families. We describe a family of nearly 400 NLR proteins encoded in the zebrafish genome. The proteins share a defining overall structure, which arose in fishes after a fusion of the core NLR domains with a B30.2 domain, but can be subdivided into four groups based on their NACHT domains. Gene conversion acting differentially on the NACHT and B30.2 domains has shaped the family and created the groups. Evidence of positive selection in the B30.2 domain indicates that this domain rather than the leucine-rich repeats acts as the pathogen recognition module. In an unusual chromosomal organization, the majority of the genes are located on one chromosome arm, interspersed with other large multigene families, including a new family encoding zinc-finger proteins.
The NLR-B30.2 proteins represent a new family with diversity in the specific recognition module that is present in fishes in spite of the parallel existence of an adaptive immune system. AU - Howe, Kerstin L AU - Schiffer, Philipp H AU - Zielinski, Julia G AU - Wiehe, Thomas H AU - Laird, Gavin K AU - Marioni, John C AU - Soylemez, Onuralp AU - Fyodor Kondrashov AU - Leptin, Maria ID - 896 IS - 4 JF - Open Biology TI - Structure and evolutionary history of a large family of NLR proteins in the zebrafish VL - 6 ER - TY - JOUR AB - Understanding the principles that led to the current complexity of the genetic code is a central question in evolution. Expansion of the genetic code required the selection of new transfer RNAs (tRNAs) with specific recognition signals that allowed them to be matured, modified, aminoacylated, and processed by the ribosome without compromising the fidelity or efficiency of protein synthesis. We show that saturation of recognition signals blocks the emergence of new tRNA identities and that the rate of nucleotide substitutions in tRNAs is higher in species with fewer tRNA genes. We propose that the growth of the genetic code stalled because a limit was reached in the number of identity elements that can be effectively used in the tRNA structure. AU - Saint-Léger, Adélaïde AU - Bello, Carla AU - Dans, Pablo D AU - Torres, Adrian G AU - Novoa, Eva M AU - Camacho, Noelia AU - Orozco, Modesto AU - Fyodor Kondrashov AU - Ribas De Pouplana, Lluís ID - 849 IS - 4 JF - Science Advances TI - Saturation of recognition elements blocks evolution of new tRNA identities VL - 2 ER - TY - JOUR AB - A comparative analysis of the metagenomes from two 30 000-year-old permafrost samples, one of lake-alluvial origin and the other from late Pleistocene Ice Complex sediments, revealed significant differences within microbial communities.
The late Pleistocene Ice Complex sediments (which have been characterized by the absence of methane with lower values of redox potential and Fe2+ content) showed a low abundance of methanogenic archaea and enzymes from both the carbon and nitrogen cycles, but a higher abundance of enzymes associated with the sulfur cycle. The metagenomic and geochemical analyses described in the paper provide evidence that the formation of the sampled late Pleistocene Ice Complex sediments likely took place under much more aerobic conditions than lake-alluvial sediments. AU - Rivkina, Elizaveta AU - Petrovskaya, Lada E AU - Vishnivetskaya, Tatiana A AU - Krivushin, Kirill V AU - Shmakova, Lyubov A AU - Tutukina, Maria AU - Meyers, Arthur J AU - Fyodor Kondrashov ID - 853 IS - 7 JF - Biogeosciences TI - Metagenomic analyses of the late Pleistocene permafrost - Additional tools for reconstruction of environmental conditions VL - 13 ER - TY - JOUR AB - The nature of factors governing the tempo and mode of protein evolution is a fundamental issue in evolutionary biology. Specifically, whether or not interactions between different sites, or epistasis, are important in directing the course of evolution became one of the central questions. Several recent reports have scrutinized patterns of long-term protein evolution claiming them to be compatible only with an epistatic fitness landscape. However, these claims have not yet been substantiated with a formal model of protein evolution. Here, we formulate a simple covarion-like model of protein evolution focusing on the rate at which the fitness impact of amino acids at a site changes with time. We then apply the model to the data on convergent and divergent protein evolution to test whether or not the incorporation of epistatic interactions is necessary to explain the data. 
We find that convergent evolution cannot be explained without the incorporation of epistasis, and that the rate at which an amino acid state switches from being acceptable at a site to being deleterious is faster than the rate of amino acid substitution. Specifically, for proteins that have persisted in modern prokaryotic organisms since the last universal common ancestor, for every amino acid substitution approximately ten amino acid states switch from being accessible to being deleterious, or vice versa. Thus, molecular evolution can only be perceived in the context of rapid turnover of which amino acids are available for evolution. AU - Usmanova, Dinara AU - Ferretti, Luca AU - Povolotskaya, Inna AU - Vlasov, Peter AU - Kondrashov, Fyodor ID - 848 IS - 2 JF - Molecular Biology and Evolution TI - A model of substitution trajectories in sequence space and long-term protein evolution VL - 32 ER - TY - JOUR AB - The origin and evolution of novel biochemical functions remain one of the key questions in molecular evolution. We study the methacrylate reductase function, which is thought to have emerged in the last century and was reported in Geobacter sulfurreducens strain AM-1. We report the sequence and study the evolution of the operon coding for the flavin-containing methacrylate reductase (Mrd) and tetraheme cytochrome (Mcc) in the genome of G. sulfurreducens AM-1. Different types of signal peptides in functionally interlinked proteins Mrd and Mcc suggest a possible complex mechanism of biogenesis for chromoproteids of the methacrylate redox system. The homologs of the Mrd and Mcc sequences found in δ-Proteobacteria and Deferribacteres are also organized into an operon, and their phylogenetic distribution suggests that these two genes tend to be horizontally transferred together. Specifically, the mrd and mcc genes from G. sulfurreducens AM-1 are not monophyletic with any of the homologs found in other Geobacter genomes.
The acquisition of methacrylate reductase function by G. sulfurreducens AM-1 appears linked to a horizontal gene transfer event. However, the new function of the products of mrd and mcc may have evolved either prior or subsequent to their acquisition by G. sulfurreducens AM-1. AU - Arkhipova, Oksana V AU - Meer, Margarita V AU - Mikoulinskaia, Galina V AU - Zakharova, Marina V AU - Galushko, Alexander S AU - Akimenko, Vasilii K AU - Fyodor Kondrashov ID - 906 IS - 5 JF - PLoS One TI - Recent origin of the methacrylate redox system in Geobacter sulfurreducens AM-1 through horizontal gene transfer VL - 10 ER - TY - JOUR AB - Proteases play important roles in many biologic processes and are key mediators of cancer, inflammation, and thrombosis. However, comprehensive and quantitative techniques to define the substrate specificity profile of proteases are lacking. The metalloprotease ADAMTS13 regulates blood coagulation by cleaving von Willebrand factor (VWF), reducing its procoagulant activity. A mutagenized substrate phage display library based on a 73-amino acid fragment of VWF was constructed, and the ADAMTS13-dependent change in library complexity was evaluated over reaction time points, using high-throughput sequencing. Reaction rate constants (kcat/KM) were calculated for nearly every possible single amino acid substitution within this fragment. This massively parallel enzyme kinetics analysis detailed the specificity of ADAMTS13 and demonstrated the critical importance of the P1-P1' substrate residues while defining exosite binding domains. These data provided empirical evidence for the propensity for epistasis within VWF and showed strong correlation to conservation across orthologs, highlighting evolutionary selective pressures for VWF. 
AU - Kretz, Colin A AU - Dai, Manhong AU - Soylemez, Onuralp AU - Yee, Andrew AU - Desch, Karl C AU - Siemieniak, David R AU - Tomberg, Kärt AU - Fyodor Kondrashov AU - Meng, Fan AU - Ginsburg, David B ID - 866 IS - 30 JF - Proceedings of the National Academy of Sciences of the United States of America TI - Massively parallel enzyme kinetics reveals the substrate recognition landscape of the metalloprotease ADAMTS13 VL - 112 ER - TY - JOUR AB - The factors that determine the tempo and mode of protein evolution continue to be a central question in molecular evolution. Traditionally, studies of protein evolution focused on the rates of amino acid substitutions. More recently, with the availability of sequence data and advanced experimental techniques, the focus of attention has shifted toward the study of evolutionary trajectories and the overall layout of protein fitness landscapes. In this review we describe the effect of epistasis on the topology of evolutionary pathways that are likely to be found in fitness landscapes and develop a simple theory to connect the number of maladapted genotypes to the topology of fitness landscapes with epistatic interactions. Finally, we review recent studies that have probed the extent of epistatic interactions and have begun to chart the fitness landscapes in protein sequence space. AU - Kondrashov, Dmitry A AU - Fyodor Kondrashov ID - 886 IS - 1 JF - Trends in Genetics TI - Topological features of rugged fitness landscapes in sequence space VL - 31 ER - TY - JOUR AB - Rapid divergence of gene copies after duplication is thought to determine the fate of the copies and evolution of novel protein functions. However, data on how long the gene copies continue to experience an elevated rate of evolution remain scarce. Standard theory of gene duplications based on some level of genetic redundancy of gene copies predicts that the period of accelerated evolution must end relatively quickly.
Using a maximum-likelihood approach, we estimate preduplication, initial postduplication, and recent postduplication rates of evolution that occurred in the mammalian lineage. We find that both gene copies experience an acceleration of similar magnitude in their rate of evolution. The copy located in the original genomic position typically returns to the preduplication rates of evolution in a short period of time. The burst of faster evolution of the copy that is located in a new genomic position typically lasts longer. Furthermore, the fast-evolving copies on average continue to evolve faster than the preduplication rates far longer than predicted by the standard theory of gene duplications. We hypothesize that the prolonged elevated rates of evolution are determined by functional properties that were acquired during, or soon after, the gene duplication event. AU - Rosello, Oriol P AU - Fyodor Kondrashov ID - 852 IS - 8 JF - Genome Biology and Evolution TI - Long-Term asymmetrical acceleration of protein evolution after gene duplication VL - 6 ER - TY - JOUR AB - The emergence of new genes throughout evolution requires rewiring and extension of regulatory networks. However, the molecular details of how the transcriptional regulation of new gene copies evolves remain largely unexplored. Here we show how duplication of a transcription factor gene allowed the emergence of two independent regulatory circuits. Interestingly, the ancestral transcription factor was promiscuous and could bind different motifs in its target promoters. After duplication, one paralogue evolved increased binding specificity so that it only binds one type of motif, whereas the other copy evolved decreased activity so that it only activates promoters that contain multiple binding sites. Notably, only a few mutations in both the DNA-binding domains and in the promoter binding sites were required to gradually disentangle the two networks.
These results reveal how duplication of a promiscuous transcription factor followed by concerted cis and trans mutations allows expansion of a regulatory network. AU - Pougach, Ksenia S AU - Voet, Arnout R AU - Fyodor Kondrashov AU - Voordeckers, Karin AU - Christiaens, Joaquin F AU - Baying, Bianka AU - Bénès, Vladimír AU - Sakai, Ryo AU - Aerts, Jan A AU - Zhu, Bo AU - Van Dijck, Patrick AU - Verstrepen, Kevin J ID - 856 JF - Nature Communications TI - Duplication of a promiscuous transcription factor drives the emergence of a new regulatory network VL - 5 ER - TY - JOUR AB - The origins of neural systems remain unresolved. In contrast to other basal metazoans, ctenophores (comb jellies) have both complex nervous and mesoderm-derived muscular systems. These holoplanktonic predators also have sophisticated ciliated locomotion, behaviour and distinct development. Here we present the draft genome of Pleurobrachia bachei, Pacific sea gooseberry, together with ten other ctenophore transcriptomes, and show that they are remarkably distinct from other animal genomes in their content of neurogenic, immune and developmental genes. Our integrative analyses place Ctenophora as the earliest lineage within Metazoa. This hypothesis is supported by comparative analysis of multiple gene families, including the apparent absence of HOX genes, canonical microRNA machinery, and reduced immune complement in ctenophores. Although two distinct nervous systems are well recognized in ctenophores, many bilaterian neuron-specific genes and genes of 'classical' neurotransmitter pathways either are absent or, if present, are not expressed in neurons. Our metabolomic and physiological data are consistent with the hypothesis that ctenophore neural systems, and possibly muscle specification, evolved independently from those in other animals.
AU - Moroz, Leonid L AU - Kocot, Kevin M AU - Citarella, Mathew R AU - Dosung, Sohn AU - Norekian, Tigran P AU - Povolotskaya, Inna AU - Grigorenko, Anastasia P AU - Dailey, Christopher A AU - Berezikov, Eugene AU - Buckley, Katherine M AU - Ptitsyn, Andrey A AU - Reshetov, Denis A AU - Mukherjee, Krishanu AU - Moroz, Tatiana P AU - Bobkova, Yelena V AU - Yu, Fahong AU - Kapitonov, Vladimir V AU - Jurka, Jerzy W AU - Bobkov, Yuriy V AU - Swore, Joshua J AU - Girardo, David O AU - Fodor, Alexander AU - Gusev, Fedor E AU - Sanford, Rachel S AU - Bruders, Rebecca AU - Kittler, Ellen L AU - Mills, Claudia E AU - Rast, Jonathan P AU - Derelle, Romain AU - Solovyev, Victor AU - Fyodor Kondrashov AU - Swalla, Billie J AU - Sweedler, Jonathan V AU - Rogaev, Evgeny I AU - Halanych, Kenneth M AU - Kohn, Andrea B ID - 863 IS - 7503 JF - Nature TI - The ctenophore genome and the evolutionary origins of neural systems VL - 510 ER - TY - JOUR AB - Research on existing drugs often uncovers novel mechanisms of action and leads to the expansion of their therapeutic scope and subsequent remarketing. The Wnt signaling pathway is of immediate therapeutic relevance, as it plays critical roles in cancer development and progression. However, drugs that disrupt this pathway are unavailable despite the high demand. Here we report an attempt to identify antagonists of the Wnt-FZD interaction among the library of FDA-approved drugs. We performed an in silico screening which brought up several potential antagonists of the ligand-receptor interaction. Fourteen of these substances were tested using the TopFlash luciferase reporter assay, and four of them were identified as active and specific inhibitors of Wnt3a-induced signaling. However, further analysis through GTP-binding and β-catenin stabilization assays showed that the compounds do not target the Wnt-FZD pair, but inhibit the signaling at downstream levels.
We further describe the previously unknown inhibitory activity of the anti-leprosy drug clofazimine in the Wnt pathway and provide data demonstrating its efficiency in suppressing the growth of Wnt-dependent triple-negative breast cancer cells. These data provide a basis for further investigations of the efficiency of clofazimine in the treatment of Wnt-dependent cancers. AU - Koval, Alexey V AU - Vlasov, Peter K AU - Shichkova, Polina AU - Khunderyakova, S AU - Markov, Yury AU - Panchenko, J AU - Volodina, A AU - Fyodor Kondrashov AU - Katanaev, Vladimir L ID - 865 IS - 4 JF - Biochemical Pharmacology TI - Anti-leprosy drug clofazimine inhibits growth of triple-negative breast cancer cells via inhibition of canonical Wnt signaling VL - 87 ER - TY - JOUR AB - Recombination between double-stranded DNA molecules is a key genetic process which occurs in a wide variety of organisms. Usually, crossing-over (CO) occurs during meiosis between genotypes with 98.0-99.9% sequence identity, because within-population nucleotide diversity only rarely exceeds 2%. However, some species are hypervariable and it is unclear how CO can occur between genotypes with less than 90% sequence identity. Here, we study CO in Schizophyllum commune, a hypervariable cosmopolitan basidiomycete mushroom, a frequently encountered decayer of woody substrates. We crossed two haploid individuals, from the United States and from Russia, and obtained genome sequences for their 17 offspring. The average genetic distance between the parents was 14%, making it possible to study CO at very high resolution. We found reduced levels of linkage disequilibrium between loci flanking the CO sites, indicating that they are mostly confined to hotspots of recombination. Furthermore, CO events preferentially occurred in regions under stronger negative selection, in particular within exons that showed reduced levels of nucleotide diversity.
Apparently, in hypervariable species CO must avoid regions of higher divergence between the recombining genomes due to limitations imposed by the mismatch repair system, with regions under strong negative selection providing the opportunity for recombination. These patterns are opposite to those observed in a number of less variable species indicating that population genomics of hypervariable species may reveal novel biological phenomena. AU - Seplyarskiy, Vladimir B AU - Logacheva, Maria D AU - Penin, Aleksey A AU - Baranová, Maria A AU - Leushkin, Evgeny V AU - Demidenko, Natalia V AU - Klepikova, Anna V AU - Fyodor Kondrashov AU - Kondrashov, Alexey S AU - James, Timothy Y ID - 845 IS - 11 JF - Molecular Biology and Evolution TI - Crossing-over in a hypervariable species preferentially occurs in regions of high local similarity VL - 31 ER - TY - JOUR AB - The study of molecular evolution is important because it reveals how protein functions emerge and evolve. Recently, several types of studies indicated that substitutions in molecular evolution occur in a compensatory manner, whereby the occurrence of a substitution depends on the amino acid residues at other sites. However, a molecular or structural basis behind the compensation often remains obscure. Here, we review studies on the interface of structural biology and molecular evolution that revealed novel aspects of compensatory evolution. In many cases structural studies benefit from evolutionary data while structural data often add a functional dimension to the study of molecular evolution. AU - Ivankov, Dmitry N AU - Finkelstein, Alexei V AU - Fyodor Kondrashov ID - 892 IS - 1 JF - Current Opinion in Structural Biology TI - A structural perspective of compensatory evolution VL - 26 ER - TY - JOUR AB - Understanding fitness landscapes, a conceptual depiction of the genotype-to-phenotype relationship, is crucial to many areas of biology. 
Two aspects of fitness landscapes are the focus of contemporary studies of molecular evolution. First, the local shape of the fitness landscape, defined by the contribution of individual alleles to fitness that is independent of all genetic interactions. Second, the global, multidimensional fitness landscape shape, determined by how interactions between alleles at different loci change each other’s fitness impact, or epistasis. In explaining the high amino-acid usage (u), we focused on the global shape of the fitness landscape, ignoring the perturbations at individual sites. AU - Breen, Michael S AU - Kemena, Carsten AU - Vlasov, Peter K AU - Notredame, Cédric AU - Fyodor Kondrashov ID - 899 IS - 7451 JF - Nature TI - Breen et al. reply VL - 497 ER - TY - JOUR AB - Background: Genetic variation at the melanocortin-1 receptor (MC1R) gene is correlated with melanin color variation in many birds. Feral pigeons (Columba livia) show two major melanin-based colorations: a red coloration due to pheomelanic pigment and a black coloration due to eumelanic pigment. Furthermore, within each color type, feral pigeons display continuous variation in the amount of melanin pigment present in the feathers, with individuals varying from pure white to a full dark melanic color. Coloration is highly heritable and it has been suggested that it is under natural or sexual selection, or both. Our objective was to investigate whether MC1R allelic variants are associated with plumage color in feral pigeons. Findings: We sequenced 888 bp of the coding sequence of MC1R among pigeons varying both in the type, eumelanin or pheomelanin, and the amount of melanin in their feathers. We detected 10 non-synonymous substitutions and 2 synonymous substitutions, but none of them were associated with a plumage type.
It remains possible that non-synonymous substitutions that influence coloration are present in the short MC1R fragment that we did not sequence, but this seems unlikely because we analyzed the entire functionally important region of the gene. Conclusions: Our results show that color differences among feral pigeons are probably not attributable to amino acid variation at the MC1R locus. Therefore, variation in regulatory regions of MC1R or variation in other genes may be responsible for the color polymorphism of feral pigeons. AU - Derelle, Romain AU - Kondrashov, Fyodor AU - Arkhipov, Vladimir AU - Corbel, Hélène AU - Frantz, Adrien AU - Gasparini, Julien AU - Jacquin, Lisa AU - Jacob, Gwenaël AU - Thibault, Sophie AU - Baudry, Emmanuelle ID - 894 IS - 1 JF - BMC Research Notes TI - Color differences among feral pigeons (Columba livia) are not attributable to sequence variation in the coding region of the melanocortin-1 receptor gene MC1R VL - 6 ER - TY - JOUR AB - A survey of avifauna was carried out in the Mys Shmidta area, north Chukotka, Russia, from 8 June to 12 July 2011. A total of 90 species was recorded in the area, which together with literature data made a final list of 104 species. For several species this area is beyond the northern, north-eastern or north-western limits of their known distribution. We collected new data for 19 globally or locally threatened species. Tundra Swan Cygnus columbianus, Emperor Goose Anser canagica, American Golden Plover Pluvialis dominica, Western Sandpiper Calidris mauri, Semipalmated Sandpiper C. pusilla, Northern House Martin Delichon urbica and Barn Swallow Hirundo rustica were all confirmed to be breeding. Breeding of Brent Goose Branta bernicla nigricans, Spectacled Eider Somateria fischeri and Steller's Eider Polysticta stelleri was judged to be 'very likely'. There was no evidence for breeding of Ross's Gull Rhodostethia rosea despite several records.
Two Eurasian Dotterels Eudromias morinellus were recorded displaying for the first time in the area, but the status of the species is unclear. The area is important for Snowy Owl Nyctea scandiaca, and as moulting grounds for Emperor Goose. Canada Goose Branta canadensis, Baikal Teal Anas formosa, Bar-tailed Godwit Limosa lapponica, Slaty-backed Gull Larus schistisagus, Thayer's Gull L. thayeri, Black-headed Gull L. ridibundus, White-tailed Eagle Haliaeetus albicilla, Steller's Sea Eagle H. pelagicus, Osprey Pandion haliaetus, Arctic Warbler Phylloscopus borealis and House Sparrow Passer domesticus are more likely to be rare vagrants or migrants. An observation of a Pine Siskin Carduelis pinus is the first record for Eurasia. AU - Arkhipov, Vladimir Y AU - Noah T AU - Koschkar, Steffen AU - Fyodor Kondrashov ID - 905 IS - 29 JF - Forktail TI - Birds of Mys Shmidta, north Chukotka, Russia ER - TY - JOUR AB - Whether or not evolutionary change is inherently irreversible remains a controversial topic. Some examples of evolutionary irreversibility are known; however, this question has not been comprehensively addressed at the molecular level. Here, we use data from 221 human genes with known pathogenic mutations to estimate the rate of irreversibility in protein evolution. For these genes, we reconstruct ancestral amino acid sequences along the mammalian phylogeny and identify ancestral amino acid states that match known pathogenic mutations. Such cases represent inherent evolutionary irreversibility because, at the present moment, reversals to these ancestral amino acid states are impossible for the human lineage. We estimate that approximately 10% of all amino acid substitutions along the mammalian phylogeny are irreversible, such that a return to the ancestral amino acid state would lead to a pathogenic phenotype. For a subset of 51 genes with high rates of irreversibility, as much as 40% of all amino acid evolution was estimated to be irreversible. 
Because pathogenic phenotypes do not resemble ancestral phenotypes, the molecular nature of the high rate of irreversibility in proteins is best explained by evolution with a high prevalence of compensatory, epistatic interactions between amino acid sites. Under such a mode of protein evolution, once an amino acid substitution is fixed, the probability of its reversal declines as the protein sequence accumulates changes that affect the phenotypic manifestation of the ancestral state. The prevalence of epistasis in evolution indicates that the observed high rate of irreversibility in protein evolution is an inherent property of protein structure and function. AU - Soylemez, Onuralp AU - Fyodor Kondrashov ID - 846 IS - 12 JF - Genome Biology and Evolution TI - Estimating the rate of irreversibility in protein evolution VL - 4 ER - TY - JOUR AB - Background: The evolution and genomic frequencies of stop codons have not been rigorously studied, with the exception of the coding of non-canonical amino acids. Here we study the rate of evolution and frequency distribution of stop codons in bacterial genomes. Results: We show that in bacteria stop codons evolve more slowly than synonymous sites, suggesting the action of weak negative selection. However, the frequency of stop codons relative to genomic nucleotide content indicated that this selection regime is not straightforward. The frequency of TAA and TGA stop codons is GC-content dependent, with TAA decreasing and TGA increasing with GC-content, while TAG frequency is independent of GC-content. Applying a formal, analytical model to these data, we found that the relationship between stop codon frequencies and nucleotide content cannot be explained by mutational biases or selection on nucleotide content. However, with weak nucleotide content-dependent selection on TAG, -0.5 < Ne·s < 1.5, the model fits all of the data and recapitulates the relationship between TAG and nucleotide content.
For biologically plausible rates of mutations we show that, in bacteria, the TAG stop codon is universally associated with lower fitness, with TAA being optimal for G-content < 16% while for G-content > 16% TGA has a higher fitness than TAG. Conclusions: Our data indicate that the TAG codon is universally suboptimal in the bacterial lineage, such that TAA is likely to be the preferred stop codon for low GC content while TGA is the preferred stop codon for high GC content. The optimization of stop codon usage may therefore be useful in genome engineering or gene expression optimization applications. Reviewers: This article was reviewed by Michail Gelfand, Arcady Mushegian and Shamil Sunyaev. For the full reviews, please go to the Reviewers' Comments section. AU - Povolotskaya, Inna AU - Fyodor Kondrashov AU - Ledda, Alice AU - Vlasov, Peter K ID - 858 JF - Biology Direct TI - Stop codons in bacteria are not selectively equivalent VL - 7 ER - TY - JOUR AB - The main forces directing long-term molecular evolution remain obscure. A sizable fraction of amino-acid substitutions seem to be fixed by positive selection, but it is unclear to what degree long-term protein evolution is constrained by epistasis, that is, instances when substitutions that are accepted in one genotype are deleterious in another. Here we obtain a quantitative estimate of the prevalence of epistasis in long-term protein evolution by relating data on amino-acid usage in 14 organelle proteins and 2 nuclear-encoded proteins to their rates of short-term evolution. We studied multiple alignments of at least 1,000 orthologues for each of these 16 proteins from species from a diverse phylogenetic background and found that an average site contained approximately eight different amino acids.
Thus, without epistasis an average site should accept two-fifths of all possible amino acids, and the average rate of amino-acid substitutions should therefore be about three-fifths lower than the rate of neutral evolution. However, we found that the measured rate of amino-acid substitution in recent evolution is 20 times lower than the rate of neutral evolution and an order of magnitude lower than that expected in the absence of epistasis. These data indicate that epistasis is pervasive throughout protein evolution: about 90 per cent of all amino-acid substitutions have a neutral or beneficial impact only in the genetic backgrounds in which they occur, and must therefore be deleterious in a different background of other species. Our findings show that most amino-acid substitutions have different fitness effects in different species and that epistasis provides the primary conceptual framework to describe the tempo and mode of long-term protein evolution. AU - Breen, Michael S AU - Kemena, Carsten AU - Vlasov, Peter K AU - Notredame, Cédric AU - Fyodor Kondrashov ID - 900 IS - 7421 JF - Nature TI - Epistasis as the primary factor in molecular evolution VL - 490 ER - TY - JOUR AB - A subject of extensive study in evolutionary theory has been the issue of how neutral, redundant copies can be maintained in the genome for long periods of time. Concurrently, examples of adaptive gene duplications to various environmental conditions in different species have been described. At this point, it is too early to tell whether or not a substantial fraction of gene copies have initially achieved fixation by positive selection for increased dosage. Nevertheless, enough examples have accumulated in the literature that such a possibility should be considered. Here, I review the recent examples of adaptive gene duplications and make an attempt to draw generalizations on what types of genes may be particularly prone to be selected for under certain environmental conditions. 
The identification of copy-number variation in ecological field studies of species adapting to stressful or novel environmental conditions may improve our understanding of gene duplications as a mechanism of adaptation and its relevance to the long-term persistence of gene duplications. AU - Fyodor Kondrashov ID - 887 IS - 1749 JF - Proceedings of the Royal Society of London Series B Biological Sciences TI - Gene duplication as a mechanism of genomic adaptation to a changing environment VL - 279 ER - TY - JOUR AB - Recent discovery of the Large-billed Reed Warbler (Acrocephalus orinus) in museums and in the wild significantly expanded our knowledge of its morphological traits and genetic variability, and revealed new data on the geographical distribution of the breeding grounds, migration routes and wintering locations of this species. It is now certain that A. orinus breeds in Central Asia; however, the precise area of distribution remains unclear. Further study of this species is hampered by the small number of known specimens, with only 13 currently available in museums, and by the relative uncertainty of the breeding area and habitat of this species. Following morphological and genetic analyses by Svensson et al., we describe 14 new A. orinus specimens from collections of Zoological Museums of the former USSR from the territory of Central Asian states. All of these specimens were erroneously labeled as Blyth's Reed Warbler (A. dumetorum), which is thought to be a breeding species in these areas. The 14 new A. orinus specimens were collected during the breeding season, while most of the 85 A. dumetorum specimens from the same area were collected during the migration period. Our data indicate that the Central Asian territory previously regarded as the breeding grounds of A. dumetorum is likely to constitute the breeding territory of A. orinus.
This rare case of a re-description of the breeding territory of a lost species emphasizes the importance of maintenance of museum collections around the world. If the present data on the breeding grounds of A. orinus are confirmed with field observations and collections, the literature on the biology of A. dumetorum from the southern part of its range may have to be reconsidered. AU - Koblik, Evgeniy A AU - Red'Kin, Yaroslav A AU - Meer, Margarita S AU - Derelle, Romain AU - Golenkina, Sofia A AU - Fyodor Kondrashov AU - Arkhipov, Vladimir Y ID - 890 IS - 4 JF - PLoS One TI - Acrocephalus orinus: A case of Mistaken identity VL - 6 ER - TY - CHAP AU - Fyodor Kondrashov ID - 881 T2 - Evolution after Gene Duplication TI - Gene Dosage and Duplication ER - TY - JOUR AB - Gene duplications and their subsequent divergence play an important part in the evolution of novel gene functions. Several models for the emergence, maintenance and evolution of gene copies have been proposed. However, a clear consensus on how gene duplications are fixed and maintained in genomes is lacking. Here, we present a comprehensive classification of the models that are relevant to all stages of the evolution of gene duplications. Each model predicts a unique combination of evolutionary dynamics and functional properties. Setting out these predictions is an important step towards identifying the main mechanisms that are involved in the evolution of gene duplications. AU - Innan, Hideki AU - Fyodor Kondrashov ID - 891 IS - 2 JF - Nature Reviews Genetics TI - The evolution of gene duplications: Classifying and distinguishing between models VL - 11 ER - TY - JOUR AB - Background: Surveying deleterious variation in human populations is crucial for our understanding, diagnosis and potential treatment of human genetic pathologies. A number of recent genome-wide analyses focused on the prevalence of segregating deleterious alleles in the nuclear genome. 
However, such studies have not been conducted for the mitochondrial genome. Results: We present a systematic survey of polymorphisms in the human mitochondrial genome, including those predicted to be deleterious and those that correspond to known pathogenic mutations. Analyzing 4458 completely sequenced mitochondrial genomes, we characterize the genetic diversity of different types of single nucleotide polymorphisms (SNPs) in African (L haplotypes) and non-African (M and N haplotypes) populations. We find that the overall level of polymorphism is higher in the mitochondrial than in the nuclear genome, although the mitochondrial genome appears to be under stronger selection, as indicated by proportionally fewer nonsynonymous than synonymous substitutions. The African mitochondrial genomes show higher heterozygosity, a greater number of polymorphic sites and higher frequencies of synonymous, benign and damaging polymorphisms than non-African genomes. However, African genomes carry significantly fewer SNPs previously characterized as pathogenic than non-African genomes. Conclusions: The finding that SNPs classified as pathogenic are the only category of polymorphisms more abundant in non-African genomes is best explained by a systematic ascertainment bias that favours the discovery of pathogenic polymorphisms segregating in non-African populations. This further suggests that, contrary to the common disease-common variant hypothesis, pathogenic mutations are largely population-specific and different SNPs may be associated with the same disease in different populations.
Therefore, to obtain a comprehensive picture of the deleterious variability in the human population, as well as to improve the diagnostics of individuals carrying African mitochondrial haplotypes, it is necessary to survey different populations independently. Reviewers: This article was reviewed by Dr Mikhail Gelfand, Dr Vasily Ramensky (nominated by Dr Eugene Koonin) and Dr David Rand (nominated by Dr Laurence Hurst). AU - Breen, Michael S AU - Fyodor Kondrashov ID - 901 JF - Biology Direct TI - Mitochondrial pathogenic mutations are population-specific VL - 5 ER - TY - JOUR AB - The need to maintain the structural and functional integrity of an evolving protein severely restricts the repertoire of acceptable amino-acid substitutions. However, it is not known whether these restrictions impose a global limit on how far homologous protein sequences can diverge from each other. Here we explore the limits of protein evolution using sequence divergence data. We formulate a computational approach to study the rate of divergence of distant protein sequences and measure this rate for ancient proteins, those that were present in the last universal common ancestor. We show that ancient proteins are still diverging from each other, indicating an ongoing expansion of the protein sequence universe. The slow rate of this divergence is imposed by the sparseness of functional protein sequences in sequence space and the ruggedness of the protein fitness landscape: 98 per cent of sites cannot accept an amino-acid substitution at any given moment but a vast majority of all sites may eventually be permitted to evolve when other, compensatory, changes occur. Thus, 3.5 × 10^9 yr has not been enough to reach the limit of divergent evolution of proteins, and for most proteins the limit of sequence similarity imposed by common function may not exceed that of random sequences.
AU - Povolotskaya, Inna AU - Fyodor Kondrashov ID - 857 IS - 7300 JF - Nature TI - Sequence space and the ongoing expansion of the protein universe VL - 465 ER - TY - JOUR AB - A long-standing controversy in evolutionary biology is whether or not evolving lineages can cross valleys on the fitness landscape that correspond to low-fitness genotypes, which can eventually enable them to reach isolated fitness peaks. Here we study the fitness landscapes traversed by switches between different AU and GC Watson-Crick nucleotide pairs at complementary sites of mitochondrial transfer RNA stem regions in 83 mammalian species. We find that such Watson-Crick switches occur 30-40 times more slowly than pairs of neutral substitutions, and that alleles corresponding to GU and AC non-Watson-Crick intermediate states segregate within human populations at low frequencies, similar to those of non-synonymous alleles. Substitutions leading to a Watson-Crick switch are strongly correlated, especially in mitochondrial tRNAs encoded on the GT-nucleotide-rich strand of the mitochondrial genome. Using these data we estimate that a typical Watson-Crick switch involves crossing a fitness valley of a depth of about 10^-3 or even about 10^-2, with AC intermediates being slightly more deleterious than GU intermediates. This compensatory evolution must proceed through rare intermediate variants that never reach fixation. The ubiquitous nature of compensatory evolution in mammalian mitochondrial tRNAs and other molecules implies that simultaneous fixation of two alleles that are individually deleterious may be a common phenomenon at the molecular level.
AU - Meer, Margarita V AU - Kondrashov, Alexey S AU - Artzy-Randrup, Yael AU - Fyodor Kondrashov ID - 862 IS - 7286 JF - Nature TI - Compensatory evolution in mitochondrial tRNAs navigates valleys of low fitness VL - 464 ER - TY - JOUR AB - The rate of spontaneous mutation in natural populations is a fundamental parameter for many evolutionary phenomena. Because the rate of mutation is generally low, most of what is currently known about mutation has been obtained through indirect, complex and imprecise methodological approaches. However, in the past few years genome-wide sequencing of closely related individuals has made it possible to estimate the rates of mutation directly at the level of the DNA, avoiding most of the problems associated with using indirect methods. Here, we review the methods used in the past with an emphasis on next generation sequencing, which may soon make the accurate measurement of spontaneous mutation rates a matter of routine. AU - Fyodor Kondrashov AU - Kondrashov, Alexey S ID - 872 IS - 1544 JF - Philosophical Transactions of the Royal Society of London. Series B, Biological Sciences TI - Measurements of spontaneous rates of mutations in the recent past and the near future VL - 365 ER - TY - JOUR AB - Background: Divergence of two independently evolving sequences that originated from a common ancestor can be described by two parameters, the asymptotic level of divergence E and the rate r at which this level of divergence is approached. Constant negative selection impedes allele replacements and, therefore, is routinely assumed to decelerate sequence divergence. However, its impact on E and on r has not been formally investigated. Results: Strong selection that favors only one allele can make E arbitrarily small and r arbitrarily large.
In contrast, in the case of 4 possible alleles and equal mutation rates, the lowest value of r, attained when two alleles confer equal fitnesses and the other two are strongly deleterious, is only two times lower than its value under selective neutrality. Conclusions: Constant selection can strongly constrain the level of sequence divergence, but cannot substantially reduce the rate at which this level is approached. In particular, under any constant selection the divergence of sequences that accumulated one substitution per neutral site since their origin from the common ancestor must already constitute at least one half of the asymptotic divergence at sites under such selection. Reviewers: This article was reviewed by Drs. Nicolas Galtier, Sergei Maslov, and Nick Grishin. AU - Kondrashov, Alexey S AU - Povolotskaya, Inna AU - Ivankov, Dmitry N AU - Fyodor Kondrashov ID - 884 JF - Biology Direct TI - Rate of sequence divergence under constant selection VL - 5 ER - TY - JOUR AB - Although some data link archaeal and eukaryotic translation, the overall mechanism of protein synthesis in archaea remains largely obscure. Both archaeal (aRF1) and eukaryotic (eRF1) single release factors recognize all three stop codons. The archaeal family Methanosarcinaceae contains two aRF1 homologs and also uses the UAG stop codon to encode the 22nd amino acid, pyrrolysine. Here we provide an analysis of the last stage of archaeal translation in pyrrolysine-utilizing species. We demonstrated that only one of the two Methanosarcina barkeri aRF1 homologs possesses activity and recognizes all three stop codons. The second aRF1 homolog may have another, unknown function. The mechanism of pyrrolysine incorporation in the Methanosarcinaceae is discussed.
AU - Alkalaeva, Elena Z AU - Eliseev, Boris D AU - Ambrogelly, Alexandre AU - Vlasov, Peter K AU - Fyodor Kondrashov AU - Gundllapalli, Sarath B AU - Frolova, Ludmila Y AU - Söll, Dieter G AU - Kisselev, Lev L ID - 908 IS - 21 JF - FEBS Letters TI - Translation termination in pyrrolysine-utilizing archaea VL - 583 ER - TY - JOUR AB - Mutation rate varies greatly between nucleotide sites of the human genome and depends both on the global genomic location and the local sequence context of a site. In particular, CpG context elevates the mutation rate by an order of magnitude. Mutations also vary widely in their effect on the molecular function, phenotype, and fitness. That the probability of a new mutation's occurrence is independent of its effect has been a fundamental premise in genetics. However, highly mutable contexts may be preserved by negative selection at important sites but destroyed by mutation at sites under no selection. Thus, there may be a positive correlation between the rate of mutations at a nucleotide site and the magnitude of their effect on fitness. We studied the impact of CpG context on the rate of human-chimpanzee divergence and on intrahuman nucleotide diversity at non-synonymous coding sites. We compared nucleotides that occupy identical positions within codons of identical amino acids and only differ by being within versus outside CpG context. Nucleotides within CpG context are under stronger negative selection, as revealed by their lower rate of evolution and nucleotide diversity relative to their mutation rate. In particular, the probability of fixation of a non-synonymous transition at a CpG site is two times lower than at a non-CpG site. Thus, sites with different mutation rates are not necessarily selectively equivalent. This suggests that the mutation rate may complement sequence conservation as a characteristic predictive of the functional importance of nucleotide sites.
AU - Schmidt, Steffen AU - Gerasimova, Anna AU - Fyodor Kondrashov AU - Adzuhbei, Ivan A AU - Kondrashov, Alexey S AU - Sunyaev, Shamil R ID - 844 IS - 11 JF - PLoS Genetics TI - Hypermutable non-synonymous sites are under stronger negative selection VL - 4 ER - TY - JOUR AB - Background. The arginine vasopressin V1a receptor (V1aR) modulates social cognition and behavior in a wide variety of species. Variation in a repetitive microsatellite element in the 5′ flanking region of the V1aR gene (AVPR1A) in rodents has been associated with variation in brain V1aR expression and in social behavior. In humans, the 5′ flanking region of AVPR1A contains a tandem duplication of two ∼350 bp microsatellite-containing elements located approximately 3.5 kb upstream of the transcription start site. The first block, referred to as DupA, contains a polymorphic (GT)25 microsatellite; the second block, DupB, has a complex (CT)4-(TT)-(CT)8-(GT)24 polymorphic motif, known as RS3. Polymorphisms in RS3 have been associated with variation in sociobehavioral traits in humans, including autism spectrum disorders. Thus, evolution of these regions may have contributed to variation in social behavior in primates. We examined the structure of these regions in six ape, six monkey, and one prosimian species. Results. Both tandem repeat blocks are present upstream of the AVPR1A coding region in five of the ape species we investigated, while monkeys have only one copy of this region. As in humans, the microsatellites within DupA and DupB are polymorphic in many primate species. Furthermore, both single (lacking DupB) and duplicated alleles (containing both DupA and DupB) are present in chimpanzee (Pan troglodytes) populations with allele frequencies of 0.795 and 0.205 for the single and duplicated alleles, respectively, based on the analysis of 47 wild-caught individuals. Finally, a phylogenetic reconstruction suggests two alternate evolutionary histories for this locus. Conclusion.
There is no obvious relationship between the presence of the RS3 duplication and social organization in primates. However, polymorphisms identified in some species may be useful in future genetic association studies. In particular, the presence of both single and duplicated alleles in chimpanzees provides a unique opportunity to assess the functional role of this duplication in contributing to variation in social behavior in primates. While our initial studies show no signs of directional selection on this locus in chimpanzees, pharmacological and genetic association studies support a potential role for this region in influencing V1aR expression and social behavior. AU - Donaldson, Zoe R AU - Fyodor Kondrashov AU - Putnam, Andrea S AU - Bai, Yaohui AU - Stoinski, Tara S AU - Hammock, Elizabeth A AU - Young, Larry ID - 895 IS - 1 JF - BMC Evolutionary Biology TI - Evolution of a behavior-linked microsatellite-containing element in the 5′ flanking region of the primate AVPR1A gene VL - 8 ER - TY - JOUR AB - The most common form of protein-coding gene overlap in eukaryotes is a simple nested structure, whereby one gene is embedded in an intron of another. Analysis of nested protein-coding genes in vertebrates, fruit flies and nematodes revealed substantially higher rates of evolutionary gains than losses. The accumulation of nested gene structures could not be attributed to any obvious functional relationships between the genes involved and represents an increase in the organizational complexity of animal genomes via a neutral process. AU - Assis, Raquel AU - Kondrashov, Alexey S AU - Koonin, Eugene V AU - Fyodor Kondrashov ID - 907 IS - 10 JF - Trends in Genetics TI - Nested genes and increasing organizational complexity of metazoan genomes VL - 24 ER - TY - JOUR AB - We identified a mutation in the CRYGD gene (P23S) of the γ-crystallin gene cluster that is associated with a polymorphic congenital cataract that occurs with a frequency of ∼0.3% in a human population.
To gain insight into the molecular mechanism of the pathogenesis of γ-crystallin isoforms, we undertook an evolutionary analysis of the available mammalian and newly obtained primate sequences of the γ-crystallin genes. The cataract-associated serine at site 23 corresponds to the ancestral state, since it was found in CRYGD of a lower primate and all the surveyed nonprimate mammals. Crystallin proteins include two structurally similar domains, and substitutions in mammalian CRYGD protein at site 23 of the first domain were always associated with substitutions in the structurally reciprocal sites 109 and 136 of the second domain. These data suggest that the cataractogenic effect of serine at site 23 in the N-terminal domain of CRYGD may be compensated indirectly by amino acid changes in a distal domain. We also found that gene conversion was a factor in the evolution of the γ-crystallin gene cluster throughout different mammalian clades. The high rate of gene conversion observed between the functional CRYGD gene and two primate γ-crystallin pseudogenes (CRYGEP1 and CRYGFP1), coupled with the surprising finding of apparent negative selection in primate pseudogenes, suggests a deleterious impact of recently derived pseudogenes involved in gene conversion in the γ-crystallin gene cluster. AU - Plotnikova, Olga V AU - Fyodor Kondrashov AU - Vlasov, Peter K AU - Grigorenko, Anastasia P AU - Ginter, Evgeny K AU - Rogaev, Evgeny I ID - 860 IS - 1 JF - American Journal of Human Genetics TI - Conversion and compensatory evolution of the γ-crystallin genes and identification of a cataractogenic mutation that reverses the sequence of the human CRYGD gene to an ancestral state VL - 81 ER - TY - JOUR AB - Having an extra copy of a gene is thought to provide some functional redundancy, which results in a higher rate of evolution in duplicated genes.
In this article, we estimate the impact of gene duplication on the selection of tuf paralogs, and we find that in the absence of gene conversion, tuf paralogs have evolved significantly slower than when gene conversion has been a factor in their evolution. Thus, tuf gene copies evolve under a selective pressure that ensures their functional uniformity, and gene conversion reduces selection against amino acid substitutions that affect the function of the encoded protein, EF-Tu. AU - Fyodor Kondrashov AU - Gurbich, Tatiana A AU - Vlasov, Peter K ID - 879 IS - 5 JF - Trends in Genetics TI - Selection for functional uniformity of tuf duplicates in γ-proteobacteria VL - 23 ER - TY - JOUR AB - Background: Independently evolving lineages mostly accumulate different changes, which leads to their gradual divergence. However, parallel accumulation of identical changes is also common, especially in traits with only a small number of possible states. Results: We characterize parallelism in evolution of coding sequences in three four-species sets of genomes of mammals, Drosophila, and yeasts. Each such set contains two independent evolutionary paths, which we call paths I and II. An amino acid replacement which occurred along path I also occurs along path II with a probability of 50–80% of that expected under selective neutrality. Thus, the per-site rate of parallel evolution of proteins is several times higher than their average rate of evolution, but still lower than the rate of evolution of neutral sequences. This deficit may be caused by changes in the fitness landscape, leading to a replacement being possible along path I but not along path II. However, constant, weak selection assumed by the nearly neutral model of evolution appears to be a more likely explanation.
In that case, the average coefficient of selection associated with an amino acid replacement, in units of the effective population size, must exceed ∼0.4, and the fraction of effectively neutral replacements must be below ∼30%. At a majority of evolvable amino acid sites, only a relatively small number of different amino acids is permitted. Conclusion: High, but below-neutral, rates of parallel amino acid replacements suggest that a majority of amino acid replacements that occur in evolution are subject to weak, but non-trivial, selection, as predicted by Ohta's nearly-neutral theory. AU - Bazykin, Georgii A AU - Fyodor Kondrashov AU - Brudno, Michael AU - Poliakov, Alexander V AU - Dubchak, Inna L AU - Kondrashov, Alexey S ID - 904 JF - Biology Direct TI - Extensive parallelism in protein evolution VL - 2 ER - TY - JOUR AB - Background: Mitochondrial tRNAs have been studied by structural biologists interested in their secondary structure characteristics, by evolutionary biologists researching patterns of compensatory and structural evolution, and in medical studies aimed at understanding the basis of human disease. However, an up-to-date, manually curated database of mitochondrially encoded tRNAs from higher animals is currently not available. Description: We obtained the complete mitochondrial sequence for 277 tetrapod species from GenBank and re-annotated all of the tRNAs based on a multiple alignment of each tRNA gene and secondary structure prediction made independently for each tRNA. The mitochondrial (mt) tRNA sequences and the secondary structure based multiple alignments are freely available as Supplemental Information online. Conclusion: We compiled a manually curated database of mitochondrially encoded tRNAs from tetrapods with completely sequenced genomes. In the course of our work, we reannotated more than 10% of all tetrapod mt-tRNAs and subsequently predicted the secondary structures of 6060 mitochondrial tRNAs.
This carefully constructed database can be utilized to enhance our knowledge in several different fields, including the evolution of mt-tRNA secondary structure and prediction of pathogenic mt-tRNA mutations. In addition, researchers reporting novel mitochondrial genome sequences should check their tRNA gene annotations against our database to ensure a higher level of fidelity of their annotation. AU - Popadin, Konstantin Yu AU - Mamirova, Leila A AU - Fyodor Kondrashov ID - 861 JF - BMC Bioinformatics TI - A manually curated database of tetrapod mitochondrially encoded tRNA sequences and secondary structures VL - 8 ER - TY - JOUR AB - Phylogenetic relationships between the extinct woolly mammoth (Mammuthus primigenius) and the Asian (Elephas maximus) and African savanna (Loxodonta africana) elephants remain unresolved. Here, we report the sequence of the complete mitochondrial genome (16,842 base pairs) of a woolly mammoth extracted from permafrost-preserved remains from the Pleistocene epoch; this is the oldest mitochondrial genome sequence determined to date. We demonstrate that well-preserved mitochondrial genome fragments, as long as ∼1,600-1,700 base pairs, can be retrieved from pre-Holocene remains of an extinct species. Phylogenetic reconstruction of the Elephantinae clade suggests that M. primigenius and E. maximus are sister species that diverged soon after their common ancestor split from the L. africana lineage. Low nucleotide diversity found between independently determined mitochondrial genomic sequences of woolly mammoths separated geographically and in time suggests that north-eastern Siberia was occupied by a relatively homogeneous population of M. primigenius throughout the late Pleistocene.
AU - Rogaev, Evgeny I AU - Moliaka, Yuri K AU - Malyarchuk, Boris A AU - Fyodor Kondrashov AU - Derenko, Miroslava V AU - Chumakov, Ilya M AU - Grigorenko, Anastasia P ID - 854 IS - 3 JF - PLoS Biology TI - Complete mitochondrial genome and phylogeny of pleistocene mammoth Mammuthus primigenius VL - 4 ER - TY - JOUR AB - Background: The glyoxylate cycle is thought to be present in bacteria, protists, plants, fungi, and nematodes, but not in other Metazoa. However, activity of the glyoxylate cycle enzymes, malate synthase (MS) and isocitrate lyase (ICL), in animal tissues has been reported. In order to clarify the status of the MS and ICL genes in animals and get an insight into their evolution, we undertook a comparative-genomic study. Results: Using sequence similarity searches, we identified MS genes in arthropods, echinoderms, and vertebrates, including platypus and opossum, but not in the numerous sequenced genomes of placental mammals. The regions of the placental mammals' genomes expected to code for malate synthase, as determined by comparison of the gene orders in vertebrate genomes, show clear similarity to the opossum MS sequence but contain stop codons, indicating that the MS gene became a pseudogene in placental mammals. By contrast, the ICL gene is undetectable in animals other than the nematodes that possess a bifunctional, fused ICL-MS gene. Examination of phylogenetic trees of MS and ICL suggests multiple horizontal gene transfer events that probably went in both directions between several bacterial and eukaryotic lineages. The strongest evidence was obtained for the acquisition of the bifunctional ICL-MS gene from an as yet unknown bacterial source with the corresponding operonic organization by the common ancestor of the nematodes. 
Conclusion: The distribution of the MS and ICL genes in animals suggests that either they encode alternative enzymes of the glyoxylate cycle that are not orthologous to the known MS and ICL, or the animal MS acquired a new function that remains to be characterized. Regardless of the ultimate solution to this conundrum, the genes for the glyoxylate cycle enzymes present a remarkable variety of evolutionary events, including unusual horizontal gene transfer from bacteria to animals. AU - Fyodor Kondrashov AU - Koonin, Eugene V AU - Morgunov, Igor G AU - Finogenova, Tatiana V AU - Kondrashova, Marie N ID - 868 JF - Biology Direct TI - Evolution of glyoxylate cycle enzymes in Metazoa: Evidence of multiple horizontal transfer events and pseudogene formation VL - 1 ER - TY - JOUR AB - New genes commonly appear through complete or partial duplications of pre-existing genes. Duplications of long DNA segments are constantly produced by rare mutations, may become fixed in a population by selection or random drift, and are subject to divergent evolution of the paralogous sequences after fixation, although gene conversion can impede this process. New data shed some light on each of these processes. Mutations which involve duplications can occur through at least two different mechanisms: backward strand slippage during DNA replication and unequal crossing-over. The background rate of duplication of a complete gene in humans is 10^-9 to 10^-10 per generation, although many genes located within hot-spots of large-scale mutation are duplicated much more often. Many gene duplications affect fitness strongly, and are responsible, through gene dosage effects, for a number of genetic diseases. However, high levels of intrapopulation polymorphism caused by presence or absence of long, gene-containing DNA segments imply that some duplications are not under strong selection.
The polymorphism to fixation ratios appear to be approximately the same for gene duplications and for presumably selectively neutral nucleotide substitutions, which, according to the McDonald-Kreitman test, is consistent with selective neutrality of duplications. However, this pattern can also be due to negative selection against most of segregating duplications and positive selection for at least some duplications which become fixed. Patterns in post-fixation evolution of duplicated genes do not easily reveal the causes of fixations. Many gene duplications which became fixed recently in a variety of organisms were positively selected because the increased expression of the corresponding genes was beneficial. The effects of gene dosage provide a unified framework for studying all phases of the life history of a gene duplication. Application of well-known methods of evolutionary genetics to accumulating data on new, polymorphic, and fixed duplications will enhance our understanding of the role of natural selection in the evolution by gene duplication. AU - Kondrashov, Fyodor AU - Kondrashov, Alexey S ID - 873 IS - 2 JF - Journal of Theoretical Biology TI - Role of selection in fixation of gene duplications VL - 239 ER - TY - JOUR AB - The impact of synonymous nucleotide substitutions on fitness in mammals remains controversial. Despite some indications of selective constraint, synonymous sites are often assumed to be neutral, and the rate of their evolution is used as a proxy for mutation rate. We subdivide all sites into four classes defined by the mutable CpG context (nonCpG, postC, preG, and postCpreG) and compare four-fold synonymous sites and intron sites residing outside transposable elements. The distribution of the rate of evolution across all synonymous sites is trimodal. Rate of evolution at nonCpG synonymous sites, not preceded by C and not followed by G, is ∼10% below that at such intron sites.
In contrast, rate of evolution at postCpreG synonymous sites is ∼30% above that at such intron sites. Finally, synonymous and intron postC and preG sites evolve at similar rates. The relationship between the levels of polymorphism at the corresponding synonymous and intron sites is very similar to that between their rates of evolution. Within every class, synonymous sites are occupied by G or C much more often than intron sites, whose nucleotide composition is consistent with neutral mutation-drift equilibrium. These patterns suggest that synonymous sites are under weak selection in favor of G and C, with the average coefficient s ∼ 0.25/Ne ∼ 10⁻⁵, where Ne is the effective population size. Such selection decelerates evolution and reduces variability at sites with symmetric mutation, but has the opposite effects at sites where the favored nucleotides are more mutable. The amino-acid composition of proteins dictates that many synonymous sites are CpG-prone, which causes them, on average, to evolve faster and to be more polymorphic than intron sites. An average genotype carries ∼10⁷ suboptimal nucleotides at synonymous sites, implying synergistic epistasis in selection against them. AU - Kondrashov, Fyodor AU - Ogurtsov, Aleksey Yu AU - Kondrashov, Alexey S ID - 869 IS - 4 JF - Journal of Theoretical Biology TI - Selection in favor of nucleotides G and C diversifies evolution rates and levels of polymorphism at mammalian synonymous sites VL - 240 ER - TY - JOUR AB - Background: Carcinogenesis typically involves multiple somatic mutations in caretaker (DNA repair) and gatekeeper (tumor suppressors and oncogenes) genes. Analysis of mutation spectra of the tumor suppressor that is most commonly mutated in human cancers, p53, unexpectedly suggested that somatic evolution of the p53 gene during tumorigenesis is dominated by positive selection for gain of function. This conclusion is supported by accumulating experimental evidence of evolution of new functions of p53 in tumors.
These findings prompted a genome-wide analysis of possible positive selection during tumor evolution. Methods: A comprehensive analysis of probable somatic mutations in the sequences of Expressed Sequence Tags (ESTs) from malignant tumors and normal tissues was performed in order to assess the prevalence of positive selection in cancer evolution. For each EST, the numbers of synonymous and non-synonymous substitutions were calculated. In order to identify genes with a signature of positive selection in cancers, these numbers were compared to: i) expected numbers and ii) the numbers for the respective genes in the ESTs from normal tissues. Results: We identified 112 genes with a signature of positive selection in cancers, i.e., a significantly elevated ratio of non-synonymous to synonymous substitutions, in tumors as compared to 37 such genes in an approximately equal-sized EST collection from normal tissues. A substantial fraction of the tumor-specific positive-selection candidates have experimentally demonstrated or strongly predicted links to cancer. Conclusion: The results of EST analysis should be interpreted with extreme caution given the noise introduced by sequencing errors and undetected polymorphisms. Furthermore, an inherent limitation of EST analysis is that multiple mutations amenable to statistical analysis can be detected only in relatively highly expressed genes. Nevertheless, the present results suggest that positive selection might affect a substantial number of genes during tumorigenic somatic evolution. AU - Babenko, Vladimir N AU - Basu, Malay K AU - Kondrashov, Fyodor AU - Rogozin, Igor B AU - Koonin, Eugene V ID - 903 JF - BMC Cancer TI - Signs of positive selection of somatic mutations in human cancers detected by EST sequence analysis VL - 6 ER - TY - JOUR AB - The impact of an amino acid replacement on the organism's fitness can vary from lethal to selectively neutral and even, in rare cases, beneficial.
Substantial data are available on either pathogenic or acceptable replacements. However, the whole distribution of coefficients of selection against individual replacements is not known for any organism. To ascertain this distribution for human proteins, we combined data on pathogenic missense mutations, on human non-synonymous SNPs and on human-chimpanzee divergence of orthologous proteins. Fractions of amino acid replacements which reduce fitness by >10⁻², 10⁻² to 10⁻⁴, 10⁻⁴ to 10⁻⁵ and <10⁻⁵ are 25, 49, 14 and 12%, respectively. On average, the strength of selection against a replacement is substantially higher when chemically dissimilar amino acids are involved, and Grantham's index of a replacement explains 35% of variance in the average logarithm of selection coefficients associated with different replacements. Still, the impact of a replacement depends on its context within the protein more than on its own nature. Reciprocal replacements are often associated with rather different selection coefficients, in particular, replacements of non-polar amino acids with polar ones are typically much more deleterious than replacements in the opposite direction. However, differences between evolutionary fluxes of reciprocal replacements are only weakly correlated with the differences between the corresponding selection coefficients. AU - Yampolsky, Lev Y AU - Kondrashov, Fyodor AU - Kondrashov, Alexey S ID - 843 IS - 21 JF - Human Molecular Genetics TI - Distribution of the strength of selection against amino acid replacements in human proteins VL - 14 ER - TY - JOUR AB - Sequence analysis of protein and mitochondrially encoded tRNA genes shows that substitutions producing pathogenic effects in humans are often found in normal, healthy individuals from other species.
Analysis of stability of protein and tRNA structures shows that the disease-causing effects of pathogenic mutations can be neutralized by other, compensatory substitutions that restore the structural stability of the molecule. Further study of such substitutions will, hopefully, lead to new methods for curing genetic diseases that may be based on the correction of molecule stability as a whole instead of reversing an individual pathogenic mutation. AU - Kondrashov, Fyodor ID - 877 IS - 3 JF - Biofizika TI - The analysis of monomer sequences in protein and tRNA and the manifestation of the compensation of pathogenic deviations in their evolution VL - 50 ER - TY - JOUR AB - Negative trade-offs are thought to be a pervasive phenomenon and to inhibit evolution at all levels. New evidence shows that at the molecular level, there may be no trade-offs preventing the emergence of an enzyme with multiple functions. AU - Kondrashov, Fyodor ID - 878 IS - 1 JF - Nature Genetics TI - In search of the limits of evolution VL - 37 ER - TY - JOUR AB - Some mutations in human mitochondrial tRNAs are severely pathogenic. The available computational methods have a poor record of predicting the impact of a tRNA mutation on the phenotype and fitness. Here patterns of evolution at tRNA sites that harbor pathogenic mutations and at sites that harbor phenotypically cryptic polymorphisms were compared. Mutations that are pathogenic to humans occupy more conservative sites, are only rarely fixed in closely related species, and, when located in stem structures, often disrupt Watson-Crick pairing and display signs of compensatory evolution. These observations make it possible to classify ∼90% of all known pathogenic mutations as deleterious together with only ∼30% of polymorphisms.
These polymorphisms segregate at frequencies that are more than two times lower than frequencies of polymorphisms classified as benign, indicating that at least ∼30% of known polymorphisms in mitochondrial tRNAs affect fitness negatively. AU - Kondrashov, Fyodor ID - 882 IS - 16 JF - Human Molecular Genetics TI - Prediction of pathogenic mutations in mitochondrially encoded human tRNAs VL - 14 ER - TY - JOUR AB - Here, I describe a case of loss of the D-arm by mitochondrial cysteine tRNA in the nine-banded armadillo (Dasypus novemcinctus) convergent with mt tRNASer(AGY). Such evolution sheds light on the relationship between structure and function of tRNA molecules and its impact on the patterns of molecular evolution. AU - Kondrashov, Fyodor ID - 880 IS - 3 JF - Biofizika TI - The convergent evolution of the secondary structure of mitochondrial cysteine tRNA in the nine-banded armadillo Dasypus novemcinctus VL - 50 ER - TY - JOUR AB - Amino acid composition of proteins varies substantially between taxa and, thus, can evolve. For example, proteins from organisms with (G+C)-rich (or (A+T)-rich) genomes contain more (or fewer) amino acids encoded by (G+C)-rich codons. However, no universal trends in ongoing changes of amino acid frequencies have been reported. We compared sets of orthologous proteins encoded by triplets of closely related genomes from 15 taxa representing all three domains of life (Bacteria, Archaea and Eukaryota), and used phylogenies to polarize amino acid substitutions. Cys, Met, His, Ser and Phe accrue in at least 14 taxa, whereas Pro, Ala, Glu and Gly are consistently lost. The same nine amino acids are currently accrued or lost in human proteins, as shown by analysis of non-synonymous single-nucleotide polymorphisms. All amino acids with declining frequencies are thought to be among the first incorporated into the genetic code; conversely, all amino acids with increasing frequencies, except Ser, were probably recruited late.
Thus, expansion of initially under-represented amino acids, which began over 3,400 million years ago, apparently continues to this day. AU - Jordan, Ingo K AU - Kondrashov, Fyodor AU - Adzhubei, Ivan A AU - Wolf, Yuri I AU - Koonin, Eugene V AU - Kondrashov, Alexey S AU - Sunyaev, Shamil R ID - 893 IS - 7026 JF - Nature TI - A universal trend of amino acid gain and loss in protein evolution VL - 433 ER - TY - JOUR AB - We present a method for prediction of functional sites in a set of aligned protein sequences. The method selects sites which are both well conserved and clustered together in space, as inferred from the 3D structures of proteins included in the alignment. We tested the method using 86 alignments from the NCBI CDD database, where the sites of experimentally determined ligand and/or macromolecular interactions are annotated. In agreement with earlier investigations, we found that functional site predictions are most successful when overall background sequence conservation is low, such that sites under evolutionary constraint become apparent. In addition, we found that averaging of conservation values across spatially clustered sites improves predictions under certain conditions: that is, when overall conservation is relatively high and when the site in question involves a large macromolecular binding interface. Under these conditions it is better to look for clusters of conserved sites than to look for particular conserved sites. AU - Panchenko, Anna R AU - Kondrashov, Fyodor AU - Bryant, Stephen H ID - 864 IS - 4 JF - Protein Science TI - Prediction of functional sites by analysis of sequence and structure conservation VL - 13 ER - TY - JOUR AB - Only a fraction of eukaryotic genes affect the phenotype drastically. We compared 18 parameters in 1273 human morbid genes, known to cause diseases, and in the remaining 16 580 unambiguous human genes.
Morbid genes evolve more slowly, have wider phylogenetic distributions, are more similar to essential genes of Drosophila melanogaster, code for longer proteins containing more alanine and glycine and less histidine, lysine and methionine, possess larger numbers of longer introns with more accurate splicing signals and have higher and broader expression. These differences make it possible to classify as non-morbid 34% of human genes with unknown morbidity, when only 5% of known morbid genes are incorrectly classified as non-morbid. This classification can help to identify disease-causing genes among multiple candidates. AU - Kondrashov, Fyodor AU - Ogurtsov, Aleksey Yu AU - Kondrashov, Alexey S ID - 870 IS - 5 JF - Nucleic Acids Research TI - Bioinformatical assay of human gene morbidity VL - 32 ER - TY - JOUR AB - The dominance of wild-type alleles and the concomitant recessivity of deleterious mutant alleles might have evolved by natural selection or could be a by-product of the molecular and physiological mechanisms of gene action. We compared the properties of human haplosufficient genes, whose wild-type alleles are dominant over loss-of-function alleles, with haploinsufficient (recessive wild-type) genes, which produce an abnormal phenotype when heterozygous for a loss-of-function allele. The fraction of haplosufficient genes is the highest among the genes that encode enzymes, which is best compatible with the physiological theory. Haploinsufficient genes, on average, have more paralogs than haplosufficient genes, supporting the idea that gene dosage could be important for the initial fixation of duplications. Thus, haplo(in)sufficiency of a gene and its propensity for duplication might have a common evolutionary basis.
AU - Kondrashov, Fyodor AU - Koonin, Eugene V ID - 875 IS - 7 JF - Trends in Genetics TI - A common framework for understanding the origin of genetic dominance and evolutionary fates of gene duplications VL - 20 ER - TY - JOUR AB - The function of protein and RNA molecules depends on complex epistatic interactions between sites. Therefore, the deleterious effect of a mutation can be suppressed by a compensatory second-site substitution. In relating a list of 86 pathogenic mutations in human tRNAs encoded by mitochondrial genes to the sequences of their mammalian orthologs, we noted that 52 pathogenic mutations were present in normal tRNAs of one or several nonhuman mammals. We found at least five mechanisms of compensation for 32 pathogenic mutations that destroyed a Watson-Crick pair in one of the four tRNA stems: restoration of the affected Watson-Crick interaction (25 cases), strengthening of another pair (4 cases), creation of a new pair (8 cases), changes of multiple interactions in the affected stem (11 cases) and changes involving the interaction between the loop and stem structures (3 cases). A pathogenic mutation and its compensating substitution are fixed in a lineage in rapid succession, and often a compensatory interaction evolves convergently in different clades. At least 10%, and perhaps as many as 50%, of all nucleotide substitutions in evolving mammalian tRNAs participate in such interactions, indicating that the evolution of tRNAs proceeds along highly epistatic fitness ridges. AU - Kern, Andrew D AU - Kondrashov, Fyodor ID - 889 IS - 11 JF - Nature Genetics TI - Mechanisms and convergence of compensatory evolution in mammalian mitochondrial tRNAs VL - 36 ER - TY - JOUR AB - New alleles become fixed owing to random drift of nearly neutral mutations or to positive selection of substantially advantageous mutations. After decades of debate, the fraction of fixations driven by selection remains uncertain.
Within 9,390 genes, we analysed 28,196 codons at which rat and mouse differ from each other at two nucleotide sites and 1,982 codons with three differences. At codons where rat-mouse divergence involved two non-synonymous substitutions, both of them occurred in the same lineage, either rat or mouse, in 64% of cases; however, independent substitutions would occur in the same lineage with a probability of only 50%. All three non-synonymous substitutions occurred in the same lineage for 46% of codons, instead of the 25% expected. Furthermore, comparison of 12 pairs of prokaryotic genomes also shows clumping of multiple non-synonymous substitutions in the same lineage. This pattern cannot be explained by correlated mutation or episodes of relaxed negative selection, but instead indicates that positive selection acts at many sites of rapid, successive amino acid replacement. AU - Bazykin, Georgii A AU - Kondrashov, Fyodor AU - Ogurtsov, Aleksey Yu AU - Sunyaev, Shamil R AU - Kondrashov, Alexey S ID - 898 IS - 6991 JF - Nature TI - Positive selection at sites of multiple amino acid replacements since rat-mouse divergence VL - 429 ER - TY - JOUR AB - We compare the functional spectrum of protein evolution in two separate animal lineages with respect to two hypotheses: (1) rates of divergence are distributed similarly among functional classes within both lineages, indicating that selective pressure on the proteome is largely independent of organismic-level biological requirements; and (2) rates of divergence are distributed differently among functional classes within each lineage, indicating species-specific selective regimes impact genome-wide substitutional patterns.
Integrating comparative genome sequence with data from tissue-specific expressed-sequence-tag (EST) libraries and detailed database annotations, we find a functional genomic signature of rapid evolution and selective constraint shared between mammalian and nematode lineages despite their extensive morphological and ecological differences and distant common ancestry. In both phyla, we find evidence of accelerated evolution among components of molecular systems involved in coevolutionary change. In mammals, lineage-specific fast-evolving genes include those involved in reproduction, immunity, and possibly, maternal-fetal conflict. Likelihood ratio tests provide evidence for positive selection in these rapidly evolving functional categories in mammals. In contrast, slowly evolving genes, in terms of amino acid or insertion/deletion (indel) change, in both phyla are involved in core molecular processes such as transcription, translation, and protein transport. Thus, strong purifying selection appears to act on the same core cellular processes in both mammalian and nematode lineages, whereas positive and/or relaxed selection acts on different biological processes in each lineage. AU - Castillo-Davis, Cristian I AU - Kondrashov, Fyodor AU - Hartl, Daniel L AU - Kulathinal, Rob J ID - 902 IS - 5 JF - Genome Research TI - The functional genomic distribution of protein divergence in two animal phyla: Coevolution, genomic conflict, and constraint VL - 14 ER - TY - JOUR AB - The accumulation of genome-wide information on single nucleotide polymorphisms in humans provides an unprecedented opportunity to detect the evolutionary forces responsible for heterogeneity of the level of genetic variability across loci. Previous studies have shown that the history of recombination events has produced long haplotype blocks in the human genome, which contribute to this heterogeneity.
Other factors, however, such as natural selection or the heterogeneity of mutation rates across loci, may also lead to heterogeneity of genetic variability. We compared synonymous and non-synonymous variability within human genes with their divergence from murine orthologs. We separately analyzed the non-synonymous variants predicted to damage protein structure or function and the variants predicted to be functionally benign. The predictions were based on comparative sequence analysis and, in some cases, on the analysis of protein structure. A strong correlation between non-synonymous, benign variability and non-synonymous human-mouse divergence suggests that selection played an important role in shaping the pattern of variability in coding regions of human genes. However, the lack of correlation between deleterious variability and evolutionary divergence shows that a substantial proportion of the observed non-synonymous single-nucleotide polymorphisms reduces fitness and never reaches fixation. Evolutionary and medical implications of the impact of selection on human polymorphisms are discussed. AU - Sunyaev, Shamil R AU - Kondrashov, Fyodor AU - Bork, Peer AU - Ramensky, Vasily ID - 847 IS - 24 JF - Human Molecular Genetics TI - Impact of selection, mutation rate and genetic drift on human genetic variation VL - 12 ER - TY - JOUR AB - Alternative splicing is thought to be a major source of functional diversity in animal proteins. We analyzed the evolutionary conservation of proteins encoded by alternatively spliced genes and predicted the ancestral state for 73 cases of alternative splicing (25 insertions and 48 deletions). The amino acid sequences of most of the inserts in proteins produced by alternative splicing are as conserved as the surrounding sequences. Thus, alternative splicing often creates novel isoforms by the insertion of new, functional protein sequences that probably originated from noncoding sequences of introns.
AU - Kondrashov, Fyodor AU - Koonin, Eugene V ID - 876 IS - 3 JF - Trends in Genetics TI - Evolution of alternative splicing: Deletions, insertions and origin of functional parts of proteins from intron sequences VL - 19 ER - TY - JOUR AB - We study the fitness landscape in the space of protein sequences by relating sets of human pathogenic missense mutations in 32 proteins to amino acid substitutions that occurred in the course of evolution of these proteins. On average, ≈10% of deviations of a nonhuman protein from its human ortholog are compensated pathogenic deviations (CPDs), i.e., are caused by an amino acid substitution that, at this site, would be pathogenic to humans. Normal functioning of a CPD-containing protein must be caused by other, compensatory deviations of the nonhuman species from humans. Together, a CPD and the corresponding compensatory deviation form a Dobzhansky-Muller incompatibility that can be visualized as a corner on a fitness ridge. Thus, proteins evolve along fitness ridges which contain only ≈10 steps between successive corners. The fraction of CPDs among all deviations of a protein from its human ortholog does not increase with the evolutionary distance between the proteins, indicating that substitutions that carry evolving proteins around these corners occur in rapid succession, driven by positive selection. Data on fitness of interspecies hybrids suggest that the compensatory change that makes a CPD fit usually occurs within the same protein. Data on protein structures and on cooccurrence of amino acids at different sites of multiple orthologous proteins often make it possible to provisionally identify the substitution that compensates a particular CPD.
AU - Kondrashov, Alexey AU - Sunyaev, Shamil AU - Kondrashov, Fyodor ID - 885 IS - 23 JF - PNAS SN - 0027-8424 TI - Dobzhansky-Muller incompatibilities in protein evolution VL - 99 ER - TY - JOUR AB - Transcription is a slow and expensive process: in eukaryotes, approximately 20 nucleotides can be transcribed per second at the expense of at least two ATP molecules per nucleotide. Thus, at least for highly expressed genes, transcription of long introns, which are particularly common in mammals, is costly. Using data on the expression of genes that encode proteins in Caenorhabditis elegans and Homo sapiens, we show that introns in highly expressed genes are substantially shorter than those in genes that are expressed at low levels. This difference is greater in humans, such that introns are, on average, 14 times shorter in highly expressed genes than in genes with low expression, whereas in C. elegans the difference in intron length is only twofold. In contrast, the density of introns in a gene does not strongly depend on the level of gene expression. Thus, natural selection appears to favor short introns in highly expressed genes to minimize the cost of transcription and other molecular processes, such as splicing. AU - Castillo Davis, Cristian AU - Mekhedov, Sergei AU - Hartl, Daniel AU - Koonin, Eugene AU - Kondrashov, Fyodor ID - 897 IS - 4 JF - Nature Genetics TI - Selection for short introns in highly expressed genes VL - 31 ER - TY - JOUR AB - BACKGROUND: Gene duplications have a major role in the evolution of new biological functions. Theoretical studies often assume that a duplication per se is selectively neutral and that, following a duplication, one of the gene copies is freed from purifying (stabilizing) selection, which creates the potential for evolution of a new function.
RESULTS: In search of systematic evidence of accelerated evolution after duplication, we used data from 26 bacterial, six archaeal, and seven eukaryotic genomes to compare the mode and strength of selection acting on recently duplicated genes (paralogs) and on similarly diverged, unduplicated orthologous genes in different species. We find that the ratio of nonsynonymous to synonymous substitutions (Kn/Ks) in most paralogous pairs is <<1 and that paralogs typically evolve at similar rates, without significant asymmetry, indicating that both paralogs produced by a duplication are subject to purifying selection. This selection is, however, substantially weaker than the purifying selection affecting unduplicated orthologs that have diverged to the same extent as the analyzed paralogs. Most of the recently duplicated genes appear to be involved in various forms of environmental response; in particular, many of them encode membrane and secreted proteins. CONCLUSIONS: The results of this analysis indicate that recently duplicated paralogs evolve faster than orthologs with the same level of divergence and similar functions, but apparently do not experience a phase of neutral evolution. We hypothesize that gene duplications that persist in an evolving lineage are beneficial from the time of their origin, due primarily to a protein dosage effect in response to variable environmental conditions; duplications are likely to give rise to new functions at a later phase of their evolution once a higher level of divergence is reached. AU - Kondrashov, Fyodor AU - Rogozin, Igor AU - Wolf, Yuri AU - Koonin, Eugene ID - 871 IS - 2 JF - Genome Biology SN - 1465-6906 TI - Selection in the evolution of gene duplications VL - 3 ER - TY - JOUR AB - The polymeric ubiquitin (poly-u) genes are composed of tandem 228-bp repeats with no spacer sequences between individual monomer units. 
Ubiquitin is one of the most conserved proteins known to date, and the individual units within a number of poly-u genes are significantly more similar to each other than would be expected if each unit evolved independently. It has been proposed that the rather striking similarity among poly-u monomers in some lineages is caused by a series of homogenization events. Here we report the sequences of the polyubiquitin-C (Ubc) genes in two mouse strains. Analysis of these sequences, as well as those of the previously reported Chinese hamster and rat poly-u genes, supports the assertion that the homogenization of the ubiquitin-C gene in rodents is due to unequal crossing-over events. The sequence divergence of noncoding DNA was used to estimate the frequency of unequal crossing-over events (6.3 × 10⁻⁵ events per generation) in the Ubc gene, as well as to provide evidence of apparent selection in the poly-u gene. AU - Perelygin, Andrey AU - Kondrashov, Fyodor AU - Rogozin, Igor AU - Brinton, Margo ID - 859 IS - 2 JF - Journal of Molecular Evolution SN - 0022-2844 TI - Evolution of the mouse polyubiquitin C gene VL - 55 ER - TY - JOUR AB - BACKGROUND: Detection of changes in a protein's evolutionary rate may reveal cases of change in that protein's function. We developed and implemented a simple relative rates test in an attempt to assess the rate constancy of protein evolution and to detect cases of functional diversification between orthologous proteins. The test was performed on clusters of orthologous protein sequences from complete bacterial genomes (Chlamydia trachomatis, C. muridarum and Chlamydophila pneumoniae), complete archaeal genomes (Pyrococcus horikoshii, P. abyssi and P. furiosus) and partially sequenced mammalian genomes (human, mouse and rat).
RESULTS: Amino-acid sequence evolution rates are significantly correlated on different branches of phylogenetic trees representing the great majority of analyzed orthologous protein sets from all three domains of life. However, approximately 1% of the proteins from each group of species deviates from this pattern and instead shows variation that is consistent with an acceleration of the rate of amino-acid substitution, which may be due to functional diversification. Most of the putative functionally diversified proteins from all three species groups are predicted to function at the periphery of the cells and mediate their interaction with the environment. CONCLUSIONS: Relative rates of protein evolution are remarkably constant for the three species groups analyzed here. Deviations from this rate constancy are probably due to changes in selective constraints associated with diversification between orthologs. Functional diversification between orthologs is thought to be a relatively rare event. However, the resolution afforded by the test designed specifically for genomic-scale datasets allowed us to identify numerous cases of possible functional diversification between orthologous proteins. AU - Jordan, Ingo AU - Kondrashov, Fyodor AU - Rogozin, Igor AU - Tatusov, Roman AU - Wolf, Yuri AU - Koonin, Eugene ID - 888 IS - 12 JF - Genome Biology SN - 1465-6906 TI - Constant relative rate of protein evolution and detection of functional diversification among bacterial, archaeal and eukaryotic proteins VL - 2 ER - TY - JOUR AB - Motivation: The context of the start codon (typically, AUG) and the features of the 5′ Untranslated Regions (5′ UTRs) are important for understanding translation regulation in eukaryotic mRNAs and for accurate prediction of the coding region in genomic and cDNA sequences. 
The presence of AUG triplets in 5′ UTRs (upstream AUGs) might affect the initiation rate and, in the context of gene prediction, could reduce the accuracy of the identification of the authentic start. To reveal potential connections between the presence of upstream AUGs and other features of 5′ UTRs, such as their length and the start codon context, we undertook a systematic analysis of the available eukaryotic 5′ UTR sequences. Results: We show that a large fraction of 5′ UTRs in the available cDNA sequences, 15-53% depending on the organism, contain upstream ATGs. A negative correlation was observed between the information content of the translation start signal and the length of the 5′ UTR. Similarly, a negative correlation exists between the 'strength' of the start context and the number of upstream ATGs. Typically, cDNAs containing long 5′ UTRs with multiple upstream ATGs have a 'weak' start context, and in contrast, cDNAs containing short 5′ UTRs without ATGs have 'strong' starts. These counter-intuitive results may be interpreted in terms of upstream AUGs having an important role in the regulation of translation efficiency by ensuring low basal translation level via double negative control and creating the potential for additional regulatory mechanisms. One such mechanism, supported by experimental studies of some mRNAs, involves removal of the AUG-containing portion of the 5′ UTR by alternative splicing. AU - Rogozin, Igor AU - Kochetov, Alex AU - Kondrashov, Fyodor AU - Koonin, Eugene AU - Milanesi, Luciano ID - 855 IS - 10 JF - Bioinformatics SN - 1367-4803 TI - Presence of ATG triplets in 5′ untranslated regions of eukaryotic cDNAs correlates with a 'weak' context of the start codon VL - 17 ER - TY - JOUR AB - Sex is thought to facilitate accumulation of initially rare beneficial mutations by allowing simultaneous allele replacements at many loci.
However, this advantage of sex depends on a restrictive assumption that the fitness of a genotype is determined by fitness potential, a single intermediate variable to which all loci contribute additively, so that new alleles can accumulate in any order. Individual-based simulations of sexual and asexual populations reveal that under generic selection, sex often retards adaptive evolution. When new alleles are beneficial only if they accumulate in a prescribed order, a sexual population may evolve two or more times slower than an asexual population because only asexual reproduction allows some overlap of successive allele replacements. Many other fitness surfaces lead to an even greater disadvantage of sex. Thus, either sex exists in spite of its impact on the rate of adaptive allele replacements, or natural fitness surfaces have rather specific properties, at least at the scale of intrapopulation genetic variability. AU - Kondrashov, Fyodor AU - Kondrashov, Alexey ID - 874 IS - 21 JF - PNAS SN - 0027-8424 TI - Multidimensional epistasis and the disadvantage of sex VL - 98 ER - TY - JOUR AB - Genes with new functions often evolve by gene duplication. Alternative splicing is another means of evolutionary innovation in eukaryotes, which allows a single gene to encode functionally diverse proteins. We investigate a connection between these two evolutionary phenomena. For ∼10% of the described cases of substitution alternative splicing, such that either one or another amino acid sequence is included into the protein, evidence of origin by tandem exon duplication was found. This is a conservative estimate because alternative exons are typically short and, on many occasions, duplicates may have diverged beyond recognition. 
Dating exon duplications through a combination of the available experimental data on alternative splicing in orthologous genes from different species and computational analysis indicates that most of the duplications antedate at least the radiation of mammalian orders or even the radiation of vertebrate classes. At present, tandem exon duplication is the only mechanism of evolution of substitution alternative splicing that can be specifically demonstrated. Along with gene duplication, this could be a major route for generating functional diversity during evolution of multicellular eukaryotes. AU - Kondrashov, Fyodor AU - Koonin, Eugene ID - 867 IS - 23 JF - Human Molecular Genetics SN - 0964-6906 TI - Origin of alternative splicing by tandem exon duplication VL - 10 ER - TY - JOUR AB - The study and comparison of mutation(al) spectra is an important problem in molecular biology, because these spectra often reflect important features of mutations and their fixation. Such features include the interaction of DNA with various mutagens, the function of repair/replication enzymes, and properties of target proteins. It is known that mutability varies significantly along nucleotide sequences, such that mutations often concentrate at certain positions, called "hotspots," in a sequence. In this paper, we discuss in detail two approaches for mutation spectra analysis: the comparison of mutation spectra with the HG-PUBL program (FTP: sunsite.unc.edu/pub/academic/biology/dna-mutations/hyperg) and hotspot prediction with the CLUSTERM program (www.itba.mi.cnr.it/webmutation; ftp.bionet.nsc.ru/pub/biology/dbms/clusterm.zip). Several other approaches for mutational spectra analysis, such as the analysis of target protein structure, identification of hotspot contexts, and comparison of multiple spectra, as well as a number of mutation databases, are briefly described. Mutation spectra in the lacI gene of E.
coli and the human p53 gene are used for illustration of various difficulties of such analysis. AU - Rogozin, Igor AU - Kondrashov, Fyodor AU - Glazko, Galina ID - 851 IS - 2 JF - Human Mutation SN - 1059-7794 TI - Use of mutation spectra analysis software VL - 17 ER - TY - JOUR AU - Wolf, Yuri AU - Kondrashov, Fyodor AU - Koonin, Eugene ID - 841 IS - 9 JF - Trends in Genetics SN - 0168-9479 TI - Footprints of primordial introns on the eukaryotic genome: still no clear traces VL - 17 ER - TY - JOUR AU - Wolf, Yuri AU - Kondrashov, Fyodor AU - Koonin, Eugene ID - 842 IS - 8 JF - Trends in Genetics SN - 0168-9479 TI - No footprints of primordial introns in a eukaryotic genome VL - 16 ER - TY - JOUR AB - Sympatric speciation, the origin of two or more species from a single local population, has almost certainly been involved in formation of several species flocks, and may be fairly common in nature. The most straightforward scenario for sympatric speciation requires disruptive selection favouring two substantially different phenotypes, and consists of the evolution of reproductive isolation between them followed by the elimination of all intermediate phenotypes. Here we use the hypergeometric phenotypic model to show that sympatric speciation is possible even when fitness and mate choice depend on different quantitative traits, so that speciation must involve formation of covariance between these traits. The increase in the number of variable loci affecting fitness facilitates sympatric speciation, whereas the increase in the number of variable loci affecting mate choice has the opposite effect. These predictions may enable more cases of sympatric speciation to be identified. AU - Kondrashov, Alexey AU - Kondrashov, Fyodor ID - 883 IS - 6742 JF - Nature SN - 0028-0836 TI - Interactions among quantitative traits in the course of sympatric speciation VL - 400 ER -