Esteban, Laura A; Lonishin, Lyubov R; Bobrovskiy, Daniil M; Leleytner, Gregory; Bogatyreva, Natalya S; Kondrashov, Fyodor (IST Austria); Ivankov, Dmitry N
Epistasis, the context-dependence of the contribution of an amino acid substitution to fitness, is common in evolution. To detect epistasis, fitness must be measured for at least four genotypes: the reference genotype, two different single mutants and a double mutant with both of the single mutations. For higher-order epistasis of order n, fitness has to be measured for all 2^n genotypes of an n-dimensional hypercube in genotype space forming a ‘combinatorially complete dataset’. So far, only a handful of such datasets have been produced by manual curation. Concurrently, random mutagenesis experiments have produced measurements of fitness and other phenotypes in a high-throughput manner, potentially containing a number of combinatorially complete datasets. We present an effective recursive algorithm for finding all hypercube structures in random mutagenesis experimental data. To test the algorithm, we applied it to the data from a recent HIS3 protein dataset and found all 199 847 053 unique combinatorially complete genotype combinations of dimensionality ranging from 2 to 12. The algorithm may be useful for researchers looking for higher-order epistasis in their high-throughput experimental data.
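To make the idea of searching for hypercubes in genotype data concrete, here is a minimal Python sketch. It is not the HypercubeME implementation; it only illustrates one way a recursive search could work, under these assumptions: genotypes are equal-length strings, a hypercube is stored as an origin genotype plus a tuple of (position, from, to) substitutions, and two (n-1)-dimensional hypercubes with the same substitutions whose origins differ at a single later position are merged into one n-dimensional hypercube. The names find_hypercubes, _extend and members are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def find_hypercubes(genotypes):
    """Return {dimension: set of (origin, substitutions)} for dimension >= 1."""
    found = {}
    # A single genotype is treated as a 0-dimensional cube with no substitutions.
    _extend({(g, ()) for g in set(genotypes)}, 0, found)
    return found


def _extend(cubes, dim, found):
    """Record the cubes of this dimension, then recurse on the next dimension."""
    if not cubes:
        return
    if dim >= 1:
        found[dim] = cubes
    # Bucket cubes that share substitutions and whose origins agree everywhere
    # except at one position p lying after all positions already substituted.
    buckets = defaultdict(list)
    for origin, subs in cubes:
        first_free = subs[-1][0] + 1 if subs else 0
        for p in range(first_free, len(origin)):
            masked = origin[:p] + origin[p + 1:]
            buckets[(subs, p, masked)].append(origin)
    bigger = set()
    for (subs, p, _), origins in buckets.items():
        # Every pair of origins in a bucket yields one cube of dimension dim + 1.
        for a, b in combinations(sorted(origins), 2):
            bigger.add((a, subs + ((p, a[p], b[p]),)))
    _extend(bigger, dim + 1, found)


def members(origin, subs):
    """Expand a cube into the set of 2^n genotypes it contains."""
    out = {origin}
    for p, _, to in subs:
        out |= {g[:p] + to + g[p + 1:] for g in out}
    return out


if __name__ == "__main__":
    data = ["AAA", "AAB", "ABA", "ABB", "BAA"]  # toy data, not from the paper
    cubes = find_hypercubes(data)
    print({d: len(c) for d, c in cubes.items()})  # {1: 5, 2: 1}
```

Because substitutions are only ever added at increasing positions, each hypercube is generated once. The sketch also reports dimension 1 (pairs of genotypes one substitution apart) and keeps every cube in memory, so it is suited to small examples rather than datasets with hundreds of millions of hypercubes.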
This work was supported by the European Research Council under the European Union’s Seventh Framework Programme (FP7/2007-2013, ERC grant agreement 335980_EinME) and by a startup package to the Ivankov laboratory at Skolkovo Institute of Science and Technology. The work was started at the School of Molecular and Theoretical Biology 2017, supported by the Zimin Foundation. N.S.B. was supported by the Woman Scientists Support Grant at the Centre for Genomic Regulation (CRG).
Esteban LA, Lonishin LR, Bobrovskiy DM, et al. HypercubeME: Two hundred million combinatorially complete datasets from a single experiment. Bioinformatics. 2020;36(6):1960-1962. doi:10.1093/bioinformatics/btz841
All files available under the following license(s):
Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0):
2020_Bioinformatics_Esteban.pdf 308.34 KB