Dr Valentina Iotchkova

Research Area: Bioinformatics & Stats (inc. Modelling and Computational Biology)
Technology Exchange: Bioinformatics, Computational biology, Human genetics and Statistical genetics
Scientific Themes: Bioinformatics, Statistics & Computational Biology and Genes, Genetics, Epigenetics & Genomics

Over the last decade, genetic and epidemiological research has generated a vast amount of data and expertise for evaluating the genetic contribution to human complex traits. These efforts have identified thousands of (mostly common) genetic variants that influence disease risk factors. However, they have been limited in three respects: understanding the impact of rare variants, identifying the causal biological mechanisms underlying the observed genotype-phenotype associations, and prioritizing new biological targets for drug discovery. Large-scale data offer enormous opportunities, but novel computational approaches are needed to integrate and analyse different layers of information efficiently, and they will be instrumental in using the data to their full potential.

The Statistical Genetics group currently focuses on developing and applying such approaches, exploring multivariate modelling, the integration of functional enrichment information and the appropriate handling of missing phenotype data. In particular, we aim to develop methods that (i) accelerate the discovery and interpretation of rare genetic variation underlying complex disease risk, (ii) increase the power for discovery and interpretation of the multidimensional phenotypic consequences of common genetic variation, and (iii) improve the identification of causal variants for complex and disease traits, the inference of the direction of causal links between different molecular layers, and the interpretability of the underlying biological mechanisms.


Chen L, Ge B, Casale FP, Vasquez L, Kwan T, Garrido-Martín D, Watt S, Yan Y, Kundu K, Ecker S et al. 2016. Genetic Drivers of Epigenetic and Transcriptional Variation in Human Immune Cells. Cell, 167 (5), pp. 1398-1414.e24.

Characterizing the multifaceted contribution of genetic and epigenetic factors to disease phenotypes is a major challenge in human genetics and medicine. We carried out high-resolution genetic, epigenetic, and transcriptomic profiling in three major human immune cell types (CD14+ monocytes, CD16+ neutrophils, and naive CD4+ T cells) from up to 197 individuals. We assess, quantitatively, the relative contribution of cis-genetic and epigenetic factors to transcription and evaluate their impact as potential sources of confounding in epigenome-wide association studies. Further, we characterize highly coordinated genetic effects on gene expression, methylation, and histone variation through quantitative trait locus (QTL) mapping and allele-specific (AS) analyses. Finally, we demonstrate colocalization of molecular trait QTLs at 345 unique immune disease loci. This expansive, high-resolution atlas of multi-omics changes yields insights into cell-type-specific correlation between diverse genomic inputs, more generalizable correlations between these inputs, and defines molecular events that may underpin complex disease risk.

Astle WJ, Elding H, Jiang T, Allen D, Ruklisa D, Mann AL, Mead D, Bouman H, Riveros-Mckay F, Kostadima MA et al. 2016. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell, 167 (5), pp. 1415-1429.e19.

Many common variants have been associated with hematological traits, but identification of causal genes and pathways has proven challenging. We performed a genome-wide association analysis in the UK Biobank and INTERVAL studies, testing 29.5 million genetic variants for association with 36 red cell, white cell, and platelet properties in 173,480 European-ancestry participants. This effort yielded hundreds of low frequency (<5%) and rare (<1%) variants with a strong impact on blood cell phenotypes. Our data highlight general properties of the allelic architecture of complex traits, including the proportion of the heritable component of each blood trait explained by the polygenic signal across different genome regulatory domains. Finally, through Mendelian randomization, we provide evidence of shared genetic pathways linking blood cell indices with complex pathologies, including autoimmune diseases, schizophrenia, and coronary heart disease and evidence suggesting previously reported population associations between blood cell indices and cardiovascular disease may be non-causal.

Breeze CE, Paul DS, van Dongen J, Butcher LM, Ambrose JC, Barrett JE, Lowe R, Rakyan VK, Iotchkova V, Frontini M et al. 2016. eFORGE: A Tool for Identifying Cell Type-Specific Signal in Epigenomic Data. Cell Rep, 17 (8), pp. 2137-2150.

Epigenome-wide association studies (EWAS) provide an alternative approach for studying human disease through consideration of non-genetic variants such as altered DNA methylation. To advance the complex interpretation of EWAS, we developed eFORGE (http://eforge.cs.ucl.ac.uk/), a new standalone and web-based tool for the analysis and interpretation of EWAS data. eFORGE determines the cell type-specific regulatory component of a set of EWAS-identified differentially methylated positions. This is achieved by detecting enrichment of overlap with DNase I hypersensitive sites across 454 samples (tissues, primary cell types, and cell lines) from the ENCODE, Roadmap Epigenomics, and BLUEPRINT projects. Application of eFORGE to 20 publicly available EWAS datasets identified disease-relevant cell types for several common diseases, a stem cell-like signature in cancer, and demonstrated the ability to detect cell-composition effects for EWAS performed on heterogeneous tissues. Our approach bridges the gap between large-scale epigenomics data and EWAS-derived target selection to yield insight into disease etiology.

Iotchkova V, Huang J, Morris JA, Jain D, Barbieri C, Walter K, Min JL, Chen L, Astle W, Cocca M et al. 2016. Discovery and refinement of genetic loci associated with cardiometabolic risk using dense imputation maps. Nat Genet, 48 (11), pp. 1303-1312.

Large-scale whole-genome sequence data sets offer novel opportunities to identify genetic variation underlying human traits. Here we apply genotype imputation based on whole-genome sequence data from the UK10K and 1000 Genomes Project into 35,981 study participants of European ancestry, followed by association analysis with 20 quantitative cardiometabolic and hematological traits. We describe 17 new associations, including 6 rare (minor allele frequency (MAF) < 1%) or low-frequency (1% < MAF < 5%) variants with platelet count (PLT), red blood cell indices (MCH and MCV) and HDL cholesterol. Applying fine-mapping analysis to 233 known and new loci associated with the 20 traits, we resolve the associations of 59 loci to credible sets of 20 or fewer variants and describe trait enrichments within regions of predicted regulatory function. These findings improve understanding of the allelic architecture of risk factors for cardiometabolic and hematological diseases and provide additional functional insights with the identification of potentially novel biological targets.

Okada Y, Muramatsu T, Suita N, Kanai M, Kawakami E, Iotchkova V, Soranzo N, Inazawa J, Tanaka T. 2016. Significant impact of miRNA-target gene networks on genetics of human complex traits. Sci Rep, 6 (1), pp. 22223.

The impact of microRNA (miRNA) on the genetics of human complex traits, especially in the context of miRNA-target gene networks, has not been fully assessed. Here, we developed a novel analytical method, MIGWAS, to comprehensively evaluate enrichment of genome-wide association study (GWAS) signals in miRNA-target gene networks. We applied the method to the GWAS results of the 18 human complex traits from >1.75 million subjects, and identified significant enrichment in rheumatoid arthritis (RA), kidney function, and adult height (P < 0.05/18 = 0.0028, most significant enrichment in RA with P = 1.7 × 10⁻⁴). Interestingly, these results were consistent with current literature-based knowledge of the traits on miRNA obtained through the NCBI PubMed database search (adjusted P = 0.024). Our method provided a list of miRNA and target gene pairs with excess genetic association signals, part of which included drug target genes. We identified a miRNA (miR-4728-5p) that downregulates PADI2, a novel RA risk gene considered as a promising therapeutic target (rs761426, adjusted P = 2.3 × 10⁻⁹). Our study indicated the significant impact of miRNA-target gene networks on the genetics of human complex traits, and provided resources which should contribute to drug discovery and nucleic acid medicine.

Dahl A, Iotchkova V, Baud A, Johansson Å, Gyllensten U, Soranzo N, Mott R, Kranis A, Marchini J. 2016. A multiple-phenotype imputation method for genetic studies. Nat Genet, 48 (4), pp. 466-472.

Genetic association studies have yielded a wealth of biological discoveries. However, these studies have mostly analyzed one trait and one SNP at a time, thus failing to capture the underlying complexity of the data sets. Joint genotype-phenotype analyses of complex, high-dimensional data sets represent an important way to move beyond simple genome-wide association studies (GWAS) with great potential. The move to high-dimensional phenotypes will raise many new statistical problems. Here we address the central issue of missing phenotypes in studies with any level of relatedness between samples. We propose a multiple-phenotype mixed model and use a computationally efficient variational Bayesian algorithm to fit the model. On a variety of simulated and real data sets from a range of organisms and trait types, we show that our method outperforms existing state-of-the-art methods from the statistics and machine learning literature and can boost signals of association.

UK10K Consortium, Walter K, Min JL, Huang J, Crooks L, Memari Y, McCarthy S, Perry JRB, Xu C, Futema M et al. 2015. The UK10K project identifies rare variants in health and disease. Nature, 526 (7571), pp. 82-90.

The contribution of rare and low-frequency variants to human traits is largely unexplored. Here we describe insights from sequencing whole genomes (low read depth, 7×) or exomes (high read depth, 80×) of nearly 10,000 individuals from population-based and disease collections. In extensively phenotyped cohorts we characterize over 24 million novel sequence variants, generate a highly accurate imputation reference panel and identify novel alleles associated with levels of triglycerides (APOB), adiponectin (ADIPOQ) and low-density lipoprotein cholesterol (LDLR and RGAG1) from single-marker and rare variant aggregation tests. We describe population structure and functional annotation of rare and low-frequency variants, use the data to estimate the benefits of sequencing for association studies, and summarize lessons from disease-specific collections. Finally, we make available an extensive resource, including individual-level genetic and phenotypic data and web-based tools to facilitate the exploration of association results.

Timpson NJ, Walter K, Min JL, Tachmazidou I, Malerba G, Shin S-Y, Chen L, Futema M, Southam L, Iotchkova V et al. 2015. Erratum: A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans. Nat Commun, 6 pp. 7171.

Vuckovic D, Gasparini P, Soranzo N, Iotchkova V. 2015. MultiMeta: an R package for meta-analyzing multi-phenotype genome-wide association studies. Bioinformatics, 31 (16), pp. 2754-2756.

As new methods for multivariate analysis of genome wide association studies become available, it is important to be able to combine results from different cohorts in a meta-analysis. The R package MultiMeta provides an implementation of the inverse-variance-based method for meta-analysis, generalized to an n-dimensional setting. AVAILABILITY AND IMPLEMENTATION: The R package MultiMeta can be downloaded from CRAN. CONTACT: dragana.vuckovic@burlo.trieste.it; vi1@sanger.ac.uk. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
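As a rough illustration of what an inverse-variance meta-analysis in n dimensions can look like, the sketch below pools per-cohort effect vectors using the standard fixed-effects formula: the pooled estimate is the precision-weighted average of the cohort estimates, and its covariance is the inverse of the summed precisions. This is not the MultiMeta code or its API. The function names (`invert`, `mat_vec`, `inverse_variance_meta`) and the example numbers are hypothetical, and the sketch uses plain Python lists so that it has no dependencies.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I] is reduced to [I | A^-1].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def inverse_variance_meta(effects, covariances):
    """Fixed-effects pooling of per-cohort effect vectors (hypothetical helper).

    effects:     one effect vector per cohort (one entry per phenotype)
    covariances: one covariance matrix per cohort for those effects
    Returns the pooled effect vector and its covariance matrix.
    """
    n = len(effects[0])
    precision_sum = [[0.0] * n for _ in range(n)]
    weighted_sum = [0.0] * n
    for beta, cov in zip(effects, covariances):
        precision = invert(cov)              # weight = inverse covariance
        weighted = mat_vec(precision, beta)  # precision-weighted effect
        for i in range(n):
            weighted_sum[i] += weighted[i]
            for j in range(n):
                precision_sum[i][j] += precision[i][j]
    pooled_cov = invert(precision_sum)
    pooled_beta = mat_vec(pooled_cov, weighted_sum)
    return pooled_beta, pooled_cov


if __name__ == "__main__":
    # Two made-up cohorts, two phenotypes each (illustrative numbers only).
    cohort_effects = [[0.10, 0.20], [0.30, 0.40]]
    cohort_covs = [[[0.04, 0.01], [0.01, 0.09]],
                   [[0.04, 0.01], [0.01, 0.09]]]
    beta, cov = inverse_variance_meta(cohort_effects, cohort_covs)
    print(beta)
    print(cov)
```

With a single phenotype the matrices are 1×1 and this reduces to the familiar univariate inverse-variance weighting, where each cohort's estimate is weighted by one over its squared standard error.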

Timpson NJ, Walter K, Min JL, Tachmazidou I, Malerba G, Shin S-Y, Chen L, Futema M, Southam L, Iotchkova V et al. 2014. A rare variant in APOC3 is associated with plasma triglyceride and VLDL levels in Europeans. Nat Commun, 5 pp. 4871.

The analysis of rich catalogues of genetic variation from population-based sequencing provides an opportunity to screen for functional effects. Here we report a rare variant in APOC3 (rs138326449-A, minor allele frequency ~0.25% (UK)) associated with plasma triglyceride (TG) levels (-1.43 s.d. (s.e. = 0.27) per minor allele, P-value = 8.0 × 10⁻⁸) discovered in 3,202 individuals with low read-depth, whole-genome sequence. We replicate this in 12,831 participants from five additional samples of Northern and Southern European origin (-1.0 s.d. (s.e. = 0.173), P-value = 7.32 × 10⁻⁹). This is consistent with an effect between 0.5 and 1.5 mmol l⁻¹ dependent on population. We show that a single predicted splice donor variant is responsible for association signals and is independent of known common variants. Analyses suggest an independent relationship between rs138326449 and high-density lipoprotein (HDL) levels. This represents one of the first examples of a rare, large effect variant identified from whole-genome sequencing at a population scale.

Dahl A, Hore V, Iotchkova V, Marchini J. Network inference in matrix-variate Gaussian models with non-independent noise

Inferring a graphical model or network from observational data from a large number of variables is a well studied problem in machine learning and computational statistics. In this paper we consider a version of this problem that is relevant to the analysis of multiple phenotypes collected in genetic studies. In such datasets we expect correlations between phenotypes and between individuals. We model observations as a sum of two matrix normal variates such that the joint covariance function is a sum of Kronecker products. This model, which generalizes the Graphical Lasso, assumes observations are correlated due to known genetic relationships and corrupted with non-independent noise. We have developed a computationally efficient EM algorithm to fit this model. On simulated datasets we illustrate substantially improved performance in network reconstruction by allowing for a general noise distribution.
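To make the phrase "a sum of Kronecker products" concrete, the sketch below builds the joint covariance of the stacked observations for a toy example. It assumes one common way of writing such a model: a genetic term whose rows are correlated through a known relatedness matrix, plus a noise term that is independent across individuals but correlated across phenotypes. That parameterisation, the names `kron`, `joint_covariance`, `trait_cov`, `kinship` and `noise_cov`, and all numbers are illustrative assumptions, not taken from the paper, and the code does not attempt the EM fitting step.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def identity(n):
    """n-by-n identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def joint_covariance(trait_cov, kinship, noise_cov):
    """Covariance of the column-stacked data under the assumed two-term model.

    trait_cov: phenotype-by-phenotype covariance of the genetic term
    kinship:   individual-by-individual relatedness matrix (treated as known)
    noise_cov: phenotype-by-phenotype covariance of the noise term
    Result:    trait_cov (x) kinship + noise_cov (x) identity
    """
    n_individuals = len(kinship)
    genetic = kron(trait_cov, kinship)
    noise = kron(noise_cov, identity(n_individuals))
    return add(genetic, noise)


if __name__ == "__main__":
    # Toy inputs: two related individuals, two phenotypes (made-up values).
    kinship = [[1.0, 0.5], [0.5, 1.0]]
    trait_cov = [[1.0, 0.3], [0.3, 1.0]]
    noise_cov = [[0.5, 0.1], [0.1, 0.5]]
    for row in joint_covariance(trait_cov, kinship, noise_cov):
        print(row)
```

Allowing `noise_cov` to have non-zero off-diagonal entries is what "non-independent noise" is taken to mean in this sketch; a diagonal `noise_cov` would correspond to noise that is independent across phenotypes.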

Tachmazidou I, Süveges D, Min JL, Ritchie GRS, Steinberg J, Walter K, Iotchkova V, Schwartzentruber J, Huang J, Memari Y et al. 2017. Whole-Genome Sequencing Coupled to Imputation Discovers Genetic Signals for Anthropometric Traits. Am J Hum Genet, 100 (6), pp. 865-884.

Deep sequence-based imputation can enhance the discovery power of genome-wide association studies by assessing previously unexplored variation across the common- and low-frequency spectra. We applied a hybrid whole-genome sequencing (WGS) and deep imputation approach to examine the broader allelic architecture of 12 anthropometric traits associated with height, body mass, and fat distribution in up to 267,616 individuals. We report 106 genome-wide significant signals that have not been previously identified, including 9 low-frequency variants pointing to functional candidates. Of the 106 signals, 6 are in genomic regions that have not been implicated with related traits before, 28 are independent signals at previously reported regions, and 72 represent previously reported signals for a different anthropometric trait. 71% of signals reside within genes and fine mapping resolves 23 signals to one or two likely causal variants. We confirm genetic overlap between human monogenic and polygenic anthropometric traits and find signal enrichment in cis expression QTLs in relevant tissues. Our results highlight the potential of WGS strategies to enhance biologically relevant discoveries across the frequency spectrum.
