Prof Jim R Hughes
|Research Area:||Cell and Molecular Biology|
|Scientific Themes:||Genes, Genetics, Epigenetics & Genomics and Haematology|
|Keywords:||Gene Regulation, Genomics, Capture-C and Bioinformatics|
The epigenetic and transcriptional landscape of the Nprl3 locus in mouse erythroid cells, which cont ...
Basic classification of all of the active elements (DNase I hypersensitive sites, or DHS) in mouse ery ...
Schematic representation of the Capture-C Chromatin Conformation Capture method. Hughes et al Nat ...
Molecular biology and the biological sciences in general have undergone a technical revolution over the last decade, founded upon the ability to sequence and reconstruct an organism's genomic blueprint in its entirety. Subsequent technical advances such as expression or tiled genomic microarrays and now high-throughput sequencing (HTS) technologies allow us to investigate, on the scale of the whole genome, how and in what situations particular parts of that blueprint are actually used. Although the biological questions remain the same as those asked at an individual gene or genomic locus, the methods to generate, analyse and combine these whole-genome data types are different and require specialist approaches and skills.
One of the most fundamental questions in molecular biology is how specific parts of the genomic blueprint are used in specific situations, given that the same underlying genomic sequence is used whether a cell becomes a neuronal cell or a blood cell. The most basic expression of a genome's activity is the RNA it produces or "expresses"; in the form of mRNA, this goes on to determine which proteins are produced in the cell. It has also become clear that RNA which does not produce protein (non-coding RNA or ncRNA) has a vital and complex regulatory function within the genome.
One of the main research interests of the Genome Biology group is to study the processes that determine whether RNA is or is not produced from a genomic locus as cells develop into red blood cells (erythropoiesis), and which factors determine the rate at which it is produced. We employ most of the current genome-wide methods to determine which parts of the genome are being transcribed into RNA (RNA-seq), investigating both the stable fraction (mRNA) and the raw output of the genome (nascent RNA). We correlate this transcriptional activity with changes in the distribution and chemical modifications of the nucleosomal proteins associated with genomic DNA (DNase-seq and ChIP-seq) and with which regulatory proteins are bound to the DNA (transcription factor ChIP-seq), in an effort to determine how these changes regulate RNA expression (Figures 1 and 2).
Although we use many existing methodologies, the group also develops novel assays where needed to address current deficiencies in our ability to assess genome behaviour. One of the most difficult problems when trying to understand gene regulation, whether on the scale of the genome or at individual genes, is to determine which regions of the genome control the expression patterns and levels of any particular gene. To address this problem we developed the Capture-C 3C method, which allows us to interrogate the regulatory landscapes of hundreds of genes in a single experiment (Figure 3). We are now using the Capture-C method, in combination with our genomics and transcriptomics data, to link genes and regulatory elements en masse in the erythroid system.
We are at present trying to determine which parts of the surrounding genome are functionally required to regulate the transcription of a particular gene or transcript (cis-regulatory elements) and how the molecular events at these regions lead to production of RNA at a remote gene promoter. This represents a fundamental gap in our current understanding of gene regulation, and filling it is a necessary step towards a complete understanding of this process.
Due to the size and complexity of the datasets produced, the group is heavily reliant on bioinformatics to analyse and correlate these data. It has considerable experience in using and developing these types of tools, both in its own right and through a strong collaboration with the Oxford Computational Biology Research Group (CBRG). The group is very collaborative in structure and works closely with other groups within the MHU department in particular and the WIMM as a whole. This efficient structure allows observations derived from genome-wide data to be functionally tested in well-understood paradigms of gene regulation such as the α globin locus, and facilitates the genome-wide analysis of concepts gained from the careful interrogation of the model loci.
|Prof Doug Higgs FRS||Nuffield Division of Clinical Laboratory Sciences||University of Oxford||United Kingdom|
|Prof Thomas Milne||Nuffield Division of Clinical Laboratory Sciences||University of Oxford||United Kingdom|
Long non-coding (lnc) RNAs can regulate gene expression and protein functions. However, the proportion of lncRNAs with biological activities among the thousands expressed in mammalian cells is controversial. We studied Lockd (lncRNA downstream of Cdkn1b), a 434-nt polyadenylated lncRNA originating 4 kb 3' to the Cdkn1b gene. Deletion of the 25-kb Lockd locus reduced Cdkn1b transcription by approximately 70% in an erythroid cell line. In contrast, homozygous insertion of a polyadenylation cassette 80 bp downstream of the Lockd transcription start site reduced the entire lncRNA transcript level by >90% with no effect on Cdkn1b transcription. The Lockd promoter contains a DNase-hypersensitive site, binds numerous transcription factors, and physically associates with the Cdkn1b promoter in chromosomal conformation capture studies. Therefore, the Lockd gene positively regulates Cdkn1b transcription through an enhancer-like cis element, whereas the lncRNA itself is dispensable, which may be the case for other lncRNAs.
Methods for analyzing chromosome conformation in mammalian cells are either low resolution or low throughput and are technically challenging. In next-generation (NG) Capture-C, we have redesigned the Capture-C method to achieve unprecedented levels of sensitivity and reproducibility. NG Capture-C can be used to analyze many genetic loci and samples simultaneously. High-resolution data can be produced with as few as 100,000 cells, and single-nucleotide polymorphisms can be used to generate allele-specific tracks. The method is straightforward to perform and should greatly facilitate the investigation of many questions related to gene regulation as well as the functional dissection of traits examined in genome-wide association studies.
© 2015 The Authors. Histone H3.3 is a replication-independent histone variant, which replaces histones that are turned over throughout the entire cell cycle. H3.3 deposition at euchromatin is dependent on HIRA, whereas ATRX/Daxx deposits H3.3 at pericentric heterochromatin and telomeres. The role of H3.3 at heterochromatic regions is unknown, but mutations in the ATRX/Daxx/H3.3 pathway are linked to aberrant telomere lengthening in certain cancers. In this study, we show that ATRX-dependent deposition of H3.3 is not limited to pericentric heterochromatin and telomeres but also occurs at heterochromatic sites throughout the genome. Notably, ATRX/H3.3 specifically localizes to silenced imprinted alleles in mouse ESCs. ATRX KO cells failed to deposit H3.3 at these sites, leading to loss of the H3K9me3 heterochromatin modification, loss of repression, and aberrant allelic expression. We propose a model whereby ATRX-dependent deposition of H3.3 into heterochromatin is normally required to maintain the memory of silencing at imprinted loci.
Gene expression during development and differentiation is regulated in a cell- and stage-specific manner by complex networks of intergenic and intragenic cis-regulatory elements whose numbers and representation in the genome far exceed those of structural genes. Using chromosome conformation capture, it is now possible to analyze in detail the interaction between enhancers, silencers, boundary elements and promoters at individual loci, but these techniques are not readily scalable. Here we present a high-throughput approach (Capture-C) to analyze cis interactions, interrogating hundreds of specific interactions at high resolution in a single experiment. We show how this approach will facilitate detailed, genome-wide analysis to elucidate the general principles by which cis-acting sequences control gene expression. In addition, we show how Capture-C will expedite identification of the target genes and functional effects of SNPs that are associated with complex diseases, which most frequently lie in intergenic cis-acting regulatory elements.
BACKGROUND: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers. RESULTS: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5' capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis. CONCLUSIONS: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.
SUMMARY: Multi-Image Genome (MIG) viewer is a web-based application for visualizing, querying and filtering many thousands of genome browser regions as well as for exporting the data in a variety of formats. This methodology has been used successfully to analyze ChIP-Seq data and RNA-Seq data and to detect somatic mutations in genome resequencing projects. AVAILABILITY: MIG is available at https://mig.molbiol.ox.ac.uk/mig/
Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.
The extracellular signal-related kinases 1 and 2 (ERK1/2) are key proteins mediating mitogen-activated protein kinase signaling downstream of RAS: phosphorylation of ERK1/2 leads to nuclear uptake and modulation of multiple targets. Here, we show that reduced dosage of ERF, which encodes an inhibitory ETS transcription factor directly bound by ERK1/2 (refs. 2,3,4,5,6,7), causes complex craniosynostosis (premature fusion of the cranial sutures) in humans and mice. Features of this newly recognized clinical disorder include multiple-suture synostosis, craniofacial dysmorphism, Chiari malformation and language delay. Mice with functional Erf levels reduced to ∼30% of normal exhibit postnatal multiple-suture synostosis; by contrast, embryonic calvarial development appears mildly delayed. Using chromatin immunoprecipitation in mouse embryonic fibroblasts and high-throughput sequencing, we find that ERF binds preferentially to elements away from promoters that contain RUNX or AP-1 motifs. This work identifies ERF as a novel regulator of osteogenic stimulation by RAS-ERK signaling, potentially by competing with activating ETS factors in multifactor transcriptional complexes. © 2013 Nature America, Inc. All rights reserved.
We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10-1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.
The role of DNA sequence in determining chromatin state is incompletely understood. We have previously demonstrated that large chromosomal segments from human cells recapitulate their native chromatin state in mouse cells, but the relative contribution of local sequences versus their genomic context remains unknown. In this study, we compare orthologous chromosomal regions for which the human locus establishes prominent sites of Polycomb complex recruitment in pluripotent stem cells, whereas the corresponding mouse locus does not. Using recombination-mediated cassette exchange at the mouse locus, we establish the primacy of local sequences in the encoding of chromatin state. We show that the signal for chromatin bivalency is redundantly encoded across a bivalent domain and that this reflects competition between Polycomb complex recruitment and transcriptional activation. Furthermore, our results suggest that a high density of unmethylated CpG dinucleotides is sufficient for vertebrate Polycomb recruitment. This model is supported by analysis of DNA methyltransferase-deficient embryonic stem cells.
BACKGROUND: In self-renewing, pluripotent cells, bivalent chromatin modification is thought to silence (H3K27me3) lineage control genes while 'poising' (H3K4me3) them for subsequent activation during differentiation, implying an important role for epigenetic modification in directing cell fate decisions. However, rather than representing an equivalently balanced epigenetic mark, the patterns and levels of histone modifications at bivalent genes can vary widely and the criteria for identifying this chromatin signature are poorly defined. RESULTS: Here, we initially show how chromatin status alters during lineage commitment and differentiation at a single well characterised bivalent locus. In addition we have determined how chromatin modifications at this locus change with gene expression in both ensemble and single cell analyses. We also show, on a global scale, how mRNA expression may be reflected in the ratio of H3K4me3/H3K27me3. CONCLUSIONS: While truly 'poised' bivalently modified genes may exist, the original hypothesis that all bivalent genes are epigenetically premarked for subsequent expression might be oversimplistic. In fact, from the data presented in the present work, it is equally possible that many genes that appear to be bivalent in pluripotent and multipotent cells may simply be stochastically expressed at low levels in the process of multilineage priming. Although both situations could be considered to be forms of 'poising', the underlying mechanisms and the associated implications are clearly different.
ATRX is an X-linked gene of the SWI/SNF family, mutations in which cause syndromal mental retardation and downregulation of α-globin expression. Here we show that ATRX binds to tandem repeat (TR) sequences in both telomeres and euchromatin. Genes associated with these TRs can be dysregulated when ATRX is mutated, and the change in expression is determined by the size of the TR, producing skewed allelic expression. This reveals the characteristics of the affected genes, explains the variable phenotypes seen with identical ATRX mutations, and illustrates a new mechanism underlying variable penetrance. Many of the TRs are G rich and predicted to form non-B DNA structures (including G-quadruplex) in vivo. We show that ATRX binds G-quadruplex structures in vitro, suggesting a mechanism by which ATRX may play a role in various nuclear processes and how this is perturbed when ATRX is mutated.
Coordination of cellular processes through the establishment of tissue-specific gene expression programs is essential for lineage maturation. The basic helix-loop-helix hemopoietic transcriptional regulator TAL1 (formerly SCL) is required for terminal differentiation of red blood cells. To gain insight into TAL1 function and mechanisms of action in erythropoiesis, we performed ChIP-sequencing and gene expression analyses from primary fetal liver erythroid cells. We show that TAL1 coordinates expression of genes in most known red cell-specific processes. The majority of TAL1's genomic targets require direct DNA-binding activity. However, one-fifth of TAL1's target sequences, mainly among those showing high affinity for TAL1, can recruit the factor independently of its DNA binding activity. An unbiased DNA motif search of sequences bound by TAL1 identified CAGNTG as TAL1-preferred E-box motif in erythroid cells. Novel motifs were also characterized that may help distinguish activated from repressed genes and suggest a new mechanism by which TAL1 may be recruited to DNA. Finally, analysis of recruitment of GATA1, a protein partner of TAL1, to sequences occupied by TAL1 suggests that TAL1's binding is necessary prior or simultaneous to that of GATA1. This work provides the framework to study regulatory networks leading to erythroid terminal maturation and to model mechanisms of action of tissue-specific transcription factors.
It is well established that all of the cis-acting sequences required for fully regulated human alpha-globin expression are contained within a region of approximately 120 kb of conserved synteny. Here, we show that activation of this cluster in erythroid cells dramatically affects expression of apparently unrelated and noncontiguous genes in the 500 kb surrounding this domain, including a gene (NME4) located 300 kb from the alpha-globin cluster. Changes in NME4 expression are mediated by physical cis-interactions between this gene and the alpha-globin regulatory elements. Polymorphic structural variation within the globin cluster, altering the number of alpha-globin genes, affects the pattern of NME4 expression by altering the competition for the shared alpha-globin regulatory elements. These findings challenge the concept that the genome is organized into discrete, insulated regulatory domains. In addition, this work has important implications for our understanding of genome evolution, the interpretation of genome-wide expression, expression-quantitative trait loci, and copy number variant analyses.
The interest in stem cell based therapies has emphasized the importance of understanding the cellular and molecular mechanisms by which stem cells are generated in ontogeny and maintained throughout adult life. Hematopoietic stem cells (HSCs) are first found in clusters of hematopoietic cells budding from the luminal wall of the major arteries in the developing mammalian embryo. The transcription factor Runx1 is critical for their generation and is specifically expressed at sites of HSC generation, prior to their formation. To understand better the transcriptional hierarchies that converge on Runx1 during HSC emergence, we have initiated studies into its transcriptional regulation. Here we systematically analyzed Runx1 P1 and P2 alternative promoter usage in hematopoietic sites and in sorted cell populations during mouse hematopoietic development. Our results indicate that Runx1 expression in primitive erythrocytes is largely P2-derived, whilst in definitive hematopoietic stem and/or progenitor cells from the yolk sac or AGM and vitelline and umbilical arteries both the distal P1 and proximal P2 promoters are active. After cells have migrated to the fetal liver, the P1 gradually becomes the main hematopoietic promoter and remains this into adulthood. In addition, we identified a novel P2-derived Runx1 isoform.
We describe a pathogenetic mechanism underlying a variant form of the inherited blood disorder alpha thalassemia. Association studies of affected individuals from Melanesia localized the disease trait to the telomeric region of human chromosome 16, which includes the alpha-globin gene cluster, but no molecular defects were detected by conventional approaches. After resequencing and using a combination of chromatin immunoprecipitation and expression analysis on a tiled oligonucleotide array, we identified a gain-of-function regulatory single-nucleotide polymorphism (rSNP) in a nongenic region between the alpha-globin genes and their upstream regulatory elements. The rSNP creates a new promoterlike element that interferes with normal activation of all downstream alpha-like globin genes. Thus, our work illustrates a strategy for distinguishing between neutral and functionally important rSNPs, and it also identifies a pathogenetic mechanism that could potentially underlie other genetic diseases.