Prof Jim R Hughes

Research Area: Cell and Molecular Biology
Technology Exchange: Bioinformatics
Scientific Themes: Genes, Genetics, Epigenetics & Genomics and Haematology
Keywords: Gene Regulation, Genomics, Capture-C and Bioinformatics
The epigenetic and transcriptional landscape of the Nprl3 locus in mouse erythroid cells, which contains the regulatory elements of the α globin genes (MCS-R1-R1 and DHS-12)

Basic classification of all of the active elements (DNase I hypersensitive sites, or DHS) in mouse erythroid cells (Ter119+) into enhancer (blue rectangle) and promoter (red rectangle) elements based on the relative enrichment of two chromatin-associated modifications (H3K4me1 and H3K4me3). All panels show different data types associated with these elements in the same sort order, and show the high levels of erythroid transcription factor binding (Gata1, Scl, Klf1 and Ldb1) associated with enhancer elements compared to promoter elements in these cells. This emphasizes the functional relevance of enhancer elements in gene regulation in Ter119+ cells.

Schematic representation of the Capture-C Chromatin Conformation Capture method. Hughes et al. Nat Genet, 46 (2), pp. 205-212 (2014)

Molecular biology and the biological sciences in general have undergone a technical revolution over the last decade, founded upon the ability to sequence and reconstruct an organism's genomic blueprint in its entirety. Subsequent technical advances such as expression or tiled genomic microarrays and now high-throughput sequencing (HTS) technologies allow us to investigate, on the scale of the whole genome, how and in what situations particular parts of that blueprint are actually used. Although the biological questions remain the same as those asked at an individual gene or genomic locus, the methods to generate, analyse and combine these whole-genome data types are different and require specialist approaches and skills.

One of the most fundamental questions in molecular biology is how specific parts of the genomic blueprint are used in specific situations, given that the same underlying genomic sequence is used whether a cell becomes a neuronal cell or a blood cell. The most basic expression of a genome's activity is the RNA it produces or "expresses"; in the form of mRNA, this goes on to determine which proteins are produced in the cell. It has also become clear that RNA which does not produce protein (non-coding RNA or ncRNA) has a vital and complex regulatory function within the genome.

One of the main research interests of the Genome Biology group is to study the processes that determine whether RNA is or is not produced from a genomic locus as cells develop into red blood cells (erythropoiesis) and which factors determine the rate at which it is produced. We employ most of the current genome-wide methods to determine which parts of the genome are being transcribed into RNA (RNA-seq), investigating both the stable fraction (mRNA) and the raw output of the genome (nascent RNA). We correlate this activity (transcription) with changes in the distribution and chemical modifications of the nucleosomal proteins associated with genomic DNA (DNase-seq and ChIP-seq) and with which regulatory proteins are bound to the DNA (transcription factor ChIP-seq), in an effort to determine how these changes regulate RNA expression (Figures 1 and 2).

Although we use many existing methodologies, the group also develops novel assays where needed to address current deficiencies in our ability to assess genome behaviour. One of the most difficult problems when trying to understand gene regulation, whether on the scale of the genome or at individual genes, is to determine which regions of the genome control the expression patterns and levels of any particular gene. To address this problem we developed the Capture-C 3C method, which allows us to interrogate the regulatory landscapes of hundreds of genes in a single experiment (Figure 3). We are now using the Capture-C method, in combination with our genomics and transcriptomics data, to link genes and regulatory elements en masse in the erythroid system.

We are at present trying to determine which parts of the surrounding genome are functionally required to regulate the transcription of a particular gene or transcript (cis-regulatory elements) and how the molecular events at these regions lead to production of RNA at a remote gene promoter. This represents a fundamental gap in our current understanding of gene regulation, and filling it is a necessary step towards a complete understanding of this process.

Due to the size and complexity of the datasets produced, the group is heavily reliant on bioinformatics to analyse and correlate these data. It has extensive experience in using and developing these types of tools, both in its own right and through a strong collaboration with the Oxford Computational Biology Research Group (CBRG). The group is very collaborative in structure and works closely with other groups within the MHU department in particular and the WIMM as a whole. This efficient structure allows observations derived from genome-wide analyses to be functionally tested in well-understood paradigms of gene regulation such as the α globin locus, and facilitates the genome-wide analysis of concepts gained from the careful interrogation of the model loci.

Name | Department | Institution | Country
Prof Doug Higgs FRS | Nuffield Division of Clinical Laboratory Sciences | Oxford University, Weatherall Institute of Molecular Medicine | United Kingdom
Prof Thomas Milne | Nuffield Division of Clinical Laboratory Sciences | Oxford University, Weatherall Institute of Molecular Medicine | United Kingdom
Prof Veronica J Buckle | Weatherall Institute of Molecular Medicine | Oxford University, Weatherall Institute of Molecular Medicine | United Kingdom

Paralkar VR, Taborda CC, Huang P, Yao Y, Kossenkov AV, Prasad R, Luan J, Davies JO, Hughes JR, Hardison RC et al. 2016. Unlinking an lncRNA from Its Associated cis Element. Mol Cell, 62 (1), pp. 104-110.

Long non-coding (lnc) RNAs can regulate gene expression and protein functions. However, the proportion of lncRNAs with biological activities among the thousands expressed in mammalian cells is controversial. We studied Lockd (lncRNA downstream of Cdkn1b), a 434-nt polyadenylated lncRNA originating 4 kb 3' to the Cdkn1b gene. Deletion of the 25-kb Lockd locus reduced Cdkn1b transcription by approximately 70% in an erythroid cell line. In contrast, homozygous insertion of a polyadenylation cassette 80 bp downstream of the Lockd transcription start site reduced the entire lncRNA transcript level by >90% with no effect on Cdkn1b transcription. The Lockd promoter contains a DNase-hypersensitive site, binds numerous transcription factors, and physically associates with the Cdkn1b promoter in chromosomal conformation capture studies. Therefore, the Lockd gene positively regulates Cdkn1b transcription through an enhancer-like cis element, whereas the lncRNA itself is dispensable, which may be the case for other lncRNAs.

Davies JO, Telenius JM, McGowan SJ, Roberts NA, Taylor S, Higgs DR, Hughes JR. 2016. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat Methods, 13 (1), pp. 74-80.

Methods for analyzing chromosome conformation in mammalian cells are either low resolution or low throughput and are technically challenging. In next-generation (NG) Capture-C, we have redesigned the Capture-C method to achieve unprecedented levels of sensitivity and reproducibility. NG Capture-C can be used to analyze many genetic loci and samples simultaneously. High-resolution data can be produced with as few as 100,000 cells, and single-nucleotide polymorphisms can be used to generate allele-specific tracks. The method is straightforward to perform and should greatly facilitate the investigation of many questions related to gene regulation as well as the functional dissection of traits examined in genome-wide association studies.

Voon HP, Hughes JR, Rode C, De La Rosa-Velázquez IA, Jenuwein T, Feil R, Higgs DR, Gibbons RJ. 2015. ATRX Plays a Key Role in Maintaining Silencing at Interstitial Heterochromatic Loci and Imprinted Genes. Cell Rep, 11 (3), pp. 405-418.

Histone H3.3 is a replication-independent histone variant, which replaces histones that are turned over throughout the entire cell cycle. H3.3 deposition at euchromatin is dependent on HIRA, whereas ATRX/Daxx deposits H3.3 at pericentric heterochromatin and telomeres. The role of H3.3 at heterochromatic regions is unknown, but mutations in the ATRX/Daxx/H3.3 pathway are linked to aberrant telomere lengthening in certain cancers. In this study, we show that ATRX-dependent deposition of H3.3 is not limited to pericentric heterochromatin and telomeres but also occurs at heterochromatic sites throughout the genome. Notably, ATRX/H3.3 specifically localizes to silenced imprinted alleles in mouse ESCs. ATRX KO cells failed to deposit H3.3 at these sites, leading to loss of the H3K9me3 heterochromatin modification, loss of repression, and aberrant allelic expression. We propose a model whereby ATRX-dependent deposition of H3.3 into heterochromatin is normally required to maintain the memory of silencing at imprinted loci.

Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, De Gobbi M, Taylor S, Gibbons R, Higgs DR. 2014. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet, 46 (2), pp. 205-212.

Gene expression during development and differentiation is regulated in a cell- and stage-specific manner by complex networks of intergenic and intragenic cis-regulatory elements whose numbers and representation in the genome far exceed those of structural genes. Using chromosome conformation capture, it is now possible to analyze in detail the interaction between enhancers, silencers, boundary elements and promoters at individual loci, but these techniques are not readily scalable. Here we present a high-throughput approach (Capture-C) to analyze cis interactions, interrogating hundreds of specific interactions at high resolution in a single experiment. We show how this approach will facilitate detailed, genome-wide analysis to elucidate the general principles by which cis-acting sequences control gene expression. In addition, we show how Capture-C will expedite identification of the target genes and functional effects of SNPs that are associated with complex diseases, which most frequently lie in intergenic cis-acting regulatory elements.

Marques AC, Hughes J, Graham B, Kowalczyk MS, Higgs DR, Ponting CP. 2013. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol, 14 (11), pp. R131.

BACKGROUND: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers. RESULTS: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5' capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis. CONCLUSIONS: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.

McGowan SJ, Hughes JR, Han ZP, Taylor S. 2013. MIG: Multi-Image Genome viewer. Bioinformatics, 29 (19), pp. 2477-2478.

SUMMARY: Multi-Image Genome (MIG) viewer is a web-based application for visualizing, querying and filtering many thousands of genome browser regions as well as for exporting the data in a variety of formats. This methodology has been used successfully to analyze ChIP-Seq data and RNA-Seq data and to detect somatic mutations in genome resequencing projects. AVAILABILITY: MIG is available at https://mig.molbiol.ox.ac.uk/mig/

Hosseini M, Goodstadt L, Hughes JR, Kowalczyk MS, de Gobbi M, Otto GW, Copley RR, Mott R, Higgs DR, Flint J. 2013. Causes and consequences of chromatin variation between inbred mice. PLoS Genet, 9 (6), pp. e1003570.

Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.

Lower KM, De Gobbi M, Hughes JR, Derry CJ, Ayyub H, Sloane-Stanley JA, Vernimmen D, Garrick D, Gibbons RJ, Higgs DR. 2013. Analysis of sequence variation underlying tissue-specific transcription factor binding and gene expression. Hum Mutat, 34 (8), pp. 1140-1148.

Although mutations causing monogenic disorders most frequently lie within the affected gene, sequence variation in complex disorders is more commonly found in noncoding regions. Furthermore, recent genome-wide studies have shown that common DNA sequence variants in noncoding regions are associated with "normal" variation in gene expression resulting in cell-specific and/or allele-specific differences. The mechanism by which such sequence variation causes changes in gene expression is largely unknown. We have addressed this by studying natural variation in the binding of key transcription factors (TFs) in the well-defined, purified cell system of erythropoiesis. We have shown that common polymorphisms frequently directly perturb the binding sites of key TFs, and detailed analysis shows how this causes considerable (~10-fold) changes in expression from a single allele in a tissue-specific manner. We also show how a SNP, located at some distance from the recognized TF binding site, may affect the recruitment of a large multiprotein complex and alter the associated chromatin modification of the variant regulatory element. This study illustrates the principles by which common sequence variation may cause changes in tissue-specific gene expression, and suggests that such variation may underlie an individual's propensity to develop complex human genetic diseases.

Twigg SRF, Vorgia E, McGowan SJ, Peraki I, Fenwick AL, Sharma VP, Allegra M, Zaragkoulias A, Akha ES, Knight SJL et al. 2013. Reduced dosage of ERF causes complex craniosynostosis in humans and mice and links ERK1/2 signaling to regulation of osteogenesis. Nat Genet, 45 (3), pp. 308-313.

The extracellular signal-related kinases 1 and 2 (ERK1/2) are key proteins mediating mitogen-activated protein kinase signaling downstream of RAS: phosphorylation of ERK1/2 leads to nuclear uptake and modulation of multiple targets. Here, we show that reduced dosage of ERF, which encodes an inhibitory ETS transcription factor directly bound by ERK1/2 (refs. 2,3,4,5,6,7), causes complex craniosynostosis (premature fusion of the cranial sutures) in humans and mice. Features of this newly recognized clinical disorder include multiple-suture synostosis, craniofacial dysmorphism, Chiari malformation and language delay. Mice with functional Erf levels reduced to ∼30% of normal exhibit postnatal multiple-suture synostosis; by contrast, embryonic calvarial development appears mildly delayed. Using chromatin immunoprecipitation in mouse embryonic fibroblasts and high-throughput sequencing, we find that ERF binds preferentially to elements away from promoters that contain RUNX or AP-1 motifs. This work identifies ERF as a novel regulator of osteogenic stimulation by RAS-ERK signaling, potentially by competing with activating ETS factors in multifactor transcriptional complexes.

Hughes JR, Lower KM, Dunham I, Taylor S, De Gobbi M, Sloane-Stanley JA, McGowan S, Ragoussis J, Vernimmen D, Gibbons RJ, Higgs DR. 2013. High-resolution analysis of cis-acting regulatory networks at the α-globin locus. Philos Trans R Soc Lond B Biol Sci, 368 (1620), pp. 20120361.

We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10-1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.

Kowalczyk MS, Hughes JR, Babbs C, Sanchez-Pulido L, Szumska D, Sharpe JA, Sloane-Stanley JA, Morriss-Kay GM, Smoot LB, Roberts AE et al. 2012. Nprl3 is required for normal development of the cardiovascular system. Mamm Genome, 23 (7-8), pp. 404-415.

C16orf35 is a conserved and widely expressed gene lying adjacent to the human α-globin cluster in all vertebrate species. In-depth sequence analysis shows that C16orf35 (now called NPRL3) is an orthologue of the yeast gene Npr3 (nitrogen permease regulator 3) and, furthermore, is a paralogue of its protein partner Npr2. The yeast Npr2/3 dimeric protein complex senses amino acid starvation and appropriately adjusts cell metabolism via the TOR pathway. Here we have analysed a mouse model in which expression of Nprl3 has been abolished using homologous recombination. The predominant effect on RNA expression appears to involve genes that regulate protein synthesis and cell cycle, consistent with perturbation of the mTOR pathway. Embryos homozygous for this mutation die towards the end of gestation with a range of cardiovascular defects, including outflow tract abnormalities and ventriculoseptal defects consistent with previous observations, showing that perturbation of the mTOR pathway may affect development of the myocardium. NPRL3 is a candidate gene for harbouring mutations in individuals with developmental abnormalities of the cardiovascular system.

Kowalczyk MS, Hughes JR, Garrick D, Lynch MD, Sharpe JA, Sloane-Stanley JA, McGowan SJ, De Gobbi M, Hosseini M, Vernimmen D et al. 2012. Intragenic enhancers act as alternative promoters. Mol Cell, 45 (4), pp. 447-458.

A substantial amount of organismal complexity is thought to be encoded by enhancers which specify the location, timing, and levels of gene expression. In mammals there are more enhancers than promoters which are distributed both between and within genes. Here we show that activated, intragenic enhancers frequently act as alternative tissue-specific promoters producing a class of abundant, spliced, multiexonic poly(A)(+) RNAs (meRNAs) which reflect the host gene's structure. meRNAs make a substantial and unanticipated contribution to the complexity of the transcriptome, appearing as alternative isoforms of the host gene. The low protein-coding potential of meRNAs suggests that many meRNAs may be byproducts of enhancer activation or underlie as-yet-unidentified RNA-encoded functions. Distinguishing between meRNAs and mRNAs will transform our interpretation of dynamic changes in transcription both at the level of individual genes and of the genome as a whole.

Lynch MD, Smith AJ, De Gobbi M, Flenley M, Hughes JR, Vernimmen D, Ayyub H, Sharpe JA, Sloane-Stanley JA, Sutherland L et al. 2012. An interspecies analysis reveals a key role for unmethylated CpG dinucleotides in vertebrate Polycomb complex recruitment. EMBO J, 31 (2), pp. 317-329.

The role of DNA sequence in determining chromatin state is incompletely understood. We have previously demonstrated that large chromosomal segments from human cells recapitulate their native chromatin state in mouse cells, but the relative contribution of local sequences versus their genomic context remains unknown. In this study, we compare orthologous chromosomal regions for which the human locus establishes prominent sites of Polycomb complex recruitment in pluripotent stem cells, whereas the corresponding mouse locus does not. Using recombination-mediated cassette exchange at the mouse locus, we establish the primacy of local sequences in the encoding of chromatin state. We show that the signal for chromatin bivalency is redundantly encoded across a bivalent domain and that this reflects competition between Polycomb complex recruitment and transcriptional activation. Furthermore, our results suggest that a high density of unmethylated CpG dinucleotides is sufficient for vertebrate Polycomb recruitment. This model is supported by analysis of DNA methyltransferase-deficient embryonic stem cells.

De Gobbi M, Garrick D, Lynch M, Vernimmen D, Hughes JR, Goardon N, Luc S, Lower KM, Sloane-Stanley JA, Pina C et al. 2011. Generation of bivalent chromatin domains during cell fate decisions. Epigenetics Chromatin, 4 (1), pp. 9.

BACKGROUND: In self-renewing, pluripotent cells, bivalent chromatin modification is thought to silence (H3K27me3) lineage control genes while 'poising' (H3K4me3) them for subsequent activation during differentiation, implying an important role for epigenetic modification in directing cell fate decisions. However, rather than representing an equivalently balanced epigenetic mark, the patterns and levels of histone modifications at bivalent genes can vary widely and the criteria for identifying this chromatin signature are poorly defined. RESULTS: Here, we initially show how chromatin status alters during lineage commitment and differentiation at a single well characterised bivalent locus. In addition we have determined how chromatin modifications at this locus change with gene expression in both ensemble and single cell analyses. We also show, on a global scale, how mRNA expression may be reflected in the ratio of H3K4me3/H3K27me3. CONCLUSIONS: While truly 'poised' bivalently modified genes may exist, the original hypothesis that all bivalent genes are epigenetically premarked for subsequent expression might be oversimplistic. In fact, from the data presented in the present work, it is equally possible that many genes that appear to be bivalent in pluripotent and multipotent cells may simply be stochastically expressed at low levels in the process of multilineage priming. Although both situations could be considered to be forms of 'poising', the underlying mechanisms and the associated implications are clearly different.

Law MJ, Lower KM, Voon HP, Hughes JR, Garrick D, Viprakasit V, Mitson M, De Gobbi M, Marra M, Morris A et al. 2010. ATR-X syndrome protein targets tandem repeats and influences allele-specific expression in a size-dependent manner. Cell, 143 (3), pp. 367-378.

ATRX is an X-linked gene of the SWI/SNF family, mutations in which cause syndromal mental retardation and downregulation of α-globin expression. Here we show that ATRX binds to tandem repeat (TR) sequences in both telomeres and euchromatin. Genes associated with these TRs can be dysregulated when ATRX is mutated, and the change in expression is determined by the size of the TR, producing skewed allelic expression. This reveals the characteristics of the affected genes, explains the variable phenotypes seen with identical ATRX mutations, and illustrates a new mechanism underlying variable penetrance. Many of the TRs are G rich and predicted to form non-B DNA structures (including G-quadruplex) in vivo. We show that ATRX binds G-quadruplex structures in vitro, suggesting a mechanism by which ATRX may play a role in various nuclear processes and how this is perturbed when ATRX is mutated.

Kassouf MT, Hughes JR, Taylor S, McGowan SJ, Soneji S, Green AL, Vyas P, Porcher C. 2010. Genome-wide identification of TAL1's functional targets: insights into its mechanisms of action in primary erythroid cells. Genome Res, 20 (8), pp. 1064-1083.

Coordination of cellular processes through the establishment of tissue-specific gene expression programs is essential for lineage maturation. The basic helix-loop-helix hemopoietic transcriptional regulator TAL1 (formerly SCL) is required for terminal differentiation of red blood cells. To gain insight into TAL1 function and mechanisms of action in erythropoiesis, we performed ChIP-sequencing and gene expression analyses from primary fetal liver erythroid cells. We show that TAL1 coordinates expression of genes in most known red cell-specific processes. The majority of TAL1's genomic targets require direct DNA-binding activity. However, one-fifth of TAL1's target sequences, mainly among those showing high affinity for TAL1, can recruit the factor independently of its DNA binding activity. An unbiased DNA motif search of sequences bound by TAL1 identified CAGNTG as TAL1-preferred E-box motif in erythroid cells. Novel motifs were also characterized that may help distinguish activated from repressed genes and suggest a new mechanism by which TAL1 may be recruited to DNA. Finally, analysis of recruitment of GATA1, a protein partner of TAL1, to sequences occupied by TAL1 suggests that TAL1's binding is necessary prior or simultaneous to that of GATA1. This work provides the framework to study regulatory networks leading to erythroid terminal maturation and to model mechanisms of action of tissue-specific transcription factors.

Lower KM, Hughes JR, De Gobbi M, Henderson S, Viprakasit V, Fisher C, Goriely A, Ayyub H, Sloane-Stanley J, Vernimmen D et al. 2009. Adventitious changes in long-range gene expression caused by polymorphic structural variation and promoter competition. Proc Natl Acad Sci U S A, 106 (51), pp. 21771-21776.

It is well established that all of the cis-acting sequences required for fully regulated human alpha-globin expression are contained within a region of approximately 120 kb of conserved synteny. Here, we show that activation of this cluster in erythroid cells dramatically affects expression of apparently unrelated and noncontiguous genes in the 500 kb surrounding this domain, including a gene (NME4) located 300 kb from the alpha-globin cluster. Changes in NME4 expression are mediated by physical cis-interactions between this gene and the alpha-globin regulatory elements. Polymorphic structural variation within the globin cluster, altering the number of alpha-globin genes, affects the pattern of NME4 expression by altering the competition for the shared alpha-globin regulatory elements. These findings challenge the concept that the genome is organized into discrete, insulated regulatory domains. In addition, this work has important implications for our understanding of genome evolution, the interpretation of genome-wide expression, expression-quantitative trait loci, and copy number variant analyses.

Bee T, Liddiard K, Swiers G, Bickley SR, Vink CS, Jarratt A, Hughes JR, Medvinsky A, de Bruijn MF. 2009. Alternative Runx1 promoter usage in mouse developmental hematopoiesis. Blood Cells Mol Dis, 43 (1), pp. 35-42.

The interest in stem cell based therapies has emphasized the importance of understanding the cellular and molecular mechanisms by which stem cells are generated in ontogeny and maintained throughout adult life. Hematopoietic stem cells (HSCs) are first found in clusters of hematopoietic cells budding from the luminal wall of the major arteries in the developing mammalian embryo. The transcription factor Runx1 is critical for their generation and is specifically expressed at sites of HSC generation, prior to their formation. To understand better the transcriptional hierarchies that converge on Runx1 during HSC emergence, we have initiated studies into its transcriptional regulation. Here we systematically analyzed Runx1 P1 and P2 alternative promoter usage in hematopoietic sites and in sorted cell populations during mouse hematopoietic development. Our results indicate that Runx1 expression in primitive erythrocytes is largely P2-derived, whilst in definitive hematopoietic stem and/or progenitor cells from the yolk sac or AGM and vitelline and umbilical arteries both the distal P1 and proximal P2 promoters are active. After cells have migrated to the fetal liver, the P1 gradually becomes the main hematopoietic promoter and remains this into adulthood. In addition, we identified a novel P2-derived Runx1 isoform.

Ballabio E, Cantarella CD, Federico C, Di Mare P, Hall G, Harbott J, Hughes J, Saccone S, Tosi S. 2009. Ectopic expression of the HLXB9 gene is associated with an altered nuclear position in t(7;12) leukaemias. Leukemia, 23 (6), pp. 1179-1182.

Brown JM, Green J, das Neves RP, Wallace HA, Smith AJ, Hughes J, Gray N, Taylor S, Wood WG, Higgs DR et al. 2008. Association between active genes occurs at nuclear speckles and is modulated by chromatin environment. J Cell Biol, 182 (6), pp. 1083-1097.

Genes on different chromosomes can be spatially associated in the nucleus in several transcriptional and regulatory situations; however, the functional significance of such associations remains unclear. Using human erythropoiesis as a model, we show that five cotranscribed genes, which are found on four different chromosomes, associate with each other at significant but variable frequencies. Those genes most frequently in association lie in decondensed stretches of chromatin. By replacing the mouse alpha-globin gene cluster in situ with its human counterpart, we demonstrate a direct effect of the regional chromatin environment on the frequency of association, whereas nascent transcription from the human alpha-globin gene appears unaffected. We see no evidence that cotranscribed erythroid genes associate at shared transcription foci, but we do see stochastic clustering of active genes around common nuclear SC35-enriched speckles (hence the apparent nonrandom association between genes). Thus, association between active genes may result from their location on decondensed chromatin that enables clustering around common nuclear speckles.

De Gobbi M, Anguita E, Hughes J, Sloane-Stanley JA, Sharpe JA, Koch CM, Dunham I, Gibbons RJ, Wood WG, Higgs DR. 2007. Tissue-specific histone modification and transcription factor binding in alpha globin gene expression. Blood, 110 (13), pp. 4503-4510.

To address the mechanism by which the human globin genes are activated during erythropoiesis, we have used a tiled microarray to analyze the pattern of transcription factor binding and associated histone modifications across the telomeric region of human chromosome 16 in primary erythroid and nonerythroid cells. This 220-kb region includes the alpha globin genes and 9 widely expressed genes flanking the alpha globin locus. This un-biased, comprehensive analysis of transcription factor binding and histone modifications (acetylation and methylation) described here not only identified all known cis-acting regulatory elements in the human alpha globin cluster but also demonstrated that there are no additional erythroid-specific regulatory elements in the 220-kb region tested. In addition, the pattern of histone modification distinguished promoter elements from potential enhancer elements across this region. Finally, comparison of the human and mouse orthologous regions in a unique mouse model, with both regions coexpressed in the same animal, showed significant differences that may explain how these 2 clusters are regulated differently in vivo.

Giardine B, Riemer C, Hefferon T, Thomas D, Hsu F, Zielenski J, Sang Y, Elnitski L, Cutting G, Trumbower H et al. 2007. PhenCode: connecting ENCODE data with mutations and phenotype. Hum Mutat, 28 (6), pp. 554-562.

PhenCode (Phenotypes for ENCODE; http://www.bx.psu.edu/phencode) is a collaborative, exploratory project to help understand phenotypes of human mutations in the context of sequence and functional data from genome projects. Currently, it connects human phenotype and clinical data in various locus-specific databases (LSDBs) with data on genome sequences, evolutionary history, and function from the ENCODE project and other resources in the UCSC Genome Browser. Initially, we focused on a few selected LSDBs covering genes encoding alpha- and beta-globins (HBA, HBB), phenylalanine hydroxylase (PAH), blood group antigens (various genes), androgen receptor (AR), cystic fibrosis transmembrane conductance regulator (CFTR), and Bruton's tyrosine kinase (BTK), but we plan to include additional loci of clinical importance, ultimately genomewide. We have also imported variant data and associated OMIM links from Swiss-Prot. Users can find interesting mutations in the UCSC Genome Browser (in a new Locus Variants track) and follow links back to the LSDBs for more detailed information. Alternatively, they can start with queries on mutations or phenotypes at an LSDB and then display the results at the Genome Browser to view complementary information such as functional data (e.g., chromatin modifications and protein binding from the ENCODE consortium), evolutionary constraint, regulatory potential, and/or any other tracks they choose. We present several examples illustrating the power of these connections for exploring phenotypes associated with functional elements, and for identifying genomic data that could help to explain clinical phenotypes.

Wallace HA, Marques-Kranc F, Richardson M, Luna-Crespo F, Sharpe JA, Hughes J, Wood WG, Higgs DR, Smith AJ. 2007. Manipulating the mouse genome to engineer precise functional syntenic replacements with human sequence. Cell, 128 (1), pp. 197-209.

We have devised a strategy (called recombinase-mediated genomic replacement, RMGR) to allow the replacement of large segments (>100 kb) of the mouse genome with the equivalent human syntenic region. The technique involves modifying a mouse ES cell chromosome and a human BAC by inserting heterotypic lox sites to flank the proposed exchange interval and then using Cre recombinase to achieve segmental exchange. We have demonstrated the feasibility of this approach by replacing the mouse alpha globin regulatory domain with the human syntenic region and generating homozygous mice that produce only human alpha globin chains. Furthermore, modified ES cells can be used iteratively for functional studies, and here, as an example, we have used RMGR to produce an accurate mouse model of human alpha thalassemia. RMGR has general applicability and will overcome limitations inherent in current transgenic technology when studying the expression of human genes and modeling human genetic diseases.

Higgs DR, Vernimmen D, Hughes J, Gibbons R. 2007. Using genomics to study how chromatin influences gene expression. Annu Rev Genomics Hum Genet, 8 (1), pp. 299-325.

A postgenome challenge is to understand how the code in DNA is converted into the biological processes underlying various cell fates. By establishing the appropriate technical tools, we are moving from an era in which such questions have been asked by studying individual genes to one in which large domains, whole chromosomes, and the entire human genome can be investigated. These developments will allow us to study in parallel the transcriptional program and components of the epigenetic program (nuclear position, timing of replication, chromatin structure and modification, DNA methylation) to determine the hierarchy and order of events required to switch genes on and off during differentiation and development.

De Gobbi M, Viprakasit V, Hughes JR, Fisher C, Buckle VJ, Ayyub H, Gibbons RJ, Vernimmen D, Yoshinaga Y, de Jong P et al. 2006. A regulatory SNP causes a human genetic disease by creating a new transcriptional promoter. Science, 312 (5777), pp. 1215-1217.

We describe a pathogenetic mechanism underlying a variant form of the inherited blood disorder alpha thalassemia. Association studies of affected individuals from Melanesia localized the disease trait to the telomeric region of human chromosome 16, which includes the alpha-globin gene cluster, but no molecular defects were detected by conventional approaches. After resequencing and using a combination of chromatin immunoprecipitation and expression analysis on a tiled oligonucleotide array, we identified a gain-of-function regulatory single-nucleotide polymorphism (rSNP) in a nongenic region between the alpha-globin genes and their upstream regulatory elements. The rSNP creates a new promoterlike element that interferes with normal activation of all downstream alpha-like globin genes. Thus, our work illustrates a strategy for distinguishing between neutral and functionally important rSNPs, and it also identifies a pathogenetic mechanism that could potentially underlie other genetic diseases.

Higgs DR, Vernimmen D, De Gobbi M, Anguita E, Hughes J, Buckle V, Iborra F, Garrick D, Wood WG. 2006. How transcriptional and epigenetic programmes are played out on an individual mammalian gene cluster during lineage commitment and differentiation. Biochem Soc Symp, 73 (73), pp. 11-22.

In the post-genomic era, a great deal of work has focused on understanding how DNA sequence is used to programme complex nuclear, cellular and tissue functions throughout differentiation and development. There are many approaches to these issues, but we have concentrated on understanding how a single mammalian gene cluster is activated or silenced as stem cells undergo lineage commitment, differentiation and maturation. In particular we have analysed the alpha globin cluster, which is expressed in a cell-type- and developmental stage-specific manner in the haemopoietic system. Our studies include analysis of the transcriptional programme that accompanies globin gene activation, focusing on the expression of relevant transcription factors and cofactors. Binding of these factors to the chromosomal domain containing the alpha globin cluster has been characterized by ChIP (chromatin immunoprecipitation). In addition, we have monitored the epigenetic modifications (e.g. nuclear position, timing of replication, chromatin modification, DNA methylation) that occur as the genes are activated (in erythroid cells) or silenced (e.g. in granulocytes) as haemopoiesis proceeds. Together, these observations provide a uniquely well-characterized model illustrating the mechanisms that regulate and memorize patterns of mammalian gene expression as stem cells undergo lineage specification, differentiation and terminal maturation.

Hughes JR, Cheng JF, Ventress N, Prabhakar S, Clark K, Anguita E, De Gobbi M, de Jong P, Rubin E, Higgs DR. 2005. Annotation of cis-regulatory elements by identification, subclassification, and functional assessment of multispecies conserved sequences. Proc Natl Acad Sci U S A, 102 (28), pp. 9830-9835.

An important step toward improving the annotation of the human genome is to identify cis-acting regulatory elements from primary DNA sequence. One approach is to compare sequences from multiple, divergent species. This approach distinguishes multispecies conserved sequences (MCS) in noncoding regions from more rapidly evolving neutral DNA. Here, we have analyzed a region of approximately 238kb containing the human alpha globin cluster that was sequenced and/or annotated across the syntenic region in 22 species spanning 500 million years of evolution. Using a variety of bioinformatic approaches and correlating the results with many aspects of chromosome structure and function in this region, we were able to identify and evaluate the importance of 24 individual MCSs. This approach sensitively and accurately identified previously characterized regulatory elements but also discovered unidentified promoters, exons, splicing, and transcriptional regulatory elements. Together, these studies demonstrate an integrated approach by which to identify, subclassify, and predict the potential importance of MCSs.

Higgs DR, Garrick D, Anguita E, De Gobbi M, Hughes J, Muers M, Vernimmen D, Lower K, Law M, Argentaro A et al. 2005. Understanding alpha-globin gene regulation: Aiming to improve the management of thalassemia. Ann N Y Acad Sci, 1054 (1), pp. 92-102.

Over the past 50 years, many advances in our understanding of the general principles controlling gene expression during hematopoiesis have come from studying the synthesis of hemoglobin. Discovering how the alpha- and beta-globin genes are normally regulated and documenting the effects of inherited mutations that cause thalassemia have played a major role in establishing our current understanding of how genes are switched on or off in hematopoietic cells. Previously, nearly all mutations causing thalassemia have been found in or around the globin loci, but rare inherited and acquired trans-acting mutations are being found more often. Such mutations have demonstrated new mechanisms underlying human genetic disease. Furthermore, they are revealing new pathways in the regulation of globin gene expression that, in turn, may open up new avenues for improving the management of patients with common types of thalassemia.

Anguita E, Hughes J, Heyworth C, Blobel GA, Wood WG, Higgs DR. 2004. Globin gene activation during haemopoiesis is driven by protein complexes nucleated by GATA-1 and GATA-2. EMBO J, 23 (14), pp. 2841-2852.

How does an emerging transcriptional programme regulate individual genes as stem cells undergo lineage commitment, differentiation and maturation? To answer this, we have analysed the dynamic protein/DNA interactions across 130 kb of chromatin containing the mouse alpha-globin cluster in cells representing all stages of differentiation from stem cells to mature erythroblasts. The alpha-gene cluster appears to be inert in pluripotent cells, but priming of expression begins in multipotent haemopoietic progenitors via GATA-2. In committed erythroid progenitors, GATA-2 is replaced by GATA-1 and binding is extended to additional sites including the alpha-globin promoters. Both GATA-1 and GATA-2 nucleate the binding of various protein complexes including SCL/LMO2/E2A/Ldb-1 and NF-E2. Changes in protein/DNA binding are accompanied by sequential alterations in long-range histone acetylation and methylation. The recruitment of polymerase II, which ultimately leads to a rapid increase in alpha-globin transcription, occurs late in maturation. These studies provide detailed evidence for the more general hypothesis that commitment and differentiation are primarily driven by the sequential appearance of key transcriptional factors, which bind chromatin at specific, high-affinity sites.

Tufarelli C, Hardison R, Miller W, Hughes J, Clark K, Ventress N, Frischauf AM, Higgs DR. 2004. Comparative analysis of the alpha-like globin clusters in mouse, rat, and human chromosomes indicates a mechanism underlying breaks in conserved synteny. Genome Res, 14 (4), pp. 623-630.

We have sequenced and fully annotated a 65,871-bp region of mouse Chromosome 17 including the Hba-ps4 alpha-globin pseudogene. Comparative sequence analysis with the functional alpha-globin loci at human Chromosome 16p13.3 and mouse Chromosome 11 shows that this segment of mouse Chromosome 17 contains a group of three alpha-like pseudogenes (Hba-psm-Hba-ps4-Hba-q3), similar to the duplicated sets found at the functional mouse cluster on Chromosome 11. In addition, exons 7 to 12 of the mLuc7L gene are present just downstream from the pseudogene cluster, indicating that this clone contains the region in which human 16p13.3 switches in synteny between mouse Chromosomes 11 and 17. Comparison of the sequences around the alpha-like clusters on the two mouse chromosomes reveals the presence of conserved tandem repeats. We propose that these repetitive elements have played a role in the fragmentation of the mouse alpha cluster during evolution.

Tosi S, Hughes J, Scherer SW, Nakabayashi K, Harbott J, Haas OA, Cazzaniga G, Biondi A, Kempski H, Kearney L. 2003. Heterogeneity of the 7q36 breakpoints in the t(7;12) involving ETV6 in infant leukemia. Genes Chromosomes Cancer, 38 (2), pp. 191-200.

The t(7;12)(q36;p13) is a recurrent chromosome abnormality in infant leukemia. In these cases, the involvement of ETV6, with disruption of the gene consistently at its 5' end, has been reported by several groups. A fusion transcript between ETV6 and HLXB9 has been detected in some, but not all, reported cases of t(7;12). We report here a study based on fluorescence in situ hybridization (FISH) mapping of the translocation breakpoints in seven patients and detailed molecular studies using Southern blotting on two of these patients. The FISH studies have shown a cluster of breakpoints within a cosmid contig proximal to the HLXB9 gene. Southern blotting analysis enabled us to define two distinct breakpoints within the area covered by the cosmid contig in two patients. The analysis of an unusual case of t(7;12)(q22;p13) [full karyotype: 46,XX,der(7)t(7;12)(q22;p13)del(7)(q22q36)] also revealed a break in 7q36, although in a region proximal to the overlapping cosmids. 5' RACE PCR in one patient has shown a rearrangement involving the ETV6 allele not involved in the t(7;12), suggesting that no functional ETV6 allele might be present in this case. These data show some heterogeneity in the distribution of breakpoints in 7q36, indicating that the generation of a fusion gene might not be the mechanism responsible for leukemogenesis in the t(7;12), at least in some cases.

Viprakasit V, Kidd AM, Ayyub H, Horsley S, Hughes J, Higgs DR. 2003. De novo deletion within the telomeric region flanking the human alpha globin locus as a cause of alpha thalassaemia. Br J Haematol, 120 (5), pp. 867-875.

We have identified and characterized a Scottish individual with alpha thalassaemia, resulting from a de novo 48 kilobase (kb) deletion from the telomeric flanking region of the alpha globin cluster which occurred as a result of recombination between two misaligned repetitive elements that normally lie approximately 83 kb and 131 kb from the 16p telomere. The deletion removes two previously described putative regulatory elements (HS-40 and HS-33) but leaves two other elements (HS-10 and HS-8) intact. Analysis of this deletion, together with eight other published deletions of the telomeric region, showed that they all severely downregulated alpha globin expression. Together they defined a 20.4-kb region of the human alpha cluster, which contains all of the positive cis-acting elements required to regulate alpha globin expression. Comparative analysis of this region with the corresponding segment of the mouse alpha globin cluster demonstrated conserved non-coding sequences corresponding to the putative regulatory elements HS-40 and HS-33. Although the role of HS-40 as an enhancer of alpha globin expression is fully established, these observations suggest that the role of HS-33 and other sequences in this region should be more fully investigated in the context of the natural human and mouse alpha globin loci.

Hughes J, Ward CJ, Aspinwall R, Butler R, Harris PC. 1999. Identification of a human homologue of the sea urchin receptor for egg jelly: a polycystic kidney disease-like protein. Hum Mol Genet, 8 (3), pp. 543-549.

Previous studies have shown sequence similarity between a region of the autosomal dominant polycystic kidney disease (ADPKD) protein, polycystin-1 and a sea urchin sperm glycoprotein involved in fertilization, the receptor for egg jelly (suREJ). We have analysed sequence databases for novel genes encoding PKD/REJ-like proteins and found a significant region of homology to a large open reading frame in genomic sequence from human chromosome 22. Northern analysis showed that this is a functional gene [termed the polycystic kidney disease and receptor for egg jelly related gene (PKDREJ)], but unlike polycystin-1, has a very restricted expression pattern; the approximately 8 kb transcript was found exclusively in testis, coincident with the timing of sperm maturation. The PKDREJ transcript was cloned by screening a testis cDNA library and RT-PCR which revealed a 7660 bp mRNA terminating with a 900 bp 3'UTR and a polyA tail. Comparison with genomic sequence showed that PKDREJ is intronless; sequencing the mouse orthologue revealed a similar structure. The predicted human PKDREJ protein has 2253 amino acids (calculated molecular mass 255 kDa) and sequence similarity over approximately 2000 amino acids with polycystin-1, corresponding to the predicted membrane associated region and the area of homology (approximately 1000 amino acids) with the suREJ protein (the REJ module). The suREJ protein binds the glycoprotein coat of the egg (egg jelly), triggering the acrosome reaction, which transforms the sperm into a fusogenic cell. The sequence similarity and expression pattern suggests that PKDREJ is a mammalian equivalent of the suREJ protein and therefore may have a central role in human fertilization.

Sandford R, Sgotto B, Aparicio S, Brenner S, Vaudin M, Wilson RK, Chissoe S, Pepin K, Bateman A, Chothia C et al. 1997. Comparative analysis of the polycystic kidney disease 1 (PKD1) gene reveals an integral membrane glycoprotein with multiple evolutionary conserved domains. Hum Mol Genet, 6 (9), pp. 1483-1489.

PKD1 is the major locus of the common genetic disorder autosomal dominant polycystic kidney disease (ADPKD). Analysis of the predicted protein sequence of the human PKD1 gene, polycystin, shows a large molecule with a unique arrangement of extracellular domains and multiple putative transmembrane regions. The precise function of polycystin remains unclear with a paucity of mutations to define key structural and functional domains. To refine the structure of this protein we have cloned the genomic region encoding the Fugu PKD1 gene. Fugu PKD1 spans 36 kb of genomic DNA and has greater complexity with 54 exons compared with 46 in man. Comparative analysis of the predicted protein sequences shows a lower level of homology than in similar studies with identity of 40 and 59% similarity. However key structural motifs including leucine rich repeats (LRR), a C-type lectin and LDL-A like domains and 16 PKD repeats are maintained. A region of homology with the sea urchin REJ protein was also confirmed in Fugu but found to extend over 1000 amino acids. Several highly conserved intra- and extra-cellular regions, with no known sequence homologies, that are likely to be of functional importance were detected. The likely structure of the membrane associated region has been refined with similarity to the PKD2 protein and voltage gated Ca2+ and Na+ channels highlighted over part of this area. The overall protein structure has therefore been clarified and this comparative analysis derived structure will form the basis for the functional study of polycystin and its individual domains.

Harris PC, Ward CJ, Peral B, Hughes J. 1995. Polycystic kidney disease. 1: Identification and analysis of the primary defect. J Am Soc Nephrol, 6 (4), pp. 1125-1133.

The identification of the primary defect in autosomal dominant polycystic kidney disease (ADPKD) by biochemical methods has proved difficult because of the complexity of the cystic kidney. However, by the use of the genetic method of positional cloning, a gene accounting for approximately 85% of ADPKD, PKD1, has now been identified in the chromosome region 16p13.3. Its exact location was pinpointed because it was bisected by a chromosome translocation; members with the balanced exchange had PKD1. The PKD1 gene encodes an approximately 14-kb transcript, but full characterization was complicated, because most of the gene lies in a genomic region that is duplicated elsewhere on chromosome 16; the duplicate area encodes three genes with substantial homology to PKD1. At present, only seven mutations of PKD1 have been characterized and these are clustered in the nonduplicated, 3' end of the gene. However, a number of patients with large deletions of the PKD1 and adjacent tuberous sclerosis 2 (TSC2) genes, who have tuberous sclerosis and severe childhood-onset polycystic kidney disease, have also been described. Recently, the entire sequence of the PKD1 transcript and the genomic region containing the gene have been determined. The PKD1 gene covers approximately 52 kb of genomic DNA and is divided into 46 exons. The transcript is approximately 14.15 kb, and the predicted protein polycystin is 4302/3 amino acids with a calculated mass of approximately 460 kd. Polycystin contains several distinctive extracellular domains, including a flank-leucine rich repeat-flank domain, a C-type lectin, 16 approximately 85-amino-acid units that are similar to immunoglobulin repeats, four fibronectin Type III-related domains, and a low-density lipoprotein A domain. The C-terminal third of the protein has multiple hydrophobic regions, and modeling of this region suggests the presence of many transmembrane domains and a cytoplasmic C terminus. 
Hence, polycystin is probably an integral membrane protein with multiple extracellular domains that are involved in cell-cell and/or cell-matrix interactions. The ADPKD phenotype suggests that polycystin may play a role in cell-matrix communication, which is important for normal basement membrane production and for controlling cellular differentiation.

Hughes J, Ward CJ, Peral B, Aspinwall R, Clark K, San Millán JL, Gamble V, Harris PC. 1995. The polycystic kidney disease 1 (PKD1) gene encodes a novel protein with multiple cell recognition domains. Nat Genet, 10 (2), pp. 151-160.

Characterization of the polycystic kidney disease 1 (PKD1) gene has been complicated by genomic rearrangements on chromosome 16. We have used an exon linking strategy, taking RNA from a cell line containing PKD1 but not the duplicate loci, to clone a cDNA contig of the entire transcript. The transcript consists of 14,148 bp (including a correction to the previously described C terminus), distributed among 46 exons spanning 52 kb. The predicted PKD1 protein, polycystin, is a glycoprotein with multiple transmembrane domains and a cytoplasmic C-tail. The N-terminal extracellular region of over 2,500 aa contains leucine-rich repeats, a C-type lectin, 16 immunoglobulin-like repeats and four type III fibronectin-related domains. Our results indicate that polycystin is an integral membrane protein involved in cell-cell/matrix interactions.

Harris PC, Ward CJ, Peral B, Hughes J. 1995. Autosomal dominant polycystic kidney disease: molecular analysis. Hum Mol Genet, 4 (Spec No), pp. 1745-1749.

Using a positional cloning approach the major autosomal dominant polycystic kidney disease (ADPKD) gene (PKD1) has been identified on chromosome 16: a disease associated chromosome translocation was instrumental in its identification. Study of the PKD1 gene has been complicated because most of the gene lies in a genomic region reiterated elsewhere on the same chromosome. The duplicate area contains three genes which share substantial homology with PKD1 and generate polyadenylated transcripts. Most PKD1 mutations have so far been detected in the single copy, 3' end of the gene, but a group of patients with deletion of PKD1 and the adjacent TSC2 gene, which have severe infantile polycystic kidney disease, have also been characterised. The full length transcript of PKD1 (approximately 14 kb) has now been cloned and is predicted to encode a protein, polycystin, of 4302/3 aa. Polycystin contains multiple extracellular domains including leucine rich repeats, a C-type lectin, immunoglobulin and fibronectin type III-like domains and has a C terminal region which is likely associated with the membrane. These homologies indicate that polycystin is a cell-cell/matrix interaction protein.

Brook-Carter PT, Peral B, Ward CJ, Thompson P, Hughes J, Maheshwar MM, Nellist M, Gamble V, Harris PC, Sampson JR. 1994. Deletion of the TSC2 and PKD1 genes associated with severe infantile polycystic kidney disease--a contiguous gene syndrome. Nat Genet, 8 (4), pp. 328-332.

Major genes which cause tuberous sclerosis (TSC) and autosomal dominant polycystic kidney disease (ADPKD), known as TSC2 and PKD1 respectively, lie immediately adjacent to each other on chromosome 16p. Renal cysts are often found in TSC, but a specific renal phenotype, distinguished by the severity and infantile presentation of the cystic changes, is seen in a small proportion of cases. We have identified large deletions disrupting TSC2 and PKD1 in each of six such cases studied. Analysis of the deletions indicates that they inactivate PKD1, in contrast to the mutations reported in ADPKD patients, where in each case abnormal transcripts have been detected.

Ward C, Peral B, Hughes J, Thomas S, Gamble V, MacCarthy A, Sloane-Stanley J, Buckle V, Kearney L, Higgs D et al. 1994. The polycystic kidney disease 1 gene encodes a 14 kb transcript and lies within a duplicated region on chromosome 16. Cell, 77 (6), pp. 881-894.

Brackley CA, Brown JM, Waithe D, Babbs C, Davies J, Hughes JR, Buckle VJ, Marenduzzo D. 2016. Predicting the three-dimensional folding of cis-regulatory regions in mammalian genomes using bioinformatic data and polymer models. Genome Biol, 17 (1), pp. 59.

The three-dimensional (3D) organization of chromosomes can be probed using methods like Capture-C. However, it is unclear how such population-level data relate to the organization within a single cell, and the mechanisms leading to the observed interactions are still largely obscure. We present a polymer modeling scheme based on the assumption that chromosome architecture is maintained by protein bridges, which form chromatin loops. To test the model, we perform FISH experiments and compare with Capture-C data. Starting merely from the locations of protein binding sites, our model accurately predicts the experimentally observed chromatin interactions, revealing a population of 3D conformations.

Paralkar VR, Taborda CC, Huang P, Yao Y, Kossenkov AV, Prasad R, Luan J, Davies JO, Hughes JR, Hardison RC et al. 2016. Unlinking an lncRNA from Its Associated cis Element. Mol Cell, 62 (1), pp. 104-110.

Long non-coding (lnc) RNAs can regulate gene expression and protein functions. However, the proportion of lncRNAs with biological activities among the thousands expressed in mammalian cells is controversial. We studied Lockd (lncRNA downstream of Cdkn1b), a 434-nt polyadenylated lncRNA originating 4 kb 3' to the Cdkn1b gene. Deletion of the 25-kb Lockd locus reduced Cdkn1b transcription by approximately 70% in an erythroid cell line. In contrast, homozygous insertion of a polyadenylation cassette 80 bp downstream of the Lockd transcription start site reduced the entire lncRNA transcript level by >90% with no effect on Cdkn1b transcription. The Lockd promoter contains a DNase-hypersensitive site, binds numerous transcription factors, and physically associates with the Cdkn1b promoter in chromosomal conformation capture studies. Therefore, the Lockd gene positively regulates Cdkn1b transcription through an enhancer-like cis element, whereas the lncRNA itself is dispensable, which may be the case for other lncRNAs.

Davies JO, Telenius JM, McGowan SJ, Roberts NA, Taylor S, Higgs DR, Hughes JR. 2016. Multiplexed analysis of chromosome conformation at vastly improved sensitivity. Nat Methods, 13 (1), pp. 74-80.

Methods for analyzing chromosome conformation in mammalian cells are either low resolution or low throughput and are technically challenging. In next-generation (NG) Capture-C, we have redesigned the Capture-C method to achieve unprecedented levels of sensitivity and reproducibility. NG Capture-C can be used to analyze many genetic loci and samples simultaneously. High-resolution data can be produced with as few as 100,000 cells, and single-nucleotide polymorphisms can be used to generate allele-specific tracks. The method is straightforward to perform and should greatly facilitate the investigation of many questions related to gene regulation as well as the functional dissection of traits examined in genome-wide association studies.

Voon HPJ, Hughes JR, Rode C, DeLaRosa-Velázquez IA, Jenuwein T, Feil R, Higgs DR, Gibbons RJ. 2015. ATRX Plays a Key Role in Maintaining Silencing at Interstitial Heterochromatic Loci and Imprinted Genes. Cell Reports, 11 (3), pp. 405-418.

Histone H3.3 is a replication-independent histone variant, which replaces histones that are turned over throughout the entire cell cycle. H3.3 deposition at euchromatin is dependent on HIRA, whereas ATRX/Daxx deposits H3.3 at pericentric heterochromatin and telomeres. The role of H3.3 at heterochromatic regions is unknown, but mutations in the ATRX/Daxx/H3.3 pathway are linked to aberrant telomere lengthening in certain cancers. In this study, we show that ATRX-dependent deposition of H3.3 is not limited to pericentric heterochromatin and telomeres but also occurs at heterochromatic sites throughout the genome. Notably, ATRX/H3.3 specifically localizes to silenced imprinted alleles in mouse ESCs. ATRX KO cells failed to deposit H3.3 at these sites, leading to loss of the H3K9me3 heterochromatin modification, loss of repression, and aberrant allelic expression. We propose a model whereby ATRX-dependent deposition of H3.3 into heterochromatin is normally required to maintain the memory of silencing at imprinted loci.

Hughes JR, Roberts N, McGowan S, Hay D, Giannoulatou E, Lynch M, De Gobbi M, Taylor S, Gibbons R, Higgs DR. 2014. Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment. Nat Genet, 46 (2), pp. 205-212.

Gene expression during development and differentiation is regulated in a cell- and stage-specific manner by complex networks of intergenic and intragenic cis-regulatory elements whose numbers and representation in the genome far exceed those of structural genes. Using chromosome conformation capture, it is now possible to analyze in detail the interaction between enhancers, silencers, boundary elements and promoters at individual loci, but these techniques are not readily scalable. Here we present a high-throughput approach (Capture-C) to analyze cis interactions, interrogating hundreds of specific interactions at high resolution in a single experiment. We show how this approach will facilitate detailed, genome-wide analysis to elucidate the general principles by which cis-acting sequences control gene expression. In addition, we show how Capture-C will expedite identification of the target genes and functional effects of SNPs that are associated with complex diseases, which most frequently lie in intergenic cis-acting regulatory elements.

Marques AC, Hughes J, Graham B, Kowalczyk MS, Higgs DR, Ponting CP. 2013. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biol, 14 (11), pp. R131.

BACKGROUND: Mammalian transcriptomes contain thousands of long noncoding RNAs (lncRNAs). Some lncRNAs originate from intragenic enhancers which, when active, behave as alternative promoters producing transcripts that are processed using the canonical signals of their host gene. We have followed up this observation by analyzing intergenic lncRNAs to determine the extent to which they might also originate from intergenic enhancers. RESULTS: We integrated high-resolution maps of transcriptional initiation and transcription to annotate a conservative set of intergenic lncRNAs expressed in mouse erythroblasts. We subclassified intergenic lncRNAs according to chromatin status at transcriptional initiation regions, defined by relative levels of histone H3K4 mono- and trimethylation. These transcripts are almost evenly divided between those arising from enhancer-associated (elncRNA) or promoter-associated (plncRNA) elements. These two classes of 5' capped and polyadenylated RNA transcripts are indistinguishable with regard to their length, number of exons or transcriptional orientation relative to their closest neighboring gene. Nevertheless, elncRNAs are more tissue-restricted, less highly expressed and less well conserved during evolution. Of considerable interest, we found that expression of elncRNAs, but not plncRNAs, is associated with enhanced expression of neighboring protein-coding genes during erythropoiesis. CONCLUSIONS: We have determined globally the sites of initiation of intergenic lncRNAs in erythroid cells, allowing us to distinguish two similarly abundant classes of transcripts. Different correlations between the levels of elncRNAs, plncRNAs and expression of neighboring genes suggest that functional lncRNAs from the two classes may play contrasting roles in regulating the transcript abundance of local or distal loci.

McGowan SJ, Hughes JR, Han ZP, Taylor S. 2013. MIG: Multi-Image Genome viewer. Bioinformatics, 29 (19), pp. 2477-2478.

SUMMARY: Multi-Image Genome (MIG) viewer is a web-based application for visualizing, querying and filtering many thousands of genome browser regions as well as for exporting the data in a variety of formats. This methodology has been used successfully to analyze ChIP-Seq data and RNA-Seq data and to detect somatic mutations in genome resequencing projects. AVAILABILITY: MIG is available at https://mig.molbiol.ox.ac.uk/mig/

Hosseini M, Goodstadt L, Hughes JR, Kowalczyk MS, de Gobbi M, Otto GW, Copley RR, Mott R, Higgs DR, Flint J. 2013. Causes and consequences of chromatin variation between inbred mice. PLoS Genet, 9 (6), pp. e1003570.

Variation at regulatory elements, identified through hypersensitivity to digestion by DNase I, is believed to contribute to variation in complex traits, but the extent and consequences of this variation are poorly characterized. Analysis of terminally differentiated erythroblasts in eight inbred strains of mice identified reproducible variation at approximately 6% of DNase I hypersensitive sites (DHS). Only 30% of such variable DHS contain a sequence variant predictive of site variation. Nevertheless, sequence variants within variable DHS are more likely to be associated with complex traits than those in non-variant DHS, and variants associated with complex traits preferentially occur in variable DHS. Changes at a small proportion (less than 10%) of variable DHS are associated with changes in nearby transcriptional activity. Our results show that whilst DNA sequence variation is not the major determinant of variation in open chromatin, where such variants exist they are likely to be causal for complex traits.

Twigg SRF, Vorgia E, Mcgowan SJ, Peraki I, Fenwick AL, Sharma VP, Allegra M, Zaragkoulias A, Akha ES, Knight SJL et al. 2013. Reduced dosage of ERF causes complex craniosynostosis in humans and mice and links ERK1/2 signaling to regulation of osteogenesis. Nature Genetics, 45 (3), pp. 308-313.

The extracellular signal-related kinases 1 and 2 (ERK1/2) are key proteins mediating mitogen-activated protein kinase signaling downstream of RAS: phosphorylation of ERK1/2 leads to nuclear uptake and modulation of multiple targets. Here, we show that reduced dosage of ERF, which encodes an inhibitory ETS transcription factor directly bound by ERK1/2 (refs. 2,3,4,5,6,7), causes complex craniosynostosis (premature fusion of the cranial sutures) in humans and mice. Features of this newly recognized clinical disorder include multiple-suture synostosis, craniofacial dysmorphism, Chiari malformation and language delay. Mice with functional Erf levels reduced to ∼30% of normal exhibit postnatal multiple-suture synostosis; by contrast, embryonic calvarial development appears mildly delayed. Using chromatin immunoprecipitation in mouse embryonic fibroblasts and high-throughput sequencing, we find that ERF binds preferentially to elements away from promoters that contain RUNX or AP-1 motifs. This work identifies ERF as a novel regulator of osteogenic stimulation by RAS-ERK signaling, potentially by competing with activating ETS factors in multifactor transcriptional complexes.

Hughes JR, Lower KM, Dunham I, Taylor S, De Gobbi M, Sloane-Stanley JA, McGowan S, Ragoussis J, Vernimmen D, Gibbons RJ, Higgs DR. 2013. High-resolution analysis of cis-acting regulatory networks at the α-globin locus. Philos Trans R Soc Lond B Biol Sci, 368 (1620), pp. 20120361.

We have combined the circular chromosome conformation capture protocol with high-throughput, genome-wide sequence analysis to characterize the cis-acting regulatory network at a single locus. In contrast to methods which identify large interacting regions (10-1000 kb), the 4C approach provides a comprehensive, high-resolution analysis of a specific locus with the aim of defining, in detail, the cis-regulatory elements controlling a single gene or gene cluster. Using the human α-globin locus as a model, we detected all known local and long-range interactions with this gene cluster. In addition, we identified two interactions with genes located 300 kb (NME4) and 625 kb (FAM173a) from the α-globin cluster.

Lynch MD, Smith AJ, De Gobbi M, Flenley M, Hughes JR, Vernimmen D, Ayyub H, Sharpe JA, Sloane-Stanley JA, Sutherland L et al. 2012. An interspecies analysis reveals a key role for unmethylated CpG dinucleotides in vertebrate Polycomb complex recruitment. EMBO J, 31 (2), pp. 317-329.

The role of DNA sequence in determining chromatin state is incompletely understood. We have previously demonstrated that large chromosomal segments from human cells recapitulate their native chromatin state in mouse cells, but the relative contribution of local sequences versus their genomic context remains unknown. In this study, we compare orthologous chromosomal regions for which the human locus establishes prominent sites of Polycomb complex recruitment in pluripotent stem cells, whereas the corresponding mouse locus does not. Using recombination-mediated cassette exchange at the mouse locus, we establish the primacy of local sequences in the encoding of chromatin state. We show that the signal for chromatin bivalency is redundantly encoded across a bivalent domain and that this reflects competition between Polycomb complex recruitment and transcriptional activation. Furthermore, our results suggest that a high density of unmethylated CpG dinucleotides is sufficient for vertebrate Polycomb recruitment. This model is supported by analysis of DNA methyltransferase-deficient embryonic stem cells.

De Gobbi M, Garrick D, Lynch M, Vernimmen D, Hughes JR, Goardon N, Luc S, Lower KM, Sloane-Stanley JA, Pina C et al. 2011. Generation of bivalent chromatin domains during cell fate decisions. Epigenetics Chromatin, 4 (1), pp. 9.

BACKGROUND: In self-renewing, pluripotent cells, bivalent chromatin modification is thought to silence (H3K27me3) lineage control genes while 'poising' (H3K4me3) them for subsequent activation during differentiation, implying an important role for epigenetic modification in directing cell fate decisions. However, rather than representing an equivalently balanced epigenetic mark, the patterns and levels of histone modifications at bivalent genes can vary widely and the criteria for identifying this chromatin signature are poorly defined. RESULTS: Here, we initially show how chromatin status alters during lineage commitment and differentiation at a single well characterised bivalent locus. In addition we have determined how chromatin modifications at this locus change with gene expression in both ensemble and single cell analyses. We also show, on a global scale, how mRNA expression may be reflected in the ratio of H3K4me3/H3K27me3. CONCLUSIONS: While truly 'poised' bivalently modified genes may exist, the original hypothesis that all bivalent genes are epigenetically premarked for subsequent expression might be oversimplistic. In fact, from the data presented in the present work, it is equally possible that many genes that appear to be bivalent in pluripotent and multipotent cells may simply be stochastically expressed at low levels in the process of multilineage priming. Although both situations could be considered to be forms of 'poising', the underlying mechanisms and the associated implications are clearly different.

Law MJ, Lower KM, Voon HP, Hughes JR, Garrick D, Viprakasit V, Mitson M, De Gobbi M, Marra M, Morris A et al. 2010. ATR-X syndrome protein targets tandem repeats and influences allele-specific expression in a size-dependent manner. Cell, 143 (3), pp. 367-378.

ATRX is an X-linked gene of the SWI/SNF family, mutations in which cause syndromal mental retardation and downregulation of α-globin expression. Here we show that ATRX binds to tandem repeat (TR) sequences in both telomeres and euchromatin. Genes associated with these TRs can be dysregulated when ATRX is mutated, and the change in expression is determined by the size of the TR, producing skewed allelic expression. This reveals the characteristics of the affected genes, explains the variable phenotypes seen with identical ATRX mutations, and illustrates a new mechanism underlying variable penetrance. Many of the TRs are G rich and predicted to form non-B DNA structures (including G-quadruplex) in vivo. We show that ATRX binds G-quadruplex structures in vitro, suggesting a mechanism by which ATRX may play a role in various nuclear processes and how this is perturbed when ATRX is mutated.
