Dr Hashem Koohy

Research Area: Bioinformatics & Stats (inc. Modelling and Computational Biology)
Technology Exchange: Bioinformatics, Cell sorting, Cellular immunology, Computational biology, Flow cytometry, Human genetics, Immunohistochemistry, Medical statistics, Statistical genetics and Transcript profiling
Scientific Themes: Bioinformatics, Statistics & Computational Biology and Immunology
Keywords: Machine-learning, statistical inference, B and T cell receptor profiling, VDJ recombination and neoantigen
Epigenetic features that associate with active VH gene recombination in mouse IgH locus.
Recombination of V genes overlapping with Architectural chromatin states is strongly associated with high enrichment of CTCF, RAD21 and DHS, whereas recombination of V genes in the Enhancer state is associated with enrichment of PAX5, IRF4, H3K4me1 and DHS.

The research theme of my group is to further understand the functional and molecular mechanisms of the immune system in immunologically important settings such as cancer, infection, autoimmune disease and aging. We are also keen to find out more about the immune response to vaccines, in particular newly developed vaccines targeting various cancer types.


General Research Theme

We usually take a genomic and epigenomic approach, and the group is therefore particularly excited about advances in at least two disciplines:

A) New high-throughput sequencing techniques, in particular at the single-cell level. These allow our collaborators to measure the immune repertoire at the single-cell level under various conditions (tumour vs healthy cells), providing us with unprecedented large-scale data.

B) New advances in data science and machine learning, such as deep neural networks and recent developments in data science tools for Python and R. We are committed to keeping up to date with these advances and applying them to biological data to address our scientific questions of interest.


Specific Research Theme

B and T cells are two key components of the vertebrate adaptive immune system. The efficiency of an adaptive immune response at any given time point depends on the huge diversity of receptor proteins on the surface of B and T cells, known as BCRs and TCRs (collectively, the immune repertoire). The diversity of the immune repertoire is initiated by chromosomal rearrangement of V, D and J gene segments in the different BCR and TCR loci. Diversification is further enhanced by somatic hypermutation and class-switching in B cells and by alpha and beta chain pairing in T cells. Understanding the regulatory mechanisms behind this huge diversity has implications for our understanding of the immune response to cancer, infection and autoimmune disease.


In collaboration with world-class immunologists and cell biologists at the Babraham Institute, we have measured V gene usage in the mouse heavy chain locus, applied supervised and unsupervised machine-learning techniques, and shown that the specificity of the Recombination Activating Gene (RAG) plays an important ‘permissive’ role in VDJ recombination. The frequency of V gene usage, however, is determined by one of two regulatory mechanisms that have evolved alongside the protein-coding sequences of V genes (figure 1).

To find out more about the underlying functional causes of immune deterioration with age, we have continued this collaboration and performed a comprehensive study of age-associated changes in the BCR repertoire, transcriptome and epigenome at various stages of B cell development (manuscript in submission).


My research interests will mainly focus on the characterisation of TCR sequences expressed by tumour-infiltrating T lymphocytes and on the identification of neo-antigens derived from tumours’ somatic mutations. These research projects will be closely aligned with the research interests of many groups within the MRC Human Immunology Unit, in particular the research carried out within the Cerundolo group.


Talented candidates for PhD and postdoc positions who A) would love to contribute to very exciting and potentially career-changing projects and B) would love to learn more about cutting-edge machine-learning and data science techniques are strongly encouraged to contact me.



Name                     Department                       Institution                                                    Country
Dr Peter Fraser                                           Babraham Institute                                             United Kingdom
Dr Anne Corcoran                                          Babraham Institute                                             United Kingdom
Dr Mikhail Spivakov                                       Babraham Institute                                             United Kingdom
Dr Patrick Varga-Weisz                                    Babraham Institute                                             United Kingdom
Prof Vincenzo Cerundolo  Investigative Medicine Division  Oxford University, Weatherall Institute of Molecular Medicine  United Kingdom

Matheson LS, Bolland DJ, Chovanec P, Krueger F, Andrews S, Koohy H, Corcoran AE. 2017. Local Chromatin Features Including PU.1 and IKAROS Binding and H3K4 Methylation Shape the Repertoire of Immunoglobulin Kappa Genes Chosen for V(D)J Recombination. Front Immunol, 8 (NOV), pp. 1550.

V(D)J recombination is essential for the generation of diverse antigen receptor (AgR) repertoires. In B cells, immunoglobulin kappa (Igκ) light chain recombination follows immunoglobulin heavy chain (Igh) recombination. We recently developed the DNA-based VDJ-seq assay for the unbiased quantitation of Igh VH and DH repertoires. Integration of VDJ-seq data with genome-wide datasets revealed that two chromatin states at the recombination signal sequence (RSS) of VH genes are highly predictive of recombination in mouse pro-B cells. It is unknown whether local chromatin states contribute to Vκ gene choice during Igκ recombination. Here we adapt VDJ-seq to profile the Igκ VκJκ repertoire and present a comprehensive readout in mouse pre-B cells, revealing highly variable Vκ gene usage. Integration with genome-wide datasets for histone modifications, DNase hypersensitivity, transcription factor binding and germline transcription identified PU.1 binding at the RSS, which was unimportant for Igh, as highly predictive of whether a Vκ gene will recombine or not, suggesting that it plays a binary, all-or-nothing role, priming genes for recombination. Thereafter, the frequency with which these genes recombine was shaped both by the presence and level of enrichment of several other chromatin features, including H3K4 methylation and IKAROS binding. Moreover, in contrast to the Igh locus, the chromatin landscape of the promoter, as well as of the RSS, contributes to Vκ gene recombination. Thus, multiple facets of local chromatin features explain much of the variation in Vκ gene usage. Together, these findings reveal shared and divergent roles for epigenetic features and transcription factors in AgR V(D)J recombination and provide avenues for further investigation of chromatin signatures that may underpin V(D)J-mediated chromosomal translocations.

Bolland DJ, Koohy H, Wood AL, Matheson LS, Krueger F, Stubbington MJT, Baizan-Edge A, Chovanec P, Stubbs BA, Tabbada K et al. 2016. Two Mutually Exclusive Local Chromatin States Drive Efficient V(D)J Recombination. Cell Rep, 15 (11), pp. 2475-2487.

Variable (V), diversity (D), and joining (J) (V(D)J) recombination is the first determinant of antigen receptor diversity. Understanding how recombination is regulated requires a comprehensive, unbiased readout of V gene usage. We have developed VDJ sequencing (VDJ-seq), a DNA-based next-generation-sequencing technique that quantitatively profiles recombination products. We reveal a 200-fold range of recombination efficiency among recombining V genes in the primary mouse Igh repertoire. We used machine learning to integrate these data with local chromatin profiles to identify combinatorial patterns of epigenetic features that associate with active VH gene recombination. These features localize downstream of VH genes and are excised by recombination, revealing a class of cis-regulatory element that governs recombination, distinct from expression. We detect two mutually exclusive chromatin signatures at these elements, characterized by CTCF/RAD21 and PAX5/IRF4, which segregate with the evolutionary history of associated VH genes. Thus, local chromatin signatures downstream of VH genes provide an essential layer of regulation that determines recombination efficiency.

Petrini E, Baillet V, Cridge J, Hogan CJ, Guillaume C, Ke H, Brandetti E, Walker S, Koohy H, Spivakov M, Varga-Weisz P. 2015. A new phosphate-starvation response in fission yeast requires the endocytic function of myosin I. J Cell Sci, 128 (20), pp. 3707-3713.

Endocytosis is essential for uptake of many substances into the cell, but how it links to nutritional signalling is poorly understood. Here, we show a new role for endocytosis in regulating the response to low phosphate in Schizosaccharomyces pombe. Loss of function of myosin I (Myo1), Sla2/End4 or Arp2, proteins involved in the early steps of endocytosis, led to increased proliferation in low-phosphate medium compared to controls. We show that once cells are deprived of phosphate they undergo a quiescence response that is dependent on the endocytic function of Myo1. Transcriptomic analysis revealed a wide perturbation of gene expression with induction of stress-regulated genes upon phosphate starvation in wild-type but not Δmyo1 cells. Thus, endocytosis plays a pivotal role in mediating the cellular response to nutrients, bridging the external environment and internal molecular functions of the cell.

Koohy H, Koohy B. 2014. A lesson from the ice bucket challenge: using social networks to publicize science. Front Genet, 5 (DEC), pp. 430.

Koohy H, Down TA, Spivakov M, Hubbard T. 2014. A comparison of peak callers used for DNase-Seq data. PLoS One, 9 (5), pp. e96303.

Genome-wide profiling of open chromatin regions using DNase I and high-throughput sequencing (DNase-seq) is an increasingly popular approach for finding and studying regulatory elements. A variety of algorithms have been developed to identify regions of open chromatin from raw sequence-tag data, which has motivated us to assess and compare their performance. In this study, four published, publicly available peak calling algorithms used for DNase-seq data analysis (F-seq, Hotspot, MACS and ZINBA) are assessed at a range of signal thresholds on two published DNase-seq datasets for three cell types. The results were benchmarked against an independent dataset of regulatory regions derived from ENCODE in vivo transcription factor binding data for each particular cell type. The level of overlap between peak regions reported by each algorithm and this ENCODE-derived reference set was used to assess sensitivity and specificity of the algorithms. Our study suggests that F-seq has a slightly higher sensitivity than the next best algorithms. Hotspot and the ChIP-seq oriented method, MACS, both perform competitively when used with their default parameters. However the generic peak finder ZINBA appears to be less sensitive than the other three. We also assess accuracy of each algorithm over a range of signal thresholds. In particular, we show that the accuracy of F-Seq can be considerably improved by using a threshold setting that is different from the default value.

Koohy H, Down TA, Hubbard TJ. 2013. Chromatin accessibility data sets show bias due to sequence specificity of the DNase I enzyme. PLoS One, 8 (7), pp. e69853.

BACKGROUND: DNase I is an enzyme which cuts duplex DNA at a rate that depends strongly upon its chromatin environment. In combination with high-throughput sequencing (HTS) technology, it can be used to infer genome-wide landscapes of open chromatin regions. Using this technology, systematic identification of hundreds of thousands of DNase I hypersensitive sites (DHS) per cell type has been possible, and this in turn has helped to precisely delineate genomic regulatory compartments. However, to date there has been relatively little investigation into possible biases affecting this data. RESULTS: We report a significant degree of sequence preference spanning sites cut by DNase I in a number of published data sets. The two major protocols in current use each show a different pattern, but for a given protocol the pattern of sequence specificity seems to be quite consistent. The patterns are substantially different from biases seen in other types of HTS data sets, and in some cases the most constrained position lies outside the sequenced fragment, implying that this constraint must relate to the digestion process rather than events occurring during library preparation or sequencing. CONCLUSIONS: DNase I is a sequence-specific enzyme, with a specificity that may depend on experimental conditions. This sequence specificity is not taken into account by existing pipelines for identifying open chromatin regions. Care must be taken when interpreting DNase I results, especially when looking at the precise locations of the reads. Future studies may be able to improve the sensitivity and precision of chromatin state measurement by compensating for sequence bias.

Koohy H, Dyer NP, Reid JE, Koentges G, Ott S. 2010. An alignment-free model for comparison of regulatory sequences. Bioinformatics, 26 (19), pp. 2391-2397.

MOTIVATION: Some recent comparative studies have revealed that regulatory regions can retain function over large evolutionary distances, even though the DNA sequences are divergent and difficult to align. It is also known that such enhancers can drive very similar expression patterns. This poses a challenge for the in silico detection of biologically related sequences, as they can only be discovered using alignment-free methods. RESULTS: Here, we present a new computational framework called Regulatory Region Scoring (RRS) model for the detection of functional conservation of regulatory sequences using predicted occupancy levels of transcription factors of interest. We demonstrate that our model can detect the functional and/or evolutionary links between some non-alignable enhancers with a strong statistical significance. We also identify groups of enhancers that are likely to be similarly regulated. Our model is motivated by previous work on prediction of expression patterns and it can capture similarity by strong binding sites, weak binding sites and even the statistically significant absence of sites. Our results support the hypothesis that weak binding sites contribute to the functional similarity of sequences. Our model fills a gap between two families of models: detailed, data-intensive models for the prediction of precise spatio-temporal expression patterns on the one side, and crude, generally applicable models on the other side. Our model borrows some of the strengths of each group and addresses their drawbacks. AVAILABILITY: The RRS source code is freely available upon publication of this manuscript: http://www2.warwick.ac.uk/fac/sci/systemsbiology/staff/ott/tools_and_software/rrs.

Rittman M, Gilroy E, Koohy H, Rodger A, Richards A. 2009. Is DNA a worm-like chain in Couette flow? In search of persistence length, a critical review. Sci Prog, 92 (Pt 2), pp. 163-204.

Persistence length is the foremost measure of DNA flexibility. Its origins lie in polymer theory which was adapted for DNA following the determination of B-DNA structure in 1953. There is no single definition of persistence length used, and the links between published definitions are based on assumptions which may, or may not be, clearly stated. DNA flexibility is affected by local ionic strength, solvent environment, bound ligands and intrinsic sequence-dependent flexibility. This article is a review of persistence length providing a mathematical treatment of the relationships between four definitions of persistence length, including: correlation, Kuhn length, bending, and curvature. Persistence length has been measured using various microscopy, force extension and solution methods such as linear dichroism and transient electric birefringence. For each experimental method a model of DNA is required to interpret the data. The importance of understanding the underlying models, along with the assumptions required by each definition to determine a value of persistence length, is highlighted for linear dichroism data, where it transpires that no model is currently available for long DNA or medium to high shear rate experiments.

Koohy H. 2008. On finiteness of multiplication modules. Acta Mathematica Hungarica, 118 (1-2), pp. 1-7.

Our main aim in this note, is a further generalization of a result due to D. D. Anderson, i.e., it is shown that if R is a commutative ring, and M a multiplication R-module, such that every prime ideal minimal over Ann (M) is finitely generated, then M contains only a finite number of minimal prime submodules. This immediately yields that if P is a projective ideal of R, such that every prime ideal minimal over Ann (P) is finitely generated, then P is finitely generated. Furthermore, it is established that if M is a multiplication R-module in which every minimal prime submodule is finitely generated, then R contains only a finite number of prime ideals minimal over Ann (M). © 2007 Springer Science + Business Media B.V.