Cracking the 3p21.31 COVID-19 mystery.
4 November 2021
Damien Downes, senior Postdoctoral Scientist in the Hughes Genome Biology Group, shares the story behind identifying LZTFL1 as an effector gene at a COVID-19 risk locus, work that was published in Nature Genetics today [1].
When the coronavirus pandemic hit the UK in 2020, like many other scientists in the WIMM, I transitioned from bench scientist to work-from-home bioinformatician. During those first few months of getting used to the new normal, many of my friends and family, who all knew I worked in a Molecular Medicine institute, asked me “are you working on the virus?” and I’d laugh, “that is so far out of the realm of my research”. I’d spent the last 5 years in Prof. Jim Hughes’ group developing a systematic approach to decipher how inherited differences in the DNA sequence can increase our risk of common diseases like diabetes and dementia – definitely not immunology or virology. Then suddenly, from June 2020, I was working on the virus. Using a population-based approach, Ellinghaus et al. [2] had identified two regions of DNA in the human genome that increased the risk of severe COVID-19. One of those regions (3p21.31) doubled a patient’s chance of respiratory failure, and there was no good explanation why.
For almost 20 years, large population studies known as Genome Wide Association Studies (GWAS) have been used to identify regions of disease risk in the genome. GWAS are relatively straightforward to perform: the GWAS Catalog [3] contains results from 5,419 publications, or roughly five per week since the first GWAS in 2002. Deciphering the results, however, is far from straightforward. The genetic signals from GWAS are complex and cell-type agnostic. Each signal contains tens to hundreds of DNA sequence variants, known as polymorphisms, but only one variant is likely to be responsible for conferring risk at each locus. And, because the same DNA is in every cell of the body, the genetic signal can’t tell you if the risk variant alters biology in the eye, the kidney or any other cell type. Over time, we’ve also come to learn that the majority of causative variants don’t affect genes directly, but alter the switches that turn genes on, regulating the amount of protein that is made rather than the protein itself.
As a murder-mystery buff, I like the analogy of Cluedo for solving GWAS signals: from a group of suspect polymorphisms, we have to work out which one (causal DNA variant), used which weapon (gene), in which room of the house (cell type). It’s this love of mystery solving that has been at the core of my 5 years of developing a GWAS decoding approach that combines functional genomic and epigenetic data, machine learning prediction, and methods developed in the WIMM to detect gene-switch interactions through DNA folding [4,5].
As soon as I saw the 3p21.31 GWAS result, I knew the “Hughes lab approach” would work and I set about collecting the data that I needed. Most people who’ve seen the genes at this COVID-19 risk locus immediately suspect immune cells, because it contains numerous cytokine genes that function in immunity. But using our data-driven method, I could find no evidence of immune cell involvement. Instead, our machine learning algorithm pointed to the epithelial lining of the lung. What’s more, our high-resolution chromosome conformation capture results ruled out the cytokine genes, pointing to LZTFL1, a relatively unstudied gene. When I searched the GTEx expression database, I could see that the causal DNA variant (rs17713054) is associated with increased expression of LZTFL1. I had a suspect, a weapon, and a room.
Excited by my results, I set up a meeting in a local cafe with Prof. James Davies – an MRC Clinician Scientist who’d returned to the wards to treat COVID-19 patients – to see what he thought. He was sceptical, but when I mentioned that the Mendelian disease associated with LZTFL1 is a ciliopathy, a disease of ciliated cells, his eyes lit up. Ciliated epithelium lines the surface of the airways and lungs, the primary target of SARS-CoV-2. Suddenly he was convinced, but thought we should find more evidence.
After some reading, we discovered that different levels of the LZTFL1 protein affect a biological pathway called epithelial-mesenchymal transition (EMT), and that EMT was a known cellular response to HIV-1 and herpes viruses. Could this pathway be similarly involved in the response to SARS-CoV-2? Prof. Davies reached out to Prof. Fadi Issa, who he knew was generating spatial gene expression data in lung biopsies of COVID-19 patients. Working with Dr. Amy Cross from the Issa group, we were able to detect signals of EMT in patients’ lungs, showing this pathway was indeed relevant to SARS-CoV-2 infection. Case closed!
Solving the 3p21.31 mystery was important for several reasons. Due to geographical differences in polymorphism frequencies, the rs17713054 risk variant is much more likely to be found in someone with South Asian ancestry than someone with Black or European ancestry. This difference in frequency was likely a major contributor to the higher rates of South Asian deaths seen during the second and third waves in the UK. Crucially, 3p21.31 doesn’t alter immune cell function, so its negative effect is completely removed by vaccination. Additionally, rs17713054 leads to higher levels of LZTFL1, which means that drugs targeting this protein’s function could be developed as novel therapeutics for viral infection.
Disclosure: Damien also works as a consultant for Nucleome Therapeutics, which was founded by Prof. Jim Hughes, Prof. James Davies, and Dr. Danuta Jeziorska. The company is an Oxford Science Enterprises-backed spinout based on the Hughes lab approach to deciphering GWAS signals and leveraging the Micro-Capture-C method developed by the Davies lab.
1- Downes D.J., et al. (2021) Identification of LZTFL1 as a candidate effector gene at a COVID-19 risk locus. Nature Genetics https://www.nature.com/articles/s41588-021-00955-3
2- Ellinghaus D., et al. (2020) Genomewide Association Study of Severe Covid-19 with Respiratory Failure. New England Journal of Medicine https://doi.org/10.1056/NEJMoa2020283
3- NHGRI-EBI GWAS Catalog: https://www.ebi.ac.uk/gwas/home
4- Downes D.J., et al. (2021) High-resolution targeted 3C interrogation of cis-regulatory element organization at genome-wide scale. Nature Communications https://doi.org/10.1038/s41467-020-20809-6
5- Hua P., et al. (2021) Defining genome architecture at base-pair resolution. Nature https://doi.org/10.1038/s41586-021-03639-4