The term ‘junk DNA’ [1] is one loved by journalists and often loathed by scientists. When the full sequence of the human genome was published in 2004 [2], it was found that less than 2% of your DNA contains instructions to make proteins (the physical building blocks of the human body). So if the remaining 98% doesn’t appear to be doing anything useful… what on earth is it there for?

The truth is, ten years later, scientists still don’t really know, but it certainly doesn’t look like it’s all rubbish. Typically, when scientists search a patient’s DNA sequence for changes that might help identify the cause of a disease, they look first in the regions that contain the information to make proteins. But increasingly, changes to the DNA associated with disease are found in the bits of sequence in between: the ‘junk’ DNA.

One specific example of this situation was found in a patient with an inherited form of anaemia (a deficiency or defect in red blood cell production) [3]. Red blood cells contain a protein complex called haemoglobin that enables these cells to carry oxygen around the body. In this patient, some of the key proteins in the haemoglobin complex were absent – but the DNA sequence containing the information to produce the proteins was most definitely still there. So why wasn’t the patient producing the protein?

It turned out that the patient had a whole region of DNA missing just next door to where the protein-coding information was located, and this somehow meant that the proteins encoded next door weren’t produced. The absence of this bit of ‘junk’ DNA was having a major impact on human health, and similar examples have been found in many other human diseases.

So although it is perhaps justified to say that not all junk DNA is rubbish, it should also be acknowledged that precisely how these non-protein-coding regions of the DNA affect the protein-coding bits isn’t fully understood. The study of the patient described above was carried out in Professor Doug Higgs’ lab at the WIMM in 1990, and he continues to research the underlying mechanism that causes this effect.

It is known that RNA (a copy of the DNA blueprint which carries information to other parts of the cell) is produced all over the non-protein-coding portion of the genome, but whether these RNA copies originate anywhere in particular or just at random is not well understood. In a recent collaboration with Chris Ponting’s group in the MRC Functional Genomics Unit in Oxford, the two labs tackled this question. They found that a subset of these RNA copies originates precisely at some of the regulatory regions of the genome which are known to affect how, when, and where protein-coding DNA is activated [4].

Answers always lead to more questions, however, and the authors did not speculate as to whether these RNAs might actually play a functional role in the relationship between ‘junk’ and protein-coding DNA. But what is clear is that there is quite a lot going on in these regions of the genome that were previously dismissed as rubbish, and that revealing the secrets of these mysterious pieces of DNA could hold the key to understanding many complex human diseases.

Post written by Bryony Graham.


  1. Ohno, Susumu. So much ‘junk’ DNA in our genome. Evolution of genetic systems. Brookhaven Symposia in Biology (ed. H. H. Smith et al.) 23, pg. 366-370 (1972)
  2. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, pg. 931-945 (2004)
  3. Hatton, CS., Wilkie, AO., Drysdale, HC., Wood, WG., Vickers, MA., Sharpe, J., Ayyub, H., Pretorius, IM., Buckle, VJ., and Higgs, DR. Alpha-thalassemia caused by a large (62kb) deletion upstream of the human alpha globin gene cluster. Blood 76, pg. 221-227 (1990)
  4. Marques, AC., Hughes, J., Graham, B., Kowalczyk, MS., Higgs, DR., and Ponting, CP. Chromatin signatures at transcriptional start sites separate two equally populated yet distinct classes of intergenic long noncoding RNAs. Genome Biology 14 (2013)