The Dark Matter of the Genome

Only 1–2% of the human genome codes for proteins. The rest consists of non-coding sequence, including genes for non-coding RNA, regulatory elements, untranslated regions, introns and splice sites, and transposable elements. The functions of most of these elements are still unknown.


Non-coding RNAs

Non-coding RNAs (ncRNAs) are transcripts that do not code for proteins. MicroRNAs (miRNAs) are small ncRNAs, 18–25 nucleotides long. They bind to complementary regions on mRNAs, blocking their translation and reducing their stability.

Deletion of miRNA genes has been associated with cancer progression. Point mutations in a miRNA can also disrupt its processing and its recognition of target mRNA sequences.
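The idea of a miRNA pairing with a complementary region on an mRNA, and of a point mutation disturbing that pairing, can be sketched in a few lines of Python. This is a toy model, not a target-prediction tool: it assumes that pairing is driven by a perfect match to the "seed" (taken here as miRNA nucleotides 2-8, a common simplification that is not stated in this article), and all sequences and function names are invented for illustration.

```python
# Toy sketch: locate mRNA sites complementary to a miRNA seed.
# All sequences below are invented for illustration, not real transcripts.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (Watson-Crick pairs only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_target_site(mirna: str) -> str:
    """mRNA sequence that pairs perfectly with miRNA nucleotides 2-8 (assumed seed)."""
    return reverse_complement(mirna[1:8])


def find_seed_matches(mirna: str, mrna: str) -> list[int]:
    """0-based start positions in the mRNA that are complementary to the seed."""
    site = seed_target_site(mirna)
    return [i for i in range(len(mrna) - len(site) + 1)
            if mrna[i:i + len(site)] == site]


toy_mirna = "UACGGAUCCGAUUAGCAUGCA"     # hypothetical 21-nt miRNA
toy_mrna = "AUGGCUAAGGAUCCGUCCUAA"      # hypothetical mRNA fragment
mutant_mirna = "UACAGAUCCGAUUAGCAUGCA"  # same miRNA with G->A at position 4

print(find_seed_matches(toy_mirna, toy_mrna))     # [9]
print(find_seed_matches(mutant_mirna, toy_mrna))  # []
```

In this sketch a single base change inside the assumed seed is enough to abolish the match. Real miRNA targeting tolerates some mismatches and depends on sequence context, so the example only illustrates the principle.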

Long non-coding RNA

More than 50,000 non-coding RNAs are transcribed from the human genome but not translated into proteins. Most are longer than 200 nucleotides and are therefore termed long non-coding RNAs (lncRNAs).

Although lncRNAs do not code for proteins, studies have uncovered critical roles for them in regulating several cellular processes. They are found in the nucleus or the cytoplasm, where they have been shown to regulate the cell cycle, cell differentiation, proliferation, and the transcription of genes. LncRNAs can act by recruiting epigenetic effectors, which modify the expression of protein-coding genes without altering the DNA sequence.

Transposable elements

A large portion of the non-coding genome consists of “transposable elements”, also called “jumping genes”. These sequences can “jump” from one region of the genome to another. Several functions have been attributed to them; for example, they can carry regulatory sequences that in turn regulate the expression of protein-coding genes.

Because these elements can move and insert themselves into different regions, they can enhance, reduce, or completely stop the expression of coding sequences, depending on where they are inserted. For example, some of them have been implicated in the neurodegenerative disease amyotrophic lateral sclerosis (ALS).

Regulatory elements

Although regulatory regions do not code for proteins, they contain promoters and enhancers that influence the expression of coding genes. Structural alterations in these regions, such as translocations, deletions, insertions, or duplications, can change the interaction between regulatory elements and coding genes. Many regulatory elements lie in the vicinity of oncogenes and control their activation or repression.

5’-Untranslated regions (5’-UTR)

5’-UTRs, as the name suggests, are sequences that are not translated; they lie immediately upstream of the coding region in mRNA. Although the functions of many 5’-UTRs remain unknown, a number of them have been found to regulate translation or mRNA stability through different mechanisms.

They can also influence translation by limiting the access of the translational machinery to the coding region. Mutations in this region can also create new initiation codons. For example, premature start codons generated by mutations in a 5’-UTR have been linked to melanoma. However, the functional characterization of 5’-UTRs and their mutations is still incomplete.
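How a single base change can create a start codon inside a 5’-UTR is easy to show with a small Python sketch. The sequence, the mutated position, and the function names below are hypothetical and chosen only so that the example works; they do not correspond to any real gene or to the melanoma-associated mutations mentioned above.

```python
# Toy sketch: a point mutation that creates an upstream AUG in a 5'-UTR.
# The UTR sequence and the mutation are invented for illustration.

def upstream_start_codons(utr5: str) -> list[int]:
    """0-based positions of every AUG triplet found in a 5'-UTR sequence."""
    return [i for i in range(len(utr5) - 2) if utr5[i:i + 3] == "AUG"]


def point_mutate(seq: str, pos: int, new_base: str) -> str:
    """Return a copy of seq with the base at 0-based position pos replaced."""
    return seq[:pos] + new_base + seq[pos + 1:]


toy_utr = "GGCACGCUCAGGCUCC"                 # hypothetical 5'-UTR without any AUG
mutant_utr = point_mutate(toy_utr, 10, "U")  # G->U turns ...CAGG... into ...CAUG...

print(upstream_start_codons(toy_utr))     # []
print(upstream_start_codons(mutant_utr))  # [9]
```

Whether such a new AUG is actually used by the ribosome depends on its sequence context and reading frame, which this sketch does not model.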

Introns and splice sites

Introns are also non-coding regions, and mutations and other alterations in introns and intronic splice sites often receive little attention. However, changes in splice sites can cause neighboring exons to be skipped or neighboring introns to be retained in the mature transcript.
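The effect of a splice-site change on the final transcript can be illustrated with a deliberately simplified Python model. It assumes that an intron is recognized only if it starts with GU and ends with AG (the canonical dinucleotides of most introns, a background fact not stated in this article) and that an unrecognized intron is simply retained. Real splicing involves many more signals, and a broken site can also cause exon skipping, which this sketch does not model. All sequences and names are hypothetical.

```python
# Toy sketch: intron retention caused by a mutated splice donor site.
# Assumes introns are recognized purely by their GU...AG boundaries.

Segment = tuple[str, str]  # ("exon" | "intron", RNA sequence)


def has_canonical_sites(intron: str) -> bool:
    """True if the intron starts with GU (donor) and ends with AG (acceptor)."""
    return len(intron) >= 4 and intron.startswith("GU") and intron.endswith("AG")


def toy_splice(segments: list[Segment]) -> str:
    """Join exons; drop recognized introns, retain unrecognized ones."""
    kept = []
    for kind, seq in segments:
        if kind == "exon" or not has_canonical_sites(seq):
            kept.append(seq)
    return "".join(kept)


normal_gene = [("exon", "AUGGCC"), ("intron", "GUAAGUCCUUAG"), ("exon", "AAGUGA")]
# Same gene with a G->A change at the first base of the intron (donor site).
mutant_gene = [("exon", "AUGGCC"), ("intron", "AUAAGUCCUUAG"), ("exon", "AAGUGA")]

print(toy_splice(normal_gene))  # AUGGCCAAGUGA
print(toy_splice(mutant_gene))  # AUGGCCAUAAGUCCUUAGAAGUGA
```

In the mutant, the intron sequence stays in the transcript, which would change the encoded protein even though no coding base was altered.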

Many cancers are associated with mutations in intronic splice sites that lead to the loss of essential exons. Introns may also contain regulatory elements, and mutations can destroy those sequences, changing gene expression.

Although the non-coding regions, which make up almost 98% of our genome, do not encode proteins, they may contain important regulatory elements that control the expression of the roughly 2% that does.



Last Updated: Feb 26, 2019

Written by

Dr. Surat P

Dr. Surat graduated with a Ph.D. in Cell Biology and Mechanobiology from the Tata Institute of Fundamental Research (Mumbai, India) in 2016. Prior to her Ph.D., Surat studied for a Bachelor of Science (B.Sc.) degree in Zoology, during which she was the recipient of an Indian Academy of Sciences Summer Fellowship to study the proteins involved in AIDS. She produces feature articles on a wide range of topics, such as medical ethics, data manipulation, pseudoscience and superstition, education, and human evolution. She is passionate about science communication and writes articles covering all areas of the life sciences.
