moving the lamppost

random musings of a molecular biologist turned code jockey in the era of big data and open science.

Fundamentals of Gene Expression: the ‘Central Dogma’ of Molecular Biology


This is an adaptation/update of a post I wrote a long time ago on my other blog. Here is the original post in case you would like to read it.

The purpose of this post is to provide background that I can point to when I reference certain aspects of gene expression in posts on this blog, especially in my ‘dissertation’ posts.

The “Dogma” of Gene Expression?

There is a concept in molecular biology that is referred to as “the central dogma”. I do not like this name as I feel that the language is too close to that of religion, but history has firmly cemented its use to describe the concept. In general, it is described as:

Genetic information flows from nucleic acid (DNA and RNA) to nucleic acid, or from nucleic acid to protein, but does not flow from protein to nucleic acid.

I believe that Francis Crick was one of the first to assert this. Figure 1 reproduces a diagram of the central dogma from circa 1958, as reconstructed by Francis Crick. There had been a strong debate at the time over which molecule transferred the genetic information from old cell to new cell. The two candidates were proteins and nucleic acids. The legendary description of the double helix by Crick and Watson went a long way towards supporting DNA as the general data keeper, and the data has borne this out.

In this post I will go over some of the implications of the central dogma, and provide an introduction to how DNA, RNA, and proteins interact to enable the life processes of our cells.


Figure 1

DNA as Database

There are a couple of reasons why DNA seems to be the main storage molecule. DNA is more stable than RNA due to its molecular structure. In addition, molecules of DNA are usually found in the famous double-stranded helix, which yields a much more stable complex than single-stranded nucleic acids. The double helix also provides a very useful way to duplicate genetic information as a cell divides.
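To make the duplication point a little more concrete, here is a minimal Python sketch. It relies on the standard base-pairing rules (A pairs with T, G pairs with C), which I have not spelled out in this post, and the function name `complement_strand` is just something I made up for illustration. The idea is that each strand fully determines its partner, so either strand can serve as the template for rebuilding the other.

```python
# Standard base pairing in DNA: A pairs with T, and G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement_strand(strand: str) -> str:
    """Return the partner of a DNA strand.

    Both strands are written in the conventional 5'->3' direction, so the
    partner is read in reverse order relative to the input.
    """
    return "".join(PAIRS[base] for base in reversed(strand))


original = "ATGGCC"
partner = complement_strand(original)
print(partner)  # GGCCAT

# Building the partner of the partner recovers the original strand.
print(complement_strand(partner) == original)  # True
```

This is only a toy model of the information content; real DNA replication involves a whole crew of proteins that I am ignoring here.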

Nucleic acids store information. OK. What kind of information? I said that information flows to proteins. What does that mean? Well, DNA stores the instructions for how to make proteins. Both DNA and proteins are polymers. Like a metal chain, which is made up of repeating links of metal, DNA and proteins are made of smaller repeating building blocks, or residues. Unlike a chain, which usually has only one type of link, DNA and proteins have multiple types of building blocks. Nucleic acids generally use four different residues called nucleotides. Proteins are made up of about 20 different residues called amino acids. So the information I was talking about is encoded as a particular pattern of the building blocks arranged in a sequence. Just as a sentence in English can encode a virtually infinite number of meanings using sequences of only 26 letters, a protein uses a set of 20 amino acids to encode a virtually infinite number of patterns.
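To get a feel for what "virtually infinite" means, here is a quick back-of-the-envelope calculation in Python. The only assumption is that every position in a chain can hold any residue from the alphabet (4 nucleotides or 20 amino acids), so a chain of a given length has alphabet size raised to that length possible sequences. The helper name `count_sequences` is mine, purely for illustration.

```python
def count_sequences(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


# Compare nucleic acids (4 residues) with proteins (20 residues).
for length in (1, 2, 3, 10, 100):
    nucleic = count_sequences(4, length)
    protein = count_sequences(20, length)
    print(f"length {length:>3}: {nucleic:.3e} nucleotide sequences, "
          f"{protein:.3e} amino acid sequences")
```

Even at a modest length of 100 residues, the number of possible protein sequences far exceeds anything that could ever be sampled, which is why I am comfortable calling it virtually infinite.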

The specific pattern of amino acids determines the properties of the protein and therefore its function in the cell. There are many reasons that the sequence determines function, but for now, suffice it to say that the sequence of amino acids that make up a particular protein is vital to its function. It is this vital information that is encoded in the DNA.

Proteins do the work of life. They digest our food; they repair tissue damage; they let our cells communicate; they attack intruding organisms. DNA is how our cells know how to make these molecules.

A Typical Example

What follows is a simplified version of how the information for making a protein gets from the DNA to a functional protein. I will describe how it happens in a eukaryotic cell because that is the type of cell that humans are made of. Eukaryotes (organisms with eukaryotic cells) separate their DNA from the rest of the cell in what is called a nucleus (Figure 2). This protects the DNA from the processes that happen in the rest of the cell and allows for an extra level of control when it comes to gene regulation.


Figure 2

The parts of a eukaryotic cell.

[image credit: Mariana Ruiz]

The Library Analogy

You can think of the nucleus as a library. The DNA inside is like a giant set of reference books. In a library, you usually cannot check out the reference books, right? You have to copy the information that you need into notes, and you will use these to write your paper. It is the same with the DNA in the nucleus. So what do cells copy notes with? RNA.

When RNA is used in this fashion, we call it messenger RNA (mRNA) because it carries the message of how to make a protein. There are certain types of proteins in the nucleus that can unzip the double-stranded helix of DNA. Once this happens, a copy of that region of the DNA is made, but with RNA instead of DNA. This process of copying the DNA sequence into an RNA sequence is called transcription.
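If you think in code, transcription at the level of sequence can be sketched in a couple of lines of Python. This is a simplification under a few assumptions: the input is the coding strand of the gene (the strand whose sequence matches the message), it is written 5'->3', and the only difference in the RNA copy is that RNA uses uracil (U) where DNA uses thymine (T). The function name `transcribe` and the toy gene are made up for illustration.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding strand.

    The RNA copy carries the same sequence as the coding strand,
    except that every T (thymine) becomes a U (uracil).
    """
    return coding_strand.upper().replace("T", "U")


gene = "ATGGCCTTTTAA"  # a toy "gene", not a real one
mrna = transcribe(gene)
print(mrna)  # AUGGCCUUUUAA
```

In the cell, the copy is actually built by pairing RNA residues against the opposite (template) strand, but the resulting message reads the same as what this sketch produces.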

DNA Transcription (Basic) (Advanced)

I recommend watching both the basic and advanced versions of these videos. They each contain certain aspects that are left out of the other.

[video credit: Cold Spring Harbor Laboratory - DNA Learning Center]

Only one strand is copied. This single-stranded RNA copy of the gene then leaves the nucleus. Outside the nucleus, another set of proteins reads the sequence of the mRNA and gathers free-floating amino acids to fuse them into a chain. The sequence of the mRNA determines the order in which each amino acid is incorporated into the growing protein. The process of converting the mRNA sequence into a protein sequence is called translation.
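Here is a rough Python sketch of translation at the sequence level. It goes slightly beyond what I have described above: the message is read three residues at a time (each triplet is called a codon), and certain codons signal "stop". The `CODON_TABLE` below is a deliberately tiny slice of the standard genetic code, just enough for the toy message, and the function name `translate` is my own.

```python
# A small, intentionally incomplete slice of the standard genetic code.
# None marks the three stop codons.
CODON_TABLE = {
    "AUG": "Met",  # methionine, also the usual start signal
    "GCC": "Ala",  # alanine
    "UUU": "Phe",  # phenylalanine
    "GGA": "Gly",  # glycine
    "UAA": None,
    "UAG": None,
    "UGA": None,
}


def translate(mrna: str) -> list:
    """Read an mRNA three residues at a time and return the amino acid chain.

    Reading starts at the first residue and ends at the first stop codon
    (or when the message runs out). A codon missing from this partial
    table raises a KeyError.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid is None:  # stop codon: release the chain
            break
        protein.append(amino_acid)
    return protein


print(translate("AUGGCCUUUGGAUAA"))  # ['Met', 'Ala', 'Phe', 'Gly']
```

A real implementation would use the full table of 64 codons and would look for the start codon rather than assuming the message begins with it.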

mRNA Translation (Basic) (Advanced)

[video credit: Cold Spring Harbor Laboratory - DNA Learning Center]

That’s pretty much it! Not that complicated, eh?

Not all organisms use DNA as their storage molecule, however. Some viruses encode their genome with RNA. Others use DNA, but only a single strand of it, whereas the norm is double-stranded DNA. I have described what is typical, but you should know that there are forms of life that use variations of this.


The central dogma of molecular biology is the observation that genetic information generally can be transmitted from DNA/RNA to other DNA/RNA, or from DNA/RNA to protein, but not from protein to DNA/RNA. I should say here that we cannot say that it is impossible for information to flow from protein to nucleic acid; it may be possible. However, no one has postulated a mechanism whereby this might happen, and it has never been observed. It may be discovered someday, but that is VERY unlikely. DNA is usually the storage molecule for genetic information, which is then transcribed into RNA so that it can ultimately be translated into a protein.



[Crick1970] Crick, F. (1970). Central dogma of molecular biology. Nature, 227(5258), 561-563.