DNA Sequencing – Technologies to read the code of life


What is DNA sequencing?

DNA sequencing is a method of determining the order of nucleotides in a DNA molecule. It has revolutionized genomics: since 1995, it has made it possible to analyze the genomes of more than 50,000 different organisms (as of 2020).

Together with other DNA-analytical methods, this technique is also used to investigate genetic diseases, among other things. Moreover, in the context of molecular cloning, DNA sequencing has become indispensable in molecular biology and genetic engineering laboratories.


Edited by Christina Swords, Ph.D.

The challenges of DNA sequencing

The determination of biological sequences remained an unsolved problem for decades, until biochemical and molecular methods were developed in the mid-1970s. Nowadays, even large-scale sequencing of entire genomes has become comparatively fast and easy.

However, the challenges of genome sequencing are not just limited to the direct reading of the nucleic acid sequence. Owing to technical limitations, only short DNA sections (reads) up to 1000 base pairs are read in each individual sequencing reaction. For a long strand of DNA, a method known as primer walking was first used in 1979. In this procedure, the sequence was read piece by piece.

In a larger sequencing project, the Human Genome Project, several billion base pairs were sequenced. This was done by an approach known as shotgun sequencing. Today, thanks to massively parallel DNA sequencing technology, we are able to study diseases, drug targets, DNA-based identification, and more.

In shotgun sequencing, the DNA strand is first broken down into smaller fragments of nucleotides, which are then sequenced base by base. The sequence information of the individual short segments is then reassembled into a complete genome using bioinformatics tools.
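The fragment-and-reassemble idea can be illustrated with a minimal sketch. It assumes idealized, error-free reads cut at regular intervals and uses a simple greedy overlap merge; the function names (`fragment`, `overlap`, `greedy_assemble`) and the toy sequence are illustrative assumptions, not part of any real assembler, which must also cope with sequencing errors, repeats, and both strands.

```python
def fragment(genome, read_len, step):
    """Cut overlapping reads of read_len every step bases (idealized, error-free)."""
    return [genome[i:i + read_len]
            for i in range(0, len(genome) - read_len + 1, step)]


def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the longest suffix/prefix overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no overlaps left: the remaining contigs cannot be joined
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# Toy "genome" chosen so that no 3-base word occurs twice (an assumption that
# keeps the greedy merge unambiguous).
genome = "ACGTTGCAAGGCTCATAGAC"
reads = fragment(genome, read_len=8, step=4)
print(reads)                   # four reads, each overlapping its neighbor by 4 bases
print(greedy_assemble(reads))  # a single contig equal to the original sequence
```

With repeated regions, the greedy choice can join the wrong reads, which is one reason practical assemblers rely on more elaborate graph-based methods.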

The raw sequence data is then analyzed in order to obtain biologically relevant information. Without this analysis, the sequence information has no scientific value.

Sequencing methods

Several methods are available today for reading the DNA sequence information using sequencing machines. For a long time, developments of the sequencing methods were built upon Sanger sequencing. Modern methods offer possibilities for accelerated sequencing through highly parallel sequencing. The new sequencing methods are often referred to as next-generation sequencing.

Classical methods

Method of Maxam and Gilbert

The method was developed by Allan Maxam and Walter Gilbert in 1977. It is based on two steps: the base-specific chemical cleavage of DNA and the subsequent separation of the fragments by denaturing polyacrylamide gel electrophoresis.

The DNA is first labeled at a 5′ or 3′ end with radioactive phosphate or a non-radioactive substance (biotin, fluorescein). In four separate reactions, specific bases are then chemically modified under limiting conditions and cleaved from the sugar-phosphate backbone of the DNA. For example, the base guanine (G) is methylated by the reagent dimethyl sulfate and removed by alkali treatment with piperidine.

The DNA strand is then cleaved completely at the modified positions. In each reaction, fragments of different lengths are formed, whose 3′ ends always result from cleavage at a particular base. Denaturing polyacrylamide gel electrophoresis separates the fragments according to length, resolving length differences of a single base. By comparing the four reactions on the gel, the sequence of the DNA can be read.

This method enabled its inventors to determine the operon sequence of a bacterial genome. Today, the method is rarely used because it requires toxic reagents. Besides, it is more difficult to automate than the Sanger dideoxy method developed at the same time.

Maxam Gilbert sequencing approach. Image source: JHCaufield, Wikimedia Commons

Dideoxy method (Sanger sequencing)

The Sanger dideoxy method, also called chain-termination synthesis, is an enzymatic method. It was developed by Sanger and Coulson around 1975 and presented in 1977 with the first complete sequence of a genome (bacteriophage φX174).

Sanger received the Nobel Prize in Chemistry in 1980 for his work on DNA sequencing. He received the prize together with Walter Gilbert and Paul Berg.

The enzyme DNA polymerase is used to lengthen one of the two complementary DNA strands. The DNA double helix is denatured by heating, after which single strands are available for further processing. 

In four otherwise identical reactions, one of the four bases is added as a dideoxynucleoside triphosphate (ddNTP). These chain-terminating ddNTPs do not have a 3′-hydroxy group to link the phosphate group of the next nucleotide. Their incorporation into the newly synthesized strand therefore halts DNA extension by the DNA polymerase.

As a result, DNA fragments of different lengths are produced, which always end with the same ddNTP in each individual reaction.

After the sequencing reaction, the labeled chain-termination products from all four reactions are separated by length using polyacrylamide gel electrophoresis.

The sequence can then be read off on a photographic film (X-ray film) after exposure of the radioactive gel. The corresponding complementary sequence is the sequence of the single-stranded DNA template used. Nowadays, a variation of the polymerase chain reaction (PCR) is used as a sequencing reaction. 

Unlike PCR, only one primer is used, so that the DNA is only amplified linearly.
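The chain-termination readout described above can be sketched in a few lines. This is a simplified model under stated assumptions: the primer is ignored (fragment lengths are counted from the first incorporated base), every possible termination product is present, and the gel resolves every length. The function names are illustrative, not taken from any sequencing software.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template):
    """Strand built by the polymerase, written 5'->3' (reverse complement of the template)."""
    return "".join(COMPLEMENT[base] for base in reversed(template))


def termination_lengths(template):
    """For each of the four ddNTP reactions, the lengths of all terminated fragments."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        lanes[base].append(length)  # a ddNTP of this base can end the chain here
    return lanes


def read_gel(lanes):
    """Read the four lanes from the shortest to the longest fragment."""
    base_at_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(base_at_length[n] for n in sorted(base_at_length))


template = "GATTACA"
lanes = termination_lengths(template)
print(lanes)                                # {'A': [4, 5], 'C': [7], 'G': [2], 'T': [1, 3, 6]}
print(read_gel(lanes))                      # TGTAATC (the newly synthesized strand)
print(synthesized_strand(read_gel(lanes)))  # GATTACA (the template)
```

Reading the lanes gives the newly synthesized strand; taking its reverse complement recovers the template, which matches the statement that the complementary sequence is that of the single-stranded template.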

In the early 1980s, Fritz M. Pohl and his research group developed a non-radioactive method for DNA sequencing. In this sequencing technology, DNA molecules are transferred to a carrier during electrophoretic separation.

The “Direct-Blotting-Electrophoresis System GATC 1500” was marketed by the Constance-based company GATC Biotech. The DNA sequencer was used, for example, in the European genome project for sequencing chromosome II of the yeast Saccharomyces cerevisiae.

Since the early 1990s, dideoxynucleoside triphosphates labeled with fluorescent dyes have been used in particular. Each of the four ddNTPs is coupled with a different dye. This modification makes it possible to add all four ddNTPs in one reaction vessel. 

Splitting the reaction into four separate preparations and handling radioisotopes are no longer necessary. The resulting chain termination products are separated by capillary electrophoresis and excited to fluoresce by a laser.

The ddNTPs at the end of each DNA fragment thus show fluorescence of different colors. This fluorescence can be detected by a detector. The electropherogram directly shows the sequence of bases of the sequenced DNA strand.

A typical workflow for Sanger sequencing. Image source: Estevezj, Wikimedia Commons

Modern approaches

With the increasing importance of DNA sequencing in research and diagnostics, methods have been developed that allow increased throughput. It is now possible to sequence the complete human genome in about 8 days. The corresponding methods are called second-generation sequencing.

Different companies have developed methods with different advantages and disadvantages. Apart from those listed here, there are others. Second-generation DNA sequencing was named “method of the year 2007” by the journal Nature Methods.

DNA sequencing using pyrosequencing

Pyrosequencing uses a DNA polymerase to synthesize the DNA counter strand, although the type of DNA polymerase can still vary. 

The DNA mixture is ligated with a DNA adapter and coupled to beads via a complementary adapter sequence. The beads loaded with DNA are placed on a plate with pores. Detection relies on a light detector.

The DNA polymerase is observed “in action” as it successively attaches individual nucleotides to a newly synthesized DNA strand. Successful insertion of a nucleotide is converted into a light signal that is registered by a detector. The DNA to be sequenced serves as the template strand and is available in single-stranded form.

The DNA strand is elongated by adding one of the four types of deoxynucleoside triphosphates (dNTPs) at a time. When the nucleotide complementary to the template is added, a signal is obtained. The addition of a non-complementary dNTP does not result in a detectable signal. The remaining dNTP is then degraded and another type is added; this continues until a reaction is observed again.

This is a luciferase enzyme-mediated reaction. When a complementary nucleotide is incorporated by the DNA polymerase, pyrophosphate (PPi) is released. The pyrophosphate is converted to adenosine triphosphate (ATP) by ATP sulfurylase.

The ATP drives the luciferase reaction, which converts luciferin into oxyluciferin. This in turn results in a detectable light signal – the strength of which is proportional to the ATP consumed.
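The cyclic addition of dNTPs and the resulting light signal can be modeled as a so-called flowgram. The sketch below is an idealization: it assumes that the signal is exactly proportional to the number of identical bases incorporated in one flow, and the dispensation order `"TACG"` and the function names are assumptions chosen for illustration.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=100):
    """Simulate the light signal per dNTP flow for the strand being synthesized.

    Each flow offers one dNTP type; the signal is the number of consecutive
    bases of that type incorporated (0 means no light).
    """
    signals = []
    pos = 0
    flow = 0
    while pos < len(new_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        run = 0
        while pos < len(new_strand) and new_strand[pos] == base:
            run += 1
            pos += 1
        signals.append((base, run))
        flow += 1
    return signals


def decode(signals):
    """Translate a flowgram back into the synthesized sequence."""
    return "".join(base * run for base, run in signals)


signals = flowgram("TTAGC")
print(signals)          # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(decode(signals))  # TTAGC
```

The double signal in the first flow shows how a run of identical bases appears as one stronger light pulse rather than as separate events.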

Pyrosequencing has been used to determine the frequency of certain gene mutations, for example in the investigation of genetic diseases. Pyrosequencing can be easily automated and is suitable for the highly parallel analysis of DNA samples.

Pyrosequencing. Image source: Thisisjp, Wikimedia Commons

Sequencing by hybridization

In sequencing by hybridization, short DNA sections (oligonucleotides) are fixed in rows and columns on a glass carrier (DNA chip or microarray).

The fragments of the DNA to be sequenced are labeled with dyes and applied to the oligonucleotide matrix. This allows the complementary fixed and free DNA sections to hybridize with each other. After washing out unbound fragments, the hybridization pattern is read from the color labels and their strength. 

The sequences of the fixed oligonucleotides and their overlapping regions are known. Hence, the color pattern can ultimately be used to deduce the underlying overall sequence of the unknown DNA.
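How overlapping probes determine the overall sequence can be shown with a small sketch. It assumes an idealized chip: every probe of length k that occurs in the target hybridizes, nothing else does, and no (k-1)-base word occurs twice in the target, so each probe has exactly one possible successor. The function names and the toy target are illustrative assumptions.

```python
def spectrum(seq, k):
    """Set of all k-mers in seq: the probes that would light up on an idealized chip."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def reconstruct(kmers):
    """Chain probes that overlap by k-1 bases into the underlying sequence."""
    kmers = set(kmers)
    k = len(next(iter(kmers)))
    by_prefix = {}
    for kmer in kmers:
        by_prefix.setdefault(kmer[:k - 1], []).append(kmer)
    suffixes = {kmer[1:] for kmer in kmers}
    # The first probe is the only one whose prefix is not the suffix of another probe.
    starts = [kmer for kmer in kmers if kmer[:k - 1] not in suffixes]
    if len(starts) != 1 or any(len(group) > 1 for group in by_prefix.values()):
        raise ValueError("ambiguous spectrum: repeated (k-1)-mers in the target")
    seq = starts[0]
    for _ in range(len(kmers) - 1):
        seq += by_prefix[seq[-(k - 1):]][0][-1]  # extend by the last base of the next probe
    return seq


target = "ACGTTGCAAGGCTCATAGAC"
probes = spectrum(target, 4)
print(len(probes))          # 17
print(reconstruct(probes))  # ACGTTGCAAGGCTCATAGAC
```

When short words repeat in the target, several reconstructions fit the same hybridization pattern; the sketch raises an error in that case instead of guessing.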

Ion-semiconductor DNA sequencing system

The Ion Torrent method uses semiconductor chip technology to perform direct non-optical genome sequencing using integrated circuits. 

Sequencing data is obtained directly from the detection of ions produced by template-dependent DNA polymerases. The semiconductor chip used in this method has an ion-sensitive field-effect transistor. Its sensors are arranged in a grid of 1.2 million wells in which the polymerase reaction takes place. This grid enables parallel and simultaneous detection of independent sequence reactions.

Complementary metal-oxide-semiconductor (CMOS) technology is used, which allows a cost-effective reaction at high measuring point density.

Ion semiconductor sequencing technology, Source: Konrad Foerstner, Wikimedia Commons

Sequencing with bridge synthesis

This sequencing technology is known as sequencing with bridge synthesis and was developed by Solexa/Illumina. The double-stranded DNA to be sequenced is ligated with a different adapter sequence at each end.

The DNA is then denatured, diluted, and attached to a carrier plate in single-stranded form, where it is further amplified in situ by bridge amplification. This creates individual areas (clusters) of amplified DNA on the carrier plate, which have the same sequence within a cluster.

The sequencing reaction is carried out with four differently colored fluorescent chain-terminating substrates. The respective base incorporated per cycle is determined in real-time in a cluster.
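Determining the incorporated base per cycle can be pictured as choosing the brightest of four color channels for each cluster. The following is a deliberately simplified sketch: the intensity values are made up for illustration, and the function `call_bases` is hypothetical. Real base callers additionally correct for effects such as overlapping dye spectra, which are ignored here.

```python
CHANNELS = "ACGT"  # assumed order of the four fluorescence channels


def call_bases(intensities):
    """Call one base per cycle for a single cluster.

    intensities: one (A, C, G, T) tuple of channel intensities per cycle.
    """
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        read.append(CHANNELS[brightest])
    return "".join(read)


# Hypothetical intensities for three sequencing cycles of one cluster.
cluster = [
    (0.9, 0.1, 0.0, 0.1),
    (0.1, 0.2, 0.8, 0.1),
    (0.0, 0.1, 0.1, 0.7),
]
print(call_bases(cluster))  # AGT
```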

Two-base encoding

Sequencing by Oligo Ligation Detection (SOLiD) from Applied Biosystems is a variant of Sequencing by Ligation. 

A DNA library is diluted and coupled to microbeads with a DNA polymerase. The DNA is then amplified in an emulsion PCR. Thus, each microbead contains several copies of a single DNA sequence. The microbeads are modified at the 3′-end so that they can be individually attached to a carrier plate.

After primer binding, four different cleavable probes are added, each of which is marked with a different fluorescent color. They bind to the DNA template by means of the first two nucleotides (CA, CT, GG, GC). Next, the microbeads are ligated with a DNA ligase. The probes are then cleaved, thereby releasing the labels.

Using up to five primers, each base in the DNA sequence is determined in at least two different ligation reactions.
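Two-base encoding can be illustrated with the commonly described four-color scheme, in which each color stands for a pair of adjacent bases. The sketch assumes that scheme can be written as an XOR of 2-bit base codes (A=0, C=1, G=2, T=3) and that the first base is known; the numeric color labels and the function names are assumptions for illustration.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3


def encode_colors(seq):
    """Color for every pair of adjacent bases (XOR of the two base codes)."""
    return [BASES.index(a) ^ BASES.index(b) for a, b in zip(seq, seq[1:])]


def decode_colors(first_base, colors):
    """Rebuild the base sequence from a known first base and the color calls."""
    seq = [first_base]
    for color in colors:
        seq.append(BASES[BASES.index(seq[-1]) ^ color])
    return "".join(seq)


colors = encode_colors("ATGGCA")
print(colors)                      # [3, 1, 0, 3, 1]
print(decode_colors("A", colors))  # ATGGCA
```

Because every base after the first contributes to two neighboring colors, a true base change alters two adjacent colors, whereas an isolated color change is more likely a measurement error.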

SOLiD platform, Image source: Philippe Hupe, Wikimedia Commons

DNA Sequencing with paired ends

A clearly identifiable signal can be obtained by generating short pieces of DNA from the terminal ends of a DNA sequence. This DNA sequencing method is also known as paired-end tag sequencing, PETS.

Third-generation DNA sequencing

Third-generation sequencing is the first approach to measure the reaction on individual molecules, as a single-molecule experiment.

This eliminates the need for amplification by PCR prior to sequencing. Because polymerases bind some DNA sequences preferentially, amplification by thermostable DNA polymerases can be uneven, so that some sequences are overlooked. Single-molecule sequencing avoids this bias.

Furthermore, the genome of an individual cell can be examined.

The released signal is recorded in real-time. In third-generation DNA sequencing, two different signals are recorded, depending on the method used: released protons or fluorophores. The DNA and RNA sequencing of individual cells was named “method of the year 2013” by the journal Nature Methods.

Third-generation sequencing technology. Image source: Chandra Shekhar Pareek, Rafal Smoczynski, Andrej Tretyn, Wikimedia Commons

Nanopore sequencing

Nanopore sequencing is based on changes in the ion flow through nanopores. Nanopores can be biological, synthetic, or semi-synthetic pores. The nanopore is embedded in an artificial membrane that has a particularly high electrical resistance.

In contrast to ordinary ion channels, the pore is permanently open. This allows a constant flow of ions through the membrane after the application of a potential. DNA molecules passing through the pore lead to a reduction of the current. This current reduction has a specific amplitude for each nucleotide, which can be measured and assigned to the corresponding nucleotide.

In single-strand sequencing, double-stranded DNA is separated by a helicase and one strand is introduced into the nanopore.

In the case of a MspA pore, four nucleotides of the DNA are simultaneously located within the pore. The speed of passage depends, among other things, on the pH value difference across the membrane. The specific ion current changes for each of the four nucleotides allow the sequence to be read. 
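Assigning a measured current to a nucleotide can be sketched as a nearest-level lookup. Everything numeric here is hypothetical: the reference currents are invented for illustration, and the sketch assumes the signal has already been split into one mean value per nucleotide. It also ignores that several nucleotides sit in the pore at once, so real signals depend on short stretches of sequence rather than on single bases.

```python
# Hypothetical mean current levels in picoamperes, one per nucleotide (invented values).
REFERENCE_PA = {"A": 50.0, "C": 42.0, "G": 58.0, "T": 35.0}


def call_events(event_currents):
    """Assign each measured event to the nucleotide with the closest reference level."""
    return "".join(
        min(REFERENCE_PA, key=lambda base: abs(REFERENCE_PA[base] - current))
        for current in event_currents
    )


# Hypothetical measured events, one mean current per nucleotide.
events = [49.2, 36.1, 57.5, 41.0]
print(call_events(events))  # ATGC
```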

An evaluation is carried out, for example, with the Poretools software. The advantage of this method is that it has a constant accuracy even with long DNA strands. A variation of the method is used for protein sequencing.

The nanopore sequencing technology is being advanced by the British company Oxford Nanopore Technologies. Their “MinION” sequencer was initially only available through an “Early Access Programme”, but it has been available through conventional distribution channels since 2015.
