A Review of Next Generation Sequencing Methods and its Applications in Laboratory Diagnosis

Next-generation sequencing (NGS) is a new technology used to determine the sequence of DNA and RNA and to detect mutations or variations of significance. NGS generates large quantities of sequence data within a short time. The various types of sequencing include Sanger sequencing, pyrosequencing, sequencing by synthesis (Illumina), sequencing by ligation (SOLiD), single molecule fluorescent sequencing (Helicos), single molecule real time sequencing (PacBio), semiconductor sequencing (Ion Torrent technology), nanopore sequencing and fourth generation sequencing. These methods have been modified and improved over the years such that they have become cost effective and accessible to diagnostic laboratories. Management of outbreaks, rapid identification of bacteria, molecular case finding, taxonomy, detection of zoonotic agents and guiding prevention strategies in HIV outbreaks are just a few of the many applications of next generation sequencing in clinical microbiology.


INTRODUCTION
Next-generation sequencing (NGS) is a new technology used to determine the sequence of DNA and RNA and to detect mutations or variations of significance. NGS generates large quantities of sequence data in a shorter time than the conventional Sanger method of sequencing. This technique uses different chemistries, matrices and bioinformatics technologies to sequence entire genomes or DNA and RNA sequences of different lengths in shorter time periods [4]. The first next generation sequencing technique to be commercially available was the massively parallel pyrosequencing platform in 2005 [46]. DNA sequencing is done in several steps:
1. DNA fragmentation
2. Genomic library construction
3. Sequencing
4. Data analysis

DNA Fragmentation
Targeted DNA is broken into several small segments using methods such as sonication or enzymatic digestion. The required short segments are isolated using methods such as the hybridisation capture assay or the amplicon assay.

Genomic Library
An organism's genome is broken down into smaller pieces. Each piece is cloned into a separate vector, which is then carried by a separate microbial cell. The collection of recombinant vectors represents the organism's whole genome. This makes it possible to analyse isolated groups of genes and hence understand their expression and function. It is this library that is sequenced using the various DNA sequencing methods.
All methods of DNA sequencing start by attaching short oligonucleotide sequences, called adapters, to the ends of the short strands of fragmented DNA. Adapters are double-stranded DNA segments of 20-40 bp with known sequences. Two distinct adapter sequences are used, one joined to the 5' end and the other to the 3' end of each fragmented or amplicon DNA molecule.
Each DNA fragment has an adapter on one end that connects it to a solid substrate such as beads or flow cells, and another adapter on the other end that anneals to a primer that starts the polymerase chain reaction (PCR). PCR produces several copies of the same fragment, which are sequenced at the same time. As a result, these techniques are sometimes referred to as massively parallel sequencing techniques.
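The role of adapters of known sequence can be illustrated with a minimal Python sketch. The two 20-base adapter sequences and the function names below are hypothetical, chosen only for illustration; they do not correspond to any commercial kit. Because the adapter sequences are known in advance, they can be removed again to recover the original fragment.

```python
# Hypothetical 20-base adapter sequences (made up for illustration only).
ADAPTER_5 = "AAAACCCCGGGGTTTTACGT"
ADAPTER_3 = "TTTTGGGGCCCCAAAATGCA"


def add_adapters(fragment: str) -> str:
    """Return a library molecule: 5' adapter + fragment + 3' adapter."""
    return ADAPTER_5 + fragment + ADAPTER_3


def trim_adapters(molecule: str) -> str:
    """Strip the known adapters and return the fragment between them."""
    if not (molecule.startswith(ADAPTER_5) and molecule.endswith(ADAPTER_3)):
        raise ValueError("molecule is not flanked by the expected adapters")
    return molecule[len(ADAPTER_5):len(molecule) - len(ADAPTER_3)]


fragments = ["GATTACA", "CCGTTAGC"]
library = [add_adapters(f) for f in fragments]
recovered = [trim_adapters(m) for m in library]
print(recovered)
```

In this toy model the library is simply a list of strings; real library preparation also involves size selection and amplification, which are not modelled here.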

DNA Sequencing
An NGS sequencer is used to perform massively parallel sequencing. The library is loaded onto the sequencing matrix of the specific sequencer. The sequencing matrix is the platform on which the sequencing takes place, and it differs depending on the sequencer. For example, the Illumina NGS sequencer uses flow cells, while the Ion Torrent NGS sequencer uses sequencing chips.
Several generations of sequencing methods have been developed. The first method used for sequencing was Sanger DNA sequencing. Dr. Frederick Sanger developed this DNA sequencing method in the 1970s, having already published methods for RNA sequencing in the late 1960s. Sanger sequencing became the most commonly used tool in genomic research and led to extraordinary accomplishments. For example, the high-quality reference sequence of the human genome produced by the Human Genome Project (HGP) was a product of Sanger sequencing.
The synthesis of DNA occurs in the 5' to 3' direction. The DNA double helix unwinds to expose single strands of DNA, which are replicated by adding complementary nucleotides (dNTPs). Several enzymes are involved in the replication of DNA, including DNA polymerase, helicase, topoisomerase and DNA gyrase. The replication of DNA usually stops when the replication fork ends or upon reaching termination sequences in the template DNA strand.
In Sanger sequencing, the DNA to be sequenced is used as a template. The reaction mixture consists of a primer (a short sequence of DNA complementary to the region to be sequenced), DNA polymerase, the four deoxynucleoside triphosphates (dNTPs), and dideoxynucleoside triphosphates (ddNTPs). The single-stranded template DNA is mixed with this reaction mixture. DNA synthesis proceeds as usual with dNTPs until a ddNTP is added to the complementary strand. The ddNTP lacks a hydroxyl group at the 3' position, so no phosphodiester bond can form with the 5' phosphate of the next dNTP. Elongation of the complementary chain therefore stops at the ddNTP. Hence, Sanger sequencing is also called the chain termination DNA sequencing method.
Several such strands of DNA are obtained, and the locations of a particular nucleotide on the complementary strand are determined by electrophoresis on a thin slab polyacrylamide gel. Individual synthesis reactions are prepared for each ddNTP. In the initial models, the positions of the different fragments were identified using 32P, which was used to label the dATP molecules, and the result was observed on an X-ray film [7]. Later, the use of fluorescent labels simplified the process of reading the electrophoresis plates. In 1986, Applied Biosystems introduced a fluorescent DNA sequencing instrument that used fluorescently labelled primers. The primer for each nucleotide-specific reaction was labelled with a specific fluorochrome, and a scanning laser beam emitting specific wavelengths scanned the surface of the gel. The fluorescently labelled primers emitted light of different wavelengths, which was detected as the fragments were separated electrophoretically. Initially, the fluorochromes were used only to label the primer sequences for identification, and the positions of the nucleotides were identified from the lengths of the electrophoretically separated fragments. The technology was later improved by the use of fluorescently labelled dideoxynucleotides, known as terminators. This allowed each of the four dideoxynucleotides to be labelled with a different fluorochrome, so that a single reaction vessel could be used, which lowered the cost of the run.
Automated methods for sequencing became popular with the introduction of fluorescent labelling. The ddNTPs are fluorescently labelled, and the products of all four reactions are loaded into a single lane of a gel. The fragments are separated by electrophoresis. The terminal ddNTP of each fragment is detected by a laser beam as the fragment leaves the gel. A chromatogram documenting the fragment order is generated, in which the amplitude of each peak represents the fluorescence intensity of each fragment.
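The logic of reading a sequence from chain-terminated fragments can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a model of a real instrument: the template is written in the 3' to 5' direction, the primer is omitted, every possible termination product is assumed to be present, and the function names are hypothetical. Each ddNTP reaction yields fragments whose lengths mark the positions of that base, and sorting all fragments by length (as electrophoresis does) recovers the newly synthesised strand.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesise(template_3to5: str) -> str:
    """Return the new strand (5' to 3') complementary to the template."""
    return "".join(COMPLEMENT[base] for base in template_3to5)


def terminated_lengths(template_3to5: str) -> dict:
    """For each ddNTP reaction, list the lengths of terminated fragments."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesise(template_3to5), start=1):
        # A ddNTP of this base can stop the chain at exactly this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict) -> str:
    """Order all fragments by length, shortest first, and read the bases."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "TACGGTCA"
lanes = terminated_lengths(template)
print(lanes)
print(read_gel(lanes))
```

With four separate reactions the keys of `lanes` correspond to four gel lanes; with dye terminators the same information is carried by four colours in a single lane.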
A further improvement was the introduction of capillary gel electrophoresis in place of slab gel electrophoresis. This involved directly injecting a polymeric separation matrix into capillaries, leading to single-nucleotide resolution [7]. The sample containing the fragmented and amplified DNA is loaded into the capillaries by a process known as electrokinetic injection, and an electric current is passed through the loaded capillary gel to separate the DNA fragments at single-nucleotide resolution.
Sanger's chain termination method is a very time-consuming and expensive method of gene sequencing. The Human Genome Project, which took more than a decade to complete, used Sanger's chain termination DNA sequencing method. Newer methods of DNA sequencing that do not require construction of a genomic library have been developed.

Pyrosequencing / 454 Sequencing
A succession of enzymatic reactions is used, which leads to the generation of visible light. Each plastic bead has one DNA fragment attached, which is amplified by PCR within an oil-water emulsion. The final product is about one million copies of the DNA fragment covering the bead. The PCR products are denatured and each bead is deposited into a picolitre-sized well. Three important enzyme reactions take place in these wells. In the first step, DNA synthesis occurs using a primer and a single type of unlabelled dNTP (dATP, dGTP, dCTP or dTTP), catalysed by DNA polymerase. With each nucleotide added, a pyrophosphate (PPi) molecule is released. In the second step, the enzyme ATP sulfurylase catalyses the conversion of the PPi to ATP.
In the third step, the enzyme firefly luciferase uses this ATP to generate light. A flash of light is hence generated whenever nucleotides are added, and the intensity of the flash depends on how many nucleotides were added. The flashes generated from each well are correlated with the nucleotide added to that well. Computer software is used to monitor the growth of the synthesised DNA chain, one nucleotide at a time.
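The relationship between nucleotide additions and light intensity can be sketched as a toy flowgram in Python. This is a minimal illustration under stated assumptions: the cyclic dispensation order "TACG", the function names and the idealised signal (exactly one unit of light per incorporated base, no noise) are all hypothetical simplifications. Each flow offers a single type of dNTP; the signal is the number of consecutive matching bases incorporated, so a homopolymer run produces one proportionally brighter flash.

```python
def flowgram(new_strand: str, flow_order: str = "TACG", max_flows: int = 100):
    """Return (nucleotide, signal) pairs, one per dNTP flow."""
    signals = []
    position = 0
    flow = 0
    while position < len(new_strand) and flow < max_flows:
        nucleotide = flow_order[flow % len(flow_order)]
        run = 0
        # Incorporate every consecutive base that matches the offered dNTP.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            run += 1
            position += 1
        signals.append((nucleotide, run))  # run == 0 means no flash
        flow += 1
    return signals


def decode(signals) -> str:
    """Rebuild the synthesised strand from the idealised flash intensities."""
    return "".join(nucleotide * count for nucleotide, count in signals)


signals = flowgram("TTAGGGC")
print(signals)
print(decode(signals))
```

In practice the intensity of long homopolymer runs is hard to quantify exactly, which is one source of the homopolymer errors associated with this chemistry.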

ABI SOLiD
This method is called sequencing by ligation, or SOLiD technology. Amplification of genomic fragments is similar to that in pyrosequencing. The amplified fragments are moved onto a glass support surface. Following denaturation of the amplified fragments, a primer with a nucleotide sequence complementary to the adapter is hybridised to the adapter. Eight-base oligonucleotides (octamers) are hybridised to each fragment and joined to the primer by the enzyme ligase. The bases in the fourth or fifth position of the octamer are labelled with fluorescent markers. To identify the bases, laser light is used to excite the fluorescent labels. After that, the fluorescent label is removed by cleaving the ligated octamer after the fifth base. The ligation and cleavage process is repeated, and the sequence is identified based on the detected fluorescence.
Whole genome resequencing, targeted resequencing, transcriptome research (including gene expression profiling, small RNA analysis, and whole transcriptome analysis), and epigenome research are among the applications of SOLiD.

Solexa/Illumina Sequencing
Solexa/Illumina sequencing is sequencing by synthesis using reversible dye terminators, which help to identify each nucleotide added to the DNA strand during the sequencing process. This type of sequencing uses a glass slide, known as the flow cell, that has 8 channels coated with a lawn of oligonucleotides. The sequences of these oligonucleotides are complementary to the adapters on the 3' and 5' ends of the DNA strand. Hence the DNA fragments with adapters at both ends attach to the glass slide, which forms the solid surface where the sequencing occurs. Using the template DNA strand bound to the flow cell via its adapter, a polymerase enzyme synthesises the complementary DNA strand. The resulting double-stranded molecule is denatured and the original template is washed away. Bridge PCR is used for clonal amplification. In bridge PCR, the DNA strand linked to the flow cell at one end folds over, allowing the adapter on its free end to hybridise with the matching oligonucleotide on the flow cell. A DNA polymerase then generates the complementary strand, resulting in the formation of a double-stranded bridge. This double-stranded bridge is denatured, resulting in 2 single-stranded copies of the molecule attached to the flow cell. Ultimately, many copies of a single DNA template are present as clusters at their respective locations. Once cluster generation is complete, the templates are ready for sequencing. Cluster generation is essential for achieving the signal intensity required for sequencing.
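Because each round of bridge amplification turns one attached strand into two, cluster growth can be approximated as a doubling process. The short Python sketch below assumes ideal efficiency (every strand is copied in every cycle), and the target of 1000 strands per cluster is a hypothetical figure used only for illustration.

```python
def strands_after(cycles: int) -> int:
    """Strands in a cluster after a number of ideal bridge PCR cycles."""
    return 2 ** cycles


def cycles_needed(target_strands: int) -> int:
    """Smallest number of ideal cycles giving at least target_strands."""
    cycles = 0
    while strands_after(cycles) < target_strands:
        cycles += 1
    return cycles


print(strands_after(10))    # strands from a single template after 10 cycles
print(cycles_needed(1000))  # cycles to reach a hypothetical 1000-strand cluster
```

Real amplification is less than perfectly efficient, so the number of cycles needed in practice would be higher than this idealised estimate.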

Pacific Biosciences Single Molecule Real Time (SMRT) Reads
This technology applies the real-time principle to sequencing. A SMRTbell library is created by ligating adapters to both ends of the dsDNA, creating a circular template. The platform for sequencing here is a SMRT cell, which contains millions of tiny wells called zero-mode waveguides (ZMWs). A single circular DNA molecule is held in a single ZMW. Primers and DNA polymerases are added, and the polymerase incorporates labelled nucleotides. With every nucleotide added, light is emitted. Nucleotide addition is hence observed in real time.

Nanopore DNA Sequencing
Nanopore sequencing devices use flow cells as their platform for sequencing. These flow cells contain numerous minute holes called nanopores, which are embedded in an electrically resistant membrane. The nanopores are lined by electrically conducting molecules such as iron. As a result, the nanopore has an electric field and acts on its own as an electrode, which is linked to a channel and a sensor chip that measures the electric current flowing through it [8]. The DNA fragment to be sequenced is passed through the nanopore such that only one nucleotide can pass at a time. The different nucleotides produce changes in the magnitude of the current, which are detected in real time by the sensor chip. Because the alteration of the electric field of the nanopore is different for each base, it can be used to identify the nucleotide. The characteristic squiggle in the electric signal produced by the passage of each nucleic acid through the nanopore allows the DNA to be sequenced. This method can be used for sequencing of DNA, RNA and proteins. The nanopores currently in use are single-channel nanopores, such as biological nanopores and solid-state nanopores. Recent research has shown promise for multichannel nanopore arrays, in which the fluidic chip has multiple pores set in parallel facing one common chamber where the DNA is added. This in turn helps to achieve better and faster results [29].
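The idea of identifying bases from changes in current can be illustrated with a deliberately simplified Python sketch. Everything numerical here is an assumption made for illustration: the current level assigned to each base, the fixed number of samples per base and the small deterministic jitter are hypothetical, and real base calling has to deal with variable translocation speed and signals influenced by several neighbouring bases at once.

```python
# Hypothetical mean current level (arbitrary units) for each base.
LEVELS = {"A": 50.0, "C": 60.0, "G": 70.0, "T": 80.0}
DWELL = 4  # assumed fixed number of current samples recorded per base


def simulate_squiggle(sequence: str, jitter=(0.8, -1.1, 0.4, -0.3)) -> list:
    """Return a toy current trace: DWELL noisy samples for every base."""
    signal = []
    for base in sequence:
        for i in range(DWELL):
            signal.append(LEVELS[base] + jitter[i % len(jitter)])
    return signal


def call_bases(signal: list) -> str:
    """Average each DWELL-sized window and pick the closest base level."""
    called = []
    for start in range(0, len(signal), DWELL):
        window = signal[start:start + DWELL]
        mean = sum(window) / len(window)
        called.append(min(LEVELS, key=lambda base: abs(LEVELS[base] - mean)))
    return "".join(called)


squiggle = simulate_squiggle("GATTACA")
print(call_bases(squiggle))
```

The fixed-window averaging stands in for the segmentation step of a real base caller and is only adequate because the toy signal is generated with a constant dwell time.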

Helicos tSMS
Helicos tSMS was the first commercial NGS platform to use the principle of single molecule fluorescent sequencing. DNA samples are first fragmented into small molecules of 100-200 nucleotides in length. A universal poly(A) sequence, along with a fluorescent adenosine nucleotide, is attached to the 3' end of each strand, which acts as the template. A Helicos flow cell is the surface where hybridisation occurs. The flow cell has several oligo(T) oligomers immobilised on its surface, which act as capture sites for the poly(A)-tagged DNA fragments. The flow cell containing the hybridised fragments is loaded into the instrument. Each fluorescently labelled template is detected by a laser that illuminates the flow cell, and a CCD camera creates a map of the templates on the flow cell surface. Once the templates have been recorded, the fluorescent template label is cleaved and washed away. A DNA polymerase is then added, which catalyses the addition of a single type of complementary fluorescently labelled nucleotide, one at a time, to the primers. A wash step removes the unincorporated nucleotides and the DNA polymerase. A laser beam is used again to detect and image the fluorescence from the added nucleotides at their specific locations on the flow cell surface. The fluorescent label is cleaved once again, and the procedure is repeated with each of the other nucleotide bases until the DNA is sequenced to the appropriate length. Imaging is done after each nucleotide addition. This method can sequence a billion bases per hour. Hence, the time required to sequence complete genomes is considerably reduced.

Fourth Generation Sequencing
These methods are used especially in histology specimens, where the sequencing is done in situ in the cells, preserving the spatial coordinates of RNA and DNA. This makes it possible to map the sequenced data back to the histological context and to characterise tumor microenvironments in order to identify new targets for therapy. Examples of fourth generation sequencing include single cell RNA sequencing (scRNA-seq) technology and in situ sequencing (ISS).

Comparison of Various Sequencing Methods
Next generation sequencing was a great advancement over Sanger sequencing and helped make sequencing commercially available. Some advantages over Sanger sequencing include:
1. Sequencing library construction and clonal amplification of DNA as part of the sequencing procedure
2. Array-based sequencing, in which DNA can be multiplexed and a larger throughput achieved
3. Immobilisation of DNA on a solid phase as the platform
The first NGS sequencer to be commercially available was the pyrosequencer. Compared with Sanger's method it had a higher sensitivity, was faster and was more cost effective. The disadvantages of pyrosequencing include:
a. Laborious sample preparation [26]
b. Only short sequences can be sequenced [6]
c. Prone to errors when reading through homopolymeric sequences [6]
d. Expensive instruments [26]
The Illumina sequencer offers a high throughput of data but requires very expensive instruments [26]. ABI/SOLiD has very high throughput and a lower cost of reagents, but it takes a long time to give results and the cost of the instrument is high [26]. The short read length and the need for resequencing are key disadvantages of the sequencing by ligation technique. The nanopore sequencing method has several advantages over the earlier sequencers: it can read long fragments, it is fluorescent tag-free [7], and its use of enzymes is remarkably low, which reduces the need for stringent temperature controls [8]. The problem with the nanopore sequencer and Pacific Biosciences Single Molecule Real Time reads is that they are expensive and have low throughput.

Clinical Microbiology Applications of Next Generation Sequencing
Management of outbreaks, identification and surveillance of pathogens, rapid identification of bacteria using the 16S-23S rRNA region, molecular case finding, taxonomy, metagenomic approaches on clinical samples, and determination of the transmission of zoonotic microorganisms from animals to humans are just a few of the many applications of next generation sequencing in clinical microbiology.

Outbreak Management
Whole genome sequencing and NGS can be used to detect and monitor outbreaks in hospital settings, in public health and in epidemiological studies [28]. Apart from outbreak tracing and characterisation, NGS can be used to guide control measures that prevent the spread of the infective agent [25]. This is particularly useful in outbreaks caused by multidrug-resistant organisms [30,37,31] and highly virulent strains of bacteria. It can be applied initially in high-risk areas such as transplant units and neonatal ICUs to promptly curtail any outbreaks [22,33,34]. The main challenge here is the lack of user-friendly bioinformatics software platforms for the interpretation of data by diagnostic microbiologists [22]. A global initiative to sequence and store genetic information about all known pathogens would help to curb outbreaks of emerging pathogens in the initial phase of community transmission [47].

Molecular Case Finding
A molecular case definition is assigned to the isolates causing an outbreak and is then used in clinical and epidemiological investigations. Databases of these definitions are used retrospectively in later outbreaks to reach a conclusive diagnosis. As a result, cases that would have been missed by standard epidemiological investigations can now be tracked down [41].

Characterisation of Pathogens
NGS can be used to characterise pathogens with respect to virulence characteristics and to detect novel resistance genes. This is of special significance in organisms such as Mycobacterium tuberculosis, where next generation sequencing methods can provide data on resistance to antitubercular drugs rapidly, within 8-9 days, as sequencing can be done directly from the specimen. This in turn helps the physician to administer well-targeted therapy [23]. The epidemiology and evolution of pathogens such as Yersinia pestis, Vibrio cholerae and methicillin-resistant Staphylococcus aureus have been studied using these sequencing technologies, aiding our understanding of the emergence of these epidemic clones and ultimately helping to prevent such epidemics [32,35].

Influenza Virus Vaccine Development
Influenza virus undergoes genetic reassortment, antigenic shift and antigenic drift, and therefore requires active surveillance to detect the strain in circulation in a particular region and to devise vaccines according to the circulating strains. The evolution of human and animal influenza viruses is monitored all year round to select the appropriate strains for vaccine development [21].

Guiding Prevention Strategies in HIV
When a sudden increase in the number of reported cases of HIV occurs, it is important to trace the source, detect HIV genomic sequences and analyse them together with epidemiological data to characterise the outbreak and establish the transmission dynamics. This can be done accurately and in a shorter time using next generation sequencing [21][44].

Detection of Emerging Pathogens
Sequencing is pivotal for the diagnosis of emerging infections using genomic and metagenomic analysis [2]. Potential bioterrorism agents can be detected faster using next generation sequencing. Strain-specific genotyping helps in tracing the contacts in such suspected cases [45].

Bacterial Taxonomy
Whole genome sequencing and 16S rRNA gene sequencing are used to construct the taxonomic trees of bacteria.

Metagenomics
Metagenomic analysis of the organisms in a particular environment, such as the gut microbiome, has vast implications. Research has been done to find the relationship between the gut microbiome and many illnesses such as diabetes, irritable bowel syndrome and obesity. Targeted next generation sequencing of the 16S-23S rRNA cluster region can be used to characterise bacteria from clinical specimens, so that the entire microbial population in a clinical sample is detected and analysed [36].

Detecting Zoonotic Transmission
NGS can distinguish human and animal pathogens with greater confidence. Early and specific detection of zoonotic transmission helps in precise, directed treatment targeting the right organism. It is also very important in curtailing community spread of these zoonoses. The virulence factors that allow such transmission can also be studied using NGS.

Diagnosis of Genetic Disorders by Next Generation Sequencing
In genetic disorders where the clinical diagnosis is unclear, genetic testing can help with precise diagnosis. The three broad methods of sequencing for diagnosis are gene panel, exome and genome sequencing.

Control of Antimicrobial Resistance
Whole genome sequencing has immense potential to help in curtailing antimicrobial resistance. The specific genetic changes associated with the development of resistance to an antibiotic can be detected [27], and newer targets of action for drugs can be identified, which opens up possibilities for novel antibiotic development [24].

CONCLUSION
Next generation sequencing methods have evolved over several years to reach a point where they can be used in clinical diagnostic laboratories and research labs to produce quality work. They help to determine new targets for therapy and diagnosis, which in turn broadens the horizon for patient care.