DNA Sequencing by Frederick Sanger


DNA sequencing technologies have revolutionized biology. Since Frederick Sanger introduced the chain-termination sequencing method in 1977, the genomes of more than 800 bacteria and 100 eukaryotes have been sequenced, including the genomes of several individual humans. Close to a trillion base pairs are currently deposited in GenBank, a central repository of genetic sequence information hosted by the NCBI, and this number is increasing rapidly. This wealth of data has led to numerous biological discoveries and to a better understanding of the fundamental principles of life. At first glance, the dramatic impact of sequencing on modern biological research is surprising, given the limitations in the length of DNA …

The resulting DNA segments are combined into a reconstruction of the original genome by computer programs called genome assemblers. The assembly process is often compared to solving a jigsaw puzzle, a metaphor that highlights several challenges. First, assembly is complicated by genomic repeats (sections of DNA that occur in near-identical form at multiple places in a genome), the equivalent of large stretches of sky in a jigsaw puzzle. Second, the complexity of a jigsaw puzzle increases dramatically with the number of pieces. Similarly, the difficulty of assembly depends on the number of reads being assembled, so large genomes and/or short DNA fragments pose particular challenges. To overcome these challenges, sophisticated computational algorithms have been developed over the years, yielding genome assemblers capable of reconstructing large mammalian genomes. These programs have been critical to the success of many recent genome projects, including those for mammals, plants and worms. Despite such successes, genome assembly is far from a solved problem. Recent advances in sequencing technology pose new challenges, both because of the sheer size of the data that must be processed and because of the characteristics of the new data: shorter sequencing reads and new types of sequencing …
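To make the jigsaw analogy concrete, the following is a toy greedy overlap assembler in Python. It is a sketch under strong simplifying assumptions, not how production assemblers work: reads are error-free, all come from the same strand, no read is contained in another, and the genome has no repeats (exactly the assumption real assemblers cannot make). The names `overlap`, `greedy_assemble` and the parameter `min_len` are hypothetical. At each step the two contigs with the longest suffix-prefix overlap are merged, much like joining the two puzzle pieces that fit best.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` bases exists.
    """
    start = 0
    while True:
        # Next position in `a` where the first min_len bases of `b` occur.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Accept only if the whole rest of `a` matches the start of `b`.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two contigs with the longest overlap.

    Stops when no ordered pair overlaps by at least `min_len` bases and
    returns the remaining contigs.
    """
    contigs = list(reads)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        # Each merge step compares all ordered pairs of contigs, so the work
        # grows with the square of the number of pieces.
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["GATTAC", "TACAGG", "AGGCTT", "GCTTCA"]
print(greedy_assemble(reads))  # → ['GATTACAGGCTTCA']
```

Repeats are what break this strategy: when a repeat is longer than the overlaps used to join reads, several merges look equally good and the greedy choice can join the wrong copies, collapsing or rearranging the repeated region. The all-pairs comparison in every step also illustrates why the number of reads drives the computational cost.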

To recap, the process starts by randomly shearing a genome of interest into a collection of fragments. A subset of the fragments is then selected, usually those that fall within a prescribed size range, and their sequence is ‘read’ using a sequencing instrument, resulting in a collection of reads (DNA fragments whose sequence is known). Commonly, the fragments are longer than the read length a sequencing technology can achieve, so only the ends of each fragment are sequenced. This limitation gave rise to a variant of the shotgun method, double-barreled shotgun sequencing, in which reads are generated from both ends of each DNA fragment, yielding a collection of read pairs separated by an approximately known distance.
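The shearing, size-selection and end-reading steps just recapped can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions: the genome is an error-free string, cut points are placed uniformly at random on each genome copy, and strand orientation is ignored (both reads are reported as they appear on the forward strand). The functions `shear` and `shotgun_read_pairs` and their parameters are hypothetical names chosen for illustration. Each fragment that passes size selection contributes one read pair plus its length, which plays the role of the known distance between the two reads.

```python
import random


def shear(genome: str, n_cuts: int, rng: random.Random) -> list[str]:
    """Break one copy of the genome at n_cuts distinct random positions."""
    cuts = sorted(rng.sample(range(1, len(genome)), n_cuts))
    bounds = [0] + cuts + [len(genome)]
    return [genome[a:b] for a, b in zip(bounds, bounds[1:])]


def shotgun_read_pairs(
    genome: str,
    copies: int,
    n_cuts: int,
    size_range: tuple[int, int],
    read_len: int,
    seed: int = 0,
) -> list[tuple[str, str, int]]:
    """Simulate double-barreled shotgun sequencing on an error-free genome.

    Returns (left_read, right_read, fragment_length) for every fragment
    that survives size selection. Strand orientation is ignored.
    """
    lo, hi = size_range
    if read_len >= lo:
        raise ValueError("reads must be shorter than the selected fragments")
    rng = random.Random(seed)
    pairs = []
    for _ in range(copies):  # many genome copies are sheared independently
        for frag in shear(genome, n_cuts, rng):
            if lo <= len(frag) <= hi:  # size selection
                # The fragment is longer than a read, so only its two ends are read.
                pairs.append((frag[:read_len], frag[-read_len:], len(frag)))
    return pairs


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(2000))
    pairs = shotgun_read_pairs(genome, copies=5, n_cuts=20,
                               size_range=(80, 150), read_len=30)
    print(len(pairs), "read pairs")
    for left, right, dist in pairs[:3]:
        print(left, right, dist)
```

Fragments outside the size range are simply discarded, which is why several genome copies are sheared: only a subset of the pieces from any one copy is kept.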
Shotgun sequencing can be viewed as a random sampling process, with each DNA fragment originating from a random location within the genome. It can be shown mathematically that a certain amount of oversampling is necessary to ensure that every base in the genome is sampled by at least one read. The amount by which a genome is oversampled is commonly referred to as coverage: the ratio between the cumulative size of the set of reads and the size of the