Description
Microbial genomes with unbalanced sequence compositions present unique challenges for current high-throughput sequencing methods. Single-molecule sequencing technologies present new opportunities for population-based genomic studies of microbial genomes. The Pacific Biosciences RS sequencer was combined with an in-house data analysis protocol, PacDAP, to form a novel pipeline for sequencing the GC-rich bacterial genome of Mycobacterium tuberculosis. The two major contributions of this pipeline are a significant error-correction procedure for the PacBio RS platform and the elimination of systematic bias. The error profile and accuracy of PacDAP were estimated using the avirulent strain H37Ra and comparable runs on Illumina. Here we show that the RS+PacDAP protocol significantly reduces error rates and that single nucleotide polymorphism calls can be made with an unprecedented uniform Phred score of nearly 52. Furthermore, base quality scores for the resulting sequences were found to be stable even in regions of high GC content.
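
To put the reported Phred score in perspective: under the standard definition Q = -10 log10(P), where P is the probability that a call is wrong, a score of 52 corresponds to an error probability of about 6.3e-6, or roughly one erroneous call in 158,000. The short Python sketch below shows the conversion in both directions. The function names are illustrative only and are not part of PacDAP.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred score Q = -10*log10(P)."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Phred 52, the approximate score reported for SNP calls
    print(f"{phred_to_error_prob(52):.2e}")  # prints 6.31e-06
```

For comparison, the commonly used thresholds of Q20 and Q30 correspond to error probabilities of 1e-2 and 1e-3 respectively.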