Description
Microbial genomes with unbalanced sequence compositions (AT- or GC-rich) present unique challenges for current high-throughput sequencing methods. Single-molecule sequencing technologies present new opportunities for population-based genomic studies of microbial genomes. The Pacific Biosciences RS sequencer, a single-molecule sequencing platform, was combined with an in-house data analysis protocol (PacDAP) to form a novel pipeline for sequencing the GC-rich bacterial genome of Mycobacterium tuberculosis (Mtb). The two major contributions of this pipeline are a substantial error-correction procedure for the PacBio RS platform and the elimination of systematic bias. The error profile and accuracy of PacDAP were estimated using the avirulent strain of the microbe, H37Ra, and comparable runs on Illumina. We show that the RS+PacDAP protocol significantly reduces error rates, and that single nucleotide polymorphism (SNP) calls can be made with an unprecedented uniform Phred score of nearly 52 (1 error in over 157,000 base calls). Furthermore, base quality scores for the resulting sequences were found to be stable even in regions of high GC content. Comparison of 363 clinical isolates sequenced with both PacBio C1 and C2 chemistries and processed by PacDAP showed significant improvements in both read length and coverage depth over alternative methods. In this article we show that a single SMRT cell run is sufficient, producing an average coverage width of nearly 97.6% with C1 chemistry and 99.1% with C2. A comparison of the variant caller results with Sanger and pyrosequencing platforms demonstrates high accuracy in detecting SNPs associated with drug resistance in mono-resistant, MDR, and XDR Mtb isolates, making PacDAP an ideal platform for genome-wide diagnostics of Mtb and other similar bacteria.
This paper also reports the successful sequencing of the MIRU region of Mtb using WGS, the discovery of structural errors in the genome of H37Ra, and recommended corrections to this genome.
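
The Phred score quoted above can be related to the stated error rate using the standard Phred definition, Q = -10 log10(P), where P is the probability of an incorrect base call. The following is a minimal sketch of that conversion only; the function names are illustrative and are not part of PacDAP.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to an error probability (standard definition)."""
    return 10 ** (-q / 10)


def error_rate_to_phred(errors: int, bases: int) -> float:
    """Convert an observed error rate (errors per bases called) to a Phred score."""
    return -10 * math.log10(errors / bases)


# The abstract's figure: 1 error in 157,000 base calls.
q = error_rate_to_phred(1, 157_000)
print(f"Phred score: {q:.2f}")
print(f"Error probability at Q=52: {phred_to_error_probability(52):.2e}")
```

Under this definition, one error in 157,000 calls corresponds to a score just under 52, consistent with the "nearly 52" reported above.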