Description
Mycobacterium tuberculosis infects and kills millions of people worldwide each year. Next-generation sequencing (NGS) is fundamental to M. tuberculosis basic research and to many clinical applications, such as identifying targets for molecular diagnostics and vaccine development. Coverage across Illumina-sequenced M. tuberculosis genomes is known to vary with sequence context, but this bias is poorly characterized. In my first project, I identify Illumina 'blind spots' in the M. tuberculosis reference genome for seven sequencing workflows. I find blind spots to be widespread, affecting 529 genes, and provide their exact coordinates, enabling salvage of unaffected regions. Surprisingly, coverage bias persists in shorter homopolymer tracts than previously reported. A modified Nextera library preparation that amplifies DNA with a high-fidelity polymerase markedly attenuates coverage bias in G+C-rich and homopolymeric sequences, expanding the 'Illumina-sequenceable' genome. In my second project, I pair long-read sequencing data with a novel bioinformatics approach to uncover heterogeneous structural variants (SVs) in the M. tuberculosis genome. Although heterogeneity is a known feature of bacterial populations, sequencing studies typically represent each population as a single consensus genome. When heterogeneity is studied, the analysis is usually limited to small variants, and structural variants have received far less attention. Here, I find heterogeneous SVs to be constitutively present across a global set of isolates, demonstrating their influence on M. tuberculosis evolution. Together, the findings from both projects highlight effective strategies for handling bias in microbial sequencing data and illustrate the structural complexity of the genomes within microbial populations.