Description
The Mycobacterium-specific PE and PPE protein families comprise approximately 10% of the Mycobacterium tuberculosis genome; however, the function of the proteins encoded by these gene families is largely unknown. The gene families are often excluded from studies because of their high GC content and repetitive sequences, which can introduce systematic error in these regions when short-read sequencing and subsequent processing methods are used. Nevertheless, some members are implicated in antigenic variation, suggesting that important host-pathogen interactions are occurring. In this study, 97 reference-quality genomes of clinical M. tuberculosis isolates and their pangenome were used to elucidate evolutionary patterns among the PE and PPE protein families. Large amino acid motifs were identified from the PE and PPE proteins present in the reference genome H37Rv. Searching the pangenome with these motifs identified 247 PE and 118 PPE genes that are absent from the reference genome. Phylogenetic analysis revealed large sequence diversity within the PE_PGRS (polymorphic GC-rich repetitive sequence) subfamily in the pangenome. Several PE_PGRS genes not in H37Rv were predicted to be the result of gene conversion events. Furthermore, the recombinant genes were predicted to contain B-cell epitopes. Comparing the B-cell epitopes in each recombinant with those in its putative donor and putative acceptor showed that the resulting epitopes changed significantly, indicating a potential mechanism of immune evasion in M. tuberculosis. This work provides insights into M. tuberculosis diversification via protein family evolution and reveals potential mechanisms that could have contributed to the evolution of mycobacterial pathogenicity.