Description
Viruses that infect bacteria are called bacteriophages, or phages. An estimated 10³⁰ bacterial cells exist in the biosphere. Given that phages typically outnumber bacteria by about 10:1, an estimated 10³¹ phage particles exist on the planet. Viruses are thus the most abundant biological entities on the planet. Phages are ubiquitous and can be found in any environment where their bacterial hosts are present, yet only a small fraction of phages have so far been characterized. Phages are not placed on the universal tree of life; however, they are classified into Order, Family, Subfamily, Genus, and Species. All phages either fall under the Order Caudovirales or have yet to be assigned to an Order. Below the level of Order, 14 phage Families have been created. Phages also follow one of two distinct lifestyles: lytic or lysogenic. The lytic lifestyle has many implications for phage therapy, genomics, and climate change. Classifying the Family and lifestyle of a phage is currently performed through culturing and isolation in the lab, which is both time consuming and costly. Classifying phages computationally has always been difficult because of the highly mosaic organization of their genomes. To address this, a Phage Classification Tool Set (PHACTS) was developed to classify phages using the known phage genomes in the PHANTOME database. PHACTS searches these known genomes for similarities to the unknown phage genome. A supervised Random Forest classifier then determines which class the phage falls into. To test the classifier, the phage Family, phage lifestyle, and Gram stain of the host were manually curated for phages in the PHANTOME database. Each phage was removed from the annotated datasets one at a time and treated as an unknown. PHACTS was then used to predict which classes that phage belonged to.
When predicting the lifestyle of a phage, PHACTS had a 99% accuracy rate, and when predicting the Gram stain of the host, it had a 95% accuracy rate. For the non-binary classification of phage Family, the protocols were modified, and PHACTS had a 99% accuracy rate. PHACTS was not able to confidently predict the classes of some phages; however, it is thought that as more known phages are added to the PHANTOME database, PHACTS will be able to predict classes more accurately.
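
The testing procedure described above, in which each phage is removed in turn and treated as an unknown, is a leave-one-out evaluation. The following is a minimal sketch of that protocol only, not the PHACTS implementation. It assumes each phage has already been reduced to a numeric feature vector (for example, similarity scores against known genomes), and it swaps in a simple nearest-neighbor classifier as a stand-in for the Random Forest so that the example needs nothing beyond the Python standard library. The function names, feature values, and labels are all hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def nearest_neighbor_predict(train_x, train_y, query):
    """Stand-in classifier (NOT a Random Forest): label of the closest training vector."""
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], query))
    return train_y[best]


def leave_one_out_accuracy(features, labels, predict=nearest_neighbor_predict):
    """Hold out each sample in turn, train on the rest, and score the prediction."""
    correct = 0
    for i in range(len(features)):
        # Remove sample i from the annotated set and treat it as an unknown.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Toy, made-up feature vectors (not PHANTOME data): two scores per phage.
toy_features = [
    [0.90, 0.10], [0.80, 0.20], [0.85, 0.15],  # labeled "lytic"
    [0.10, 0.90], [0.20, 0.80], [0.15, 0.85],  # labeled "lysogenic"
]
toy_labels = ["lytic"] * 3 + ["lysogenic"] * 3

accuracy = leave_one_out_accuracy(toy_features, toy_labels)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because the classifier is passed in as a function, any other model with the same three-argument interface, including a Random Forest wrapper, could be evaluated with the same loop.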