Description
As genomic sequences become increasingly available, annotating them is crucial to understanding gene information and protein function. This project examined the pros and cons of three annotation tools: Prokka, InterProScan, and RAST. FASTA files were obtained for a total of 30 species from the Firmicutes, Bacteroides, and Vibrio/Shewanella groups, 10 species per group. The tools were compared using four criteria: speed, memory, consistency, and usability. Speed measured how long each tool takes from start to finish; memory showed how much RAM a typical machine would need; consistency clustered protein sequences to compare correctness between and within tools and species; and usability compared the number of hypothetical and non-hypothetical proteins generated. Prokka was the fastest at 1-4 minutes, InterProScan followed at 7-10 minutes, and RAST took the longest at 3-6 hours. Memory was not applicable to RAST, as it is a web service; Prokka used approximately half a gigabyte of RAM and InterProScan about 2.8 GB. Consistency became the bulk of the project, with varying results. When sequences were compared within a single tool, each tool scored 95 percent or higher. When compared between tools, however, consistency declined drastically to a range of 40-60 percent. The consistency analysis showed no agreement when Prokka, InterProScan, and RAST were examined together, indicating that no gene received the same annotation from all three tools, which affects reliability. For usability, Prokka produced more hypothetical than non-hypothetical proteins, with a usefulness of only 0.68% for the Bacteroides group. InterProScan fared well: its highest usefulness was 92% and its lowest share of non-hypothetical proteins was 86.6%. RAST did not have exceptional results but performed better than Prokka.
Overall, Prokka offers speed and low memory use that may entice users, but its low usability can be unappealing. InterProScan offers both speed and usability but requires more memory, which may be an issue on machines without sufficient RAM. RAST is an excellent alternative for non-computer scientists who do not mind the wait, and its results are useful.
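
The usability criterion above rests on a simple ratio of non-hypothetical to total annotated proteins. The following is a minimal sketch of that calculation, not the project's actual code. The function name `usefulness`, the sample product names, and the rule that an annotation counts as hypothetical when its product name contains the word "hypothetical" are all assumptions made for illustration.

```python
def usefulness(products):
    """Return the percentage of annotations that are non-hypothetical.

    `products` is a list of product-name strings, one per annotated protein.
    Assumption: a protein is "hypothetical" if its product name contains
    the word "hypothetical" (case-insensitive).
    """
    if not products:
        return 0.0
    informative = sum(1 for name in products if "hypothetical" not in name.lower())
    return 100.0 * informative / len(products)


# Made-up product names, for illustration only.
example = [
    "hypothetical protein",
    "DNA gyrase subunit A",
    "Hypothetical protein",
    "ATP synthase subunit beta",
]
print(usefulness(example))  # 50.0
```

Applied to the product names from each tool's output for the same genome, a ratio like this would give comparable percentages across tools, provided each tool's naming of uncharacterized proteins is normalized first.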