Description
BioBranch* is a visual analysis tool for building decision trees from genomic expression datasets. It makes it possible to capture complex interactions between biological variables and phenotypes that would otherwise require considerable computational expertise to model. It also lets the user intervene manually while the tree is being constructed, applying prior biological knowledge to guide the choice of splits. In addition, it allows domain experts to work directly with high-throughput datasets without relying on a team of bioinformaticians as intermediaries. The tool can therefore inform biological researchers and help produce more accurate and more meaningful classifiers.

BioBranch will offer the ability to: (1) upload and share datasets intended for classification tasks, (2) construct decision trees by manually selecting features, such as genes in a gene expression dataset, (3) collaboratively edit decision trees, (4) create feature functions that aggregate content from multiple independent features into single decision nodes (e.g. pathways), and (5) evaluate decision tree classifiers in terms of precision and recall.

For my thesis, my contribution to BioBranch is to scale the application to support large dataset collections and to improve its performance by reducing memory usage and increasing overall efficiency. I am also adding a feature that allows users to upload their own datasets and start building models from them.

* Note that this project is a collaboration with Benjamin Good, Andrew Su and Karthik Gangavarapu at The Scripps Research Institute.
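
As an illustration of item (5), the sketch below shows how precision and recall could be computed for a binary classifier from true and predicted class labels. It is not BioBranch code: the function name `precision_recall`, the label values, and the example data are all assumptions made up for this illustration.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class named by `positive`.

    Hypothetical helper for illustration only; not part of BioBranch.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually another class.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actually positive but predicted as another class.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up labels for five samples (assumed example data).
y_true = ["tumor", "tumor", "normal", "normal", "tumor"]
y_pred = ["tumor", "normal", "normal", "tumor", "tumor"]

precision, recall = precision_recall(y_true, y_pred, positive="tumor")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of samples predicted as the positive class that truly belong to it, and recall is the fraction of truly positive samples that the classifier found. Reporting both is useful when the classes in an expression dataset are imbalanced.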