Description
Network analysis is a powerful data mining tool for beginning to examine the potential interactions in metagenomic datasets. I assessed the structural similarities of microbial interaction networks constructed from the same dataset using the SPIEC-EASI pipeline and Spearman correlations. SPIEC-EASI is more robust to the structure of metagenomic datasets, which have high dimensionality and consist of relative abundance data, but, given its sparsity assumption, it fails to converge when too few metagenomic samples are available (sample size < 13). In contrast, Spearman correlation readily produces a network even with few (~6) metagenomic samples but does not account for the structure of metagenomic data. The two methods produce different networks with little overlap. As such, the choice of method depends on data availability, but SPIEC-EASI is preferred whenever possible. Markov clustering is an unsupervised algorithm that partitions a network into potential functional modules. In the whale shark datasets, analyzed by both taxa and function, no common communities were found across the different sampling locations. Additionally, the resulting module membership varied across environments. Hamming distance and graph kernels were explored as metrics for comparing the similarity of network structures. The pre-processing necessary for calculating Hamming distance between networks results in loss of information from the initial dataset, especially when the datasets consist of different taxa or functions. Hamming distance may be useful for datasets that have similar diversities but falls short when diversity varies. Graph kernels are a potential approach for assessing the similarity between networks. Exploring further labeling or classification strategies and kernel programs may prove useful in the future.
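To illustrate why Hamming distance loses information when networks contain different taxa, the following is a minimal Python sketch, not the procedure used in this work. It assumes that the pre-processing step restricts both networks to their shared nodes before comparing edges; the abstract does not specify the actual pre-processing. The function names (`normalize`, `aligned_hamming`) and the taxa in the toy networks are hypothetical and chosen only for illustration.

```python
def normalize(edges):
    """Return undirected edges as a set of frozensets, dropping self-loops."""
    return {frozenset(e) for e in edges if e[0] != e[1]}


def aligned_hamming(edges_a, edges_b):
    """Hamming distance between two networks after aligning their node sets.

    Assumption (not from the source): alignment keeps only nodes present
    in both networks, so edges touching any unshared node are discarded.
    """
    a, b = normalize(edges_a), normalize(edges_b)
    nodes_a = set().union(*a)
    nodes_b = set().union(*b)
    shared = nodes_a & nodes_b

    # Keep only edges whose two endpoints both survive the alignment.
    kept_a = {e for e in a if e <= shared}
    kept_b = {e for e in b if e <= shared}

    return {
        "shared_nodes": shared,
        # Number of node pairs whose edge presence differs between networks.
        "distance": len(kept_a ^ kept_b),
        # Edges lost to pre-processing: the information loss noted above.
        "dropped_a": len(a) - len(kept_a),
        "dropped_b": len(b) - len(kept_b),
    }


# Hypothetical toy networks; the taxa are illustrative, not from the data.
net_a = [("Vibrio", "Pseudomonas"), ("Vibrio", "Alteromonas"),
         ("Alteromonas", "Flavobacterium")]
net_b = [("Vibrio", "Pseudomonas"), ("Pseudomonas", "Alteromonas"),
         ("Alteromonas", "Shewanella")]

result = aligned_hamming(net_a, net_b)
print(result["distance"], result["dropped_a"], result["dropped_b"])
```

In this toy case each network loses one edge during alignment because one of its taxa is absent from the other network. The more the two communities differ in composition, the more edges are discarded before the distance is computed, which is consistent with the limitation described above.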