Description
In many fields of study, captured data are growing faster than human analysts can process them without the help of machine learning. Machine learning has two main branches, supervised and unsupervised, each with advantages and disadvantages for data analysis. In this thesis, we demonstrate the benefit of applying unsupervised learning to toothed whale echolocation clicks. Echolocation clicks are one of the call types that toothed whales produce, and they help the animals navigate and forage. With machine learning, analysts can speed up the process of differentiating detected sets of echolocation clicks. However, supervised learning requires labeled data to train a classifier, and in areas where species assemblages are poorly understood, such labeled data are usually unavailable. Unsupervised learning provides a low-cost way to learn about patterns in species assemblages in understudied or remote regions. The ability to identify potential species gives scientists cues for the judicious allocation of more costly survey methods such as biopsies and visual tracking. Observing temporal and spatial patterns within these data, such as seasonality and diel patterns, can help scientists allocate the right resources in the right place at the right time. Results are presented on the high-frequency toothed-whale development dataset from the 2015 Detection, Classification, Localization, and Density Estimation workshop, which contains analyst labels for several odontocete species as well as an unknown category. Echolocation clicks were detected and grouped into acoustic encounters, from which spectral and temporal features were extracted. Dendrograms were constructed by average-linkage clustering using a symmetric Kullback-Leibler divergence as the dissimilarity measure. The dynamic tree and dynamic hybrid algorithms, with different parameter settings, were used to partition the dendrogram.
The partition with the highest Dunn index value was selected. Comparing this best machine-generated partition with the one produced by analysts yielded an adjusted Rand statistic of 0.48, demonstrating a good degree of concurrence.
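The three quantities named in the description (a symmetric Kullback-Leibler divergence, the Dunn index, and the adjusted Rand statistic) can be illustrated with a minimal Python sketch. This is not the thesis implementation. It assumes each acoustic encounter is summarized by a normalized spectrum treated as a discrete distribution, the toy values and all function and variable names are hypothetical, and two hand-written candidate partitions stand in for the output of the dynamic tree and dynamic hybrid algorithms.

```python
import math
from collections import Counter
from itertools import combinations


def symmetric_kl(p, q, eps=1e-12):
    """Symmetric KL divergence KL(p||q) + KL(q||p) for discrete distributions."""
    # Clamp zeros and renormalize so the logarithms are always defined.
    p = [max(x, eps) for x in p]
    q = [max(x, eps) for x in q]
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    # KL(p||q) + KL(q||p) simplifies to sum((p - q) * log(p / q)).
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def dunn_index(dist, labels):
    """Smallest between-cluster distance over largest within-cluster diameter."""
    min_between = math.inf
    max_within = 0.0
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            max_within = max(max_within, dist[i][j])
        else:
            min_between = min(min_between, dist[i][j])
    return min_between / max_within if max_within > 0 else math.inf


def adjusted_rand(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    def pairs(count):
        return count * (count - 1) / 2

    sum_cells = sum(pairs(c) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(pairs(c) for c in Counter(labels_a).values())
    sum_b = sum(pairs(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / pairs(len(labels_a))
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy normalized spectra for four hypothetical encounters (illustrative values).
spectra = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.1, 0.3, 0.6],
]
dist = [[symmetric_kl(p, q) for q in spectra] for p in spectra]

# Hand-written candidate partitions, standing in for different dendrogram cuts.
candidates = {"two clusters": [0, 0, 1, 1], "mixed": [0, 1, 0, 1]}
best = max(candidates, key=lambda name: dunn_index(dist, candidates[name]))

# Hypothetical analyst labels for the same four encounters.
analyst = ["A", "A", "B", "B"]
ari = adjusted_rand(candidates[best], analyst)
print(best, ari)
```

In this toy case the partition that keeps similar spectra together has the larger Dunn index and agrees exactly with the hypothetical analyst labels. The adjusted Rand statistic is corrected for chance, so a value near 0 indicates chance-level agreement and 1 indicates identical partitions.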