Description
This work investigates the problem of modeling human phonotactic acceptability judgments. In particular, it addresses the question of which type of model is best suited to this task: models based on the probability of the phone segments in a particular word (phonotactic probability) or models based on the number of words that share phone sequences with a particular word (neighborhood density). The models developed in this work were based on classifiers, a type of machine learning model that determines whether a new data item belongs to a particular class. Two types of classifiers were used: the perceptron, a linear classifier based on the number of times phone sequences occur (essentially a probability-based model), and k Nearest Neighbor (kNN), a non-linear classifier based on the number of existing data items that are similar to a new data item (essentially a neighborhood density model). Two types of kNN models were developed: one based on two classes (like the perceptron models) and one based on multiple classes. The quality of these models was evaluated by comparing their judgments against the judgments of human subjects on a short phonotactic acceptability task. Human subjects were asked to rate a set of fifty nonce words from 1 to 7 based on their assessment of the phonotactic acceptability of these words. Agreement between model and human judgments was measured using the Spearman correlation. The two-class classifier models (perceptron and kNN) performed by far the best, and both performed approximately equally well under optimal conditions, while the multi-class kNN model performed considerably worse. The results did not indicate that one type of model (phonotactic probability or neighborhood density) performed better than the other. However, it is possible that both phonotactic probability and neighborhood density effects influence the performance of both types of models, and finer-grained differences may emerge using a phonotactic acceptability task that employs more sophisticated means to control for both types of effects.
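
The evaluation step described above, comparing model judgments with human ratings by Spearman correlation, can be illustrated with a minimal sketch. This is not the code used in this work: the function names `average_ranks` and `spearman` are hypothetical, and the ratings and model scores are invented purely for illustration. The sketch assumes that tied values share the mean of their rank positions, which matters here because ratings on a 1 to 7 scale will often tie.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of items tied with the item at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Invented example data (not from the study): human ratings on the 1-7 scale
# and one model's acceptability scores for the same six nonce words.
human_ratings = [6, 2, 5, 1, 7, 3]
model_scores = [0.81, 0.12, 0.64, 0.30, 0.93, 0.25]

rho = spearman(human_ratings, model_scores)
print(f"Spearman correlation: {rho:.3f}")
```

Because the measure depends only on rank order, a model's scores do not need to be on the same 1 to 7 scale as the human ratings for the comparison to be meaningful.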