Description
Fundamental frequency estimation has numerous applications in speech signal processing, most notably automatic speech recognition (ASR). The intent is to recover the fundamental frequency from the periodic component of human-generated speech, which has been distorted by the characteristics of the vocal tract. In this research, a temporal pitch detection algorithm is proposed that uses the Normalized Autocorrelation function, and incorporates elements of the Normalized Weighted Autocorrelation, to provide candidate pitch estimates within a given time period. The energy track is used to determine whether segments of speech are voiced or unvoiced and, ultimately, to find the boundaries of utterances. Finally, the dual path pitch selection (DPPS) process selects the “most likely” pitch track in a manner that improves on the classical dynamic programming approach. The DPPS does not assume causality and initializes tracking at every point within an utterance, giving it bidirectional characteristics. The proposed algorithm, referred to as BAFFE, was tested using a well-known database in both studio and telephone environments. In addition, simulations were performed in the presence of Gaussian noise and babble noise. When compared with a number of well-known pitch detection algorithms, the BAFFE algorithm demonstrated favorable results in terms of performance and computation time.
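As a rough illustration of the first two stages described above, the following Python sketch computes a normalized autocorrelation over a range of candidate lags and uses frame energy as a simple voiced/unvoiced gate. It is a minimal sketch under stated assumptions, not the algorithm evaluated in this work: the function names, the search range (`f_min`, `f_max`), and the `energy_threshold` value are all hypothetical, and the weighted autocorrelation elements and the DPPS tracking stage are omitted.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Normalized autocorrelation of `frame` at `lag` (value in [-1, 1])."""
    n = len(frame) - lag
    a = frame[:n]            # leading segment
    b = frame[lag:lag + n]   # same-length segment delayed by `lag` samples
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0


def estimate_pitch(frame, sample_rate, f_min=60.0, f_max=400.0,
                   energy_threshold=1e-4):
    """Return a pitch estimate in Hz, or None if the frame looks unvoiced.

    All default parameter values are illustrative assumptions.
    """
    # Energy gate: treat low-energy frames as unvoiced (assumed criterion).
    energy = sum(x * x for x in frame) / len(frame)
    if energy < energy_threshold:
        return None

    # Convert the frequency search range into a lag search range.
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)

    # Simplification: keep only the single strongest lag.
    best_lag = max(range(lag_min, lag_max + 1),
                   key=lambda k: normalized_autocorrelation(frame, k))
    return sample_rate / best_lag


# Usage: a synthetic 200 Hz tone sampled at 8 kHz (50 ms frame).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / sample_rate) for n in range(400)]
pitch = estimate_pitch(tone, sample_rate, f_min=150.0, f_max=400.0)
```

Keeping only the strongest lag is the main simplification here. On real speech, lags at multiples of the true period score almost as high, which is why the approach described above retains several candidate estimates per time period and leaves the final choice to the pitch selection stage.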