TY - GEN
T1 - Combination of soft mask methods and model based Wiener filter for noise robust speech recognition
AU - Kang, Byung Ok
AU - Jung, Ho Young
AU - Lee, Yun Keun
PY - 2011
Y1 - 2011
N2 - In this paper, we present a combined approach that integrates a soft mask and a model-based Wiener filter (MBW) for noise-robust speech recognition. By utilizing a Gaussian mixture model (GMM) as prior information about the speech signal and the added noise signal, the proposed method effectively restores clean speech spectra and separates ambient noise from the target speech. Soft mask methods were originally designed to separate the speech signal of the speaker of interest from a mixture of speech signals. However, by using an a priori speech/noise model, they can also be applied to separate added noise from the target speech. Combined with MBW, the proposed method can efficiently reduce spectral distortions caused by the added noise and reconstruct clean speech spectra from noise-corrupted observations. To evaluate the proposed method, we constructed a recognizer for 32,000 points of interest (POIs) for a car navigation system. For HMM training, a number of phonetically optimized utterances were recorded from 1,700 speakers under clean conditions that do not match the car environment. For GMM training of the proposed method, we used 40 minutes of a car noise DB for the noise GMM and 4k utterances recorded from 100 speakers inside idling cars for the speech GMM. The test DB comprises 1,252 POI utterances recorded from 30 speakers in various driving environments. Experimental results from application to a speech recognition system in a car environment show that the proposed method works successfully and yields an error reduction rate of 14% compared to the conventional Wiener filter.
UR - http://www.scopus.com/inward/record.url?scp=84867969485&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84867969485
SN - 9781618392800
T3 - 40th International Congress and Exposition on Noise Control Engineering 2011, INTER-NOISE 2011
SP - 1588
EP - 1592
BT - 40th International Congress and Exposition on Noise Control Engineering 2011, INTER-NOISE 2011
T2 - 40th International Congress and Exposition on Noise Control Engineering 2011, INTER-NOISE 2011
Y2 - 4 September 2011 through 7 September 2011
ER -