Abstract
We propose a new frame decorrelation method for robust speech recognition in noisy environments. In most cases, signal perturbation is caused by channel distortion and additive background noise, and can be modeled as a slowly varying term in either the log-spectral or the linear-spectral domain. It is therefore effective to deemphasize slowly varying, stationary components in the spectral feature domain of speech signals, which can be regarded as a temporal decorrelation process. The proposed method yields a well-structured high-pass filter based on the decorrelation principle and provides significant insights into existing high-pass approaches, such as relative spectral (RASTA) processing. The performance of the proposed method was evaluated in speaker-independent isolated-word recognition experiments using hidden Markov models (HMMs). Noisy speech was simulated by adding noise sources taken from the Noisex-92 database. Experimental results showed that the proposed method was effective for speech recognition in the presence of significant noise and yielded better performance than other high-pass methods. In addition, we compared the dynamic properties of the proposed filter with those of delta features. The features obtained by the proposed method may capture most of the properties of delta features.
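To make the idea of deemphasizing slowly varying components concrete, the following is a minimal sketch and not the filter proposed in the paper. It assumes a generic first-order high-pass filter applied along time to one log-spectral feature trajectory, and the usual regression-style delta feature for comparison. The function names `highpass` and `delta`, the pole value 0.9, the window width, and the toy trajectory are all illustrative assumptions.

```python
def highpass(track, pole=0.9):
    """Generic first-order high-pass: y[t] = x[t] - x[t-1] + pole * y[t-1].

    Illustrative stand-in only; it is NOT the decorrelation filter
    derived in the paper.
    """
    out = []
    prev_x = track[0]  # initialise with the first frame so a constant input maps to zeros
    prev_y = 0.0
    for x in track:
        y = x - prev_x + pole * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def delta(track, width=2):
    """Regression-based delta features over +/- width frames (edge frames replicated)."""
    n = len(track)
    denom = 2 * sum(k * k for k in range(1, width + 1))
    out = []
    for t in range(n):
        acc = 0.0
        for k in range(1, width + 1):
            ahead = track[min(t + k, n - 1)]
            behind = track[max(t - k, 0)]
            acc += k * (ahead - behind)
        out.append(acc / denom)
    return out


# Toy log-spectral trajectory for one frequency band (made-up values).
clean = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 1.5, 2.5]

# Hypothetical channel distortion: a constant additive term in the log-spectral domain.
offset = 4.0
distorted = [x + offset for x in clean]

if __name__ == "__main__":
    print("high-pass, clean:    ", highpass(clean))
    print("high-pass, distorted:", highpass(distorted))
    print("delta, clean:        ", delta(clean))
    print("delta, distorted:    ", delta(distorted))
```

In this sketch both the high-pass output and the delta features are unchanged by the constant offset, which is the sense in which temporal high-pass filtering suppresses a stationary channel term. Real distortions vary slowly rather than staying constant, so the suppression would be approximate in practice.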
Original language | English |
---|---|
Pages (from-to) | 407-416 |
Number of pages | 10 |
Journal | IEEE Transactions on Speech and Audio Processing |
Volume | 8 |
Issue number | 4 |
State | Published - Jul 2000 |