TY - GEN
T1 - Noise robust spontaneous speech recognition using multi-space GMM
AU - Kang, Byung Ok
AU - Jung, Ho Young
AU - Kwon, Oh Wook
PY - 2013
Y1 - 2013
AB - In this paper, we propose a new approach using a multi-space Gaussian mixture model (GMM) for a large-scale spontaneous speech recognition system that is robust to acoustic environmental noise. Current speech recognition systems based on a hidden Markov model (HMM) perform well in matched conditions, but their performance degrades in mismatched conditions, such as mobile environments with diverse additive noise. In mobile voice search services, the real noise environment is reflected in rich speech log data, and using these logs improves performance as the matched condition grows. However, because most of this speech data is short and follows a limited pattern, using it for large-scale spontaneous speech recognition tasks such as voice SMS yields limited improvement, and degradation is even observed in a quiet environment. Therefore, this paper proposes a new approach which, using rich voice search speech data, constructs a multi-acoustic space GMM with distributions of speech corrupted by diverse environmental noise and reflects these statistics in an acoustic model for a speech recognition system in a distinct domain such as dictation speech. The evaluation results obtained from the voice SMS task show that the proposed method provides meaningful improvements over conventional adaptive training methods for handling multi-style training data.
KW - Acoustic model
KW - Multi-space GMM
KW - Noise robustness
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84904480068&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84904480068
SN - 9781632662675
T3 - 42nd International Congress and Exposition on Noise Control Engineering 2013, INTER-NOISE 2013: Noise Control for Quality of Life
SP - 3682
EP - 3685
BT - 42nd International Congress and Exposition on Noise Control Engineering 2013, INTER-NOISE 2013
PB - ÖAL-Österreichischer Arbeitsring für Lärmbekämpfung
T2 - 42nd International Congress and Exposition on Noise Control Engineering 2013: Noise Control for Quality of Life, INTER-NOISE 2013
Y2 - 15 September 2013 through 18 September 2013
ER -