Abstract
We propose a novel speaker adaptation method based on acoustic model clustering. The similarity between speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained on the training data of each speaker subset produced by the clustering, and an unknown utterance is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented in a hybrid hidden Markov model and deep neural network (HMM-DNN) framework. An experiment was conducted to evaluate word error rates on the Resource Management database. Compared with the conventional speaker-independent speech recognition model, the proposed speaker adaptation method using i-vector based clustering achieved a relative improvement of up to 12.2% for the conventional fully connected neural network and up to 10.5% for the bidirectional long short-term memory network.
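The clustering and cluster-selection steps described in the abstract can be illustrated with a minimal sketch. It assumes the i-vectors have already been extracted and are given as rows of a NumPy array. The abstract does not name the clustering algorithms used, so spherical k-means with a farthest-first initialization stands in here as one possible choice. The function names `cluster_speakers` and `closest_cluster` are hypothetical, and retraining the acoustic model per cluster is outside the scope of the sketch.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between rows of a and rows of b."""
    return 1.0 - normalize(a) @ normalize(b).T


def cluster_speakers(ivectors, k, iters=20):
    """Group speakers with spherical k-means (an assumed stand-in algorithm)."""
    x = normalize(ivectors)
    # Farthest-first initialization: start from speaker 0, then repeatedly add
    # the speaker whose nearest already-chosen centroid is farthest away.
    chosen = [0]
    while len(chosen) < k:
        nearest = cosine_distance(x, x[chosen]).min(axis=1)
        chosen.append(int(np.argmax(nearest)))
    centroids = x[chosen].copy()
    for _ in range(iters):
        labels = np.argmin(cosine_distance(x, centroids), axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = normalize(members.mean(axis=0))
    labels = np.argmin(cosine_distance(x, centroids), axis=1)
    return labels, centroids


def closest_cluster(ivector, centroids):
    """Index of the cluster whose centroid is nearest to an unseen i-vector."""
    return int(np.argmin(cosine_distance(ivector, centroids)))
```

In this sketch, `labels` would determine which training speakers contribute to each retrained model, and `closest_cluster` would select the retrained model used to recognize an utterance from an unseen speaker.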
Original language | English |
---|---|
Pages (from-to) | 2785-2799 |
Number of pages | 15 |
Journal | KSII Transactions on Internet and Information Systems |
Volume | 14 |
Issue number | 7 |
DOIs | |
State | Published - 31 Jul 2020 |
Keywords
- Clustering
- Hybrid HMM-DNN
- I-vector
- Speaker adaptation
- Speech recognition