Speaker adaptation using i-vector based clustering

Minsoo Kim, Gil Jin Jang, Ji Hwan Kim, Minho Lee

Research output: Contribution to journal › Article › peer-review

1 Scopus citation

Abstract

We propose a novel speaker adaptation method based on acoustic model clustering. Similarity between speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and several efficient clustering algorithms are applied to partition the speakers into subsets with different characteristics. The speaker-independent model is then retrained separately on the training data of each speaker subset, and an unknown utterance is recognized with the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system built on a hybrid hidden Markov model and deep neural network (HMM-DNN) framework. Word error rates were evaluated on the Resource Management database. Compared with the conventional speaker-independent speech recognition model, the proposed speaker adaptation method using i-vector based clustering achieved a relative improvement of up to 12.2% for the conventional fully connected neural network and up to 10.5% for the bidirectional long short-term memory network.
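The clustering and cluster-selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the clustering algorithms used, so this example assumes a simple spherical k-means (k-means on length-normalized vectors, which groups by cosine similarity) with a deterministic farthest-first initialization. The function names `cluster_speakers` and `closest_cluster` and the toy 3-dimensional "i-vectors" are hypothetical. I-vector extraction and the retraining of the acoustic model on each subset are outside the scope of the sketch.

```python
import numpy as np


def normalize(x):
    """Scale vectors to unit length so a dot product equals cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def cluster_speakers(ivectors, k, iters=20):
    """Group speakers by cosine similarity of their i-vectors.

    Assumption: spherical k-means with farthest-first initialization.
    Returns (labels, centroids); centroids have unit length.
    """
    x = normalize(np.asarray(ivectors, dtype=float))

    # Farthest-first initialization: start from speaker 0, then repeatedly
    # pick the speaker least similar to every centroid chosen so far.
    chosen = [0]
    for _ in range(1, k):
        best_sim = (x @ x[chosen].T).max(axis=1)
        chosen.append(int(np.argmin(best_sim)))
    centroids = x[chosen].copy()

    for _ in range(iters):
        # Assign each speaker to the centroid with the highest cosine similarity.
        labels = np.argmax(x @ centroids.T, axis=1)
        for j in range(k):
            total = x[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = total / norm

    labels = np.argmax(x @ centroids.T, axis=1)
    return labels, centroids


def closest_cluster(ivector, centroids):
    """Index of the cluster whose centroid has the smallest cosine distance."""
    v = normalize(np.asarray(ivector, dtype=float))
    return int(np.argmax(centroids @ v))


# Toy data (hypothetical): two groups of three speakers each.
train_ivectors = np.array([
    [1.0, 0.1, 0.0], [1.0, -0.1, 0.05], [0.9, 0.0, 0.1],
    [0.1, 1.0, 0.0], [-0.1, 1.0, 0.05], [0.0, 0.9, 0.1],
])
labels, centroids = cluster_speakers(train_ivectors, k=2)
test_cluster = closest_cluster([2.0, 0.1, 0.0], centroids)
```

In the setting the abstract describes, `labels` would determine which training data are used to retrain each copy of the speaker-independent model, and `test_cluster` would select the retrained model used to recognize the unknown utterance.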

Original language: English
Pages (from-to): 2785-2799
Number of pages: 15
Journal: KSII Transactions on Internet and Information Systems
Volume: 14
Issue number: 7
State: Published - 31 Jul 2020

Keywords

  • Clustering
  • Hybrid HMM-DNN
  • I-vector
  • Speaker adaptation
  • Speech recognition
