Speaker dependent visual speech recognition by symbol and real value assignment

Jeongwoo Ju, Heechul Jung, Junmo Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we propose a visual speech recognition method based on symbol or real-value assignment. Our method is inspired by the Bag of Words (BoW) model [1], which is usually applied to object matching problems. In the BoW model, a codebook is produced by K-means clustering, and a feature vector extracted from an image is converted to the corresponding symbol. Similarly, we generate a codebook by running the K-means algorithm on a pool of pHog (Pyramid Histogram of Oriented Gradients) feature vectors extracted from a subset of a lip database. Each remaining lip image is then assigned a value after its chi-square distance to each cluster is computed. Two assignment methods are proposed, which differ in the type of value assigned to a lip image frame. The first finds the cluster whose element image has the minimum chi-square distance to the frame being processed and assigns that cluster's label to the frame. The second computes the distances between the frame and all cluster centroids, yielding a multi-dimensional vector that directly becomes the value assigned to the frame. With these methods, each time sequence is converted into either a symbolized sequence or a multi-dimensional real-valued sequence. To measure the similarity between two time sequences, we use Dynamic Time Warping for real-valued sequences and edit distance for symbolized sequences.
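The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names are hypothetical, the chi-square form (with a factor of 0.5 and a small epsilon) is one common convention, and the Euclidean frame distance used inside Dynamic Time Warping is an assumption, since the abstract does not specify it. The codebook and the pHog extraction are assumed to exist already, so reference histograms and centroids are passed in as plain lists.

```python
from math import inf


def chi_square(p, q, eps=1e-12):
    """Chi-square distance between two non-negative histograms (one common form)."""
    return 0.5 * sum((a - b) ** 2 / (a + b + eps) for a, b in zip(p, q))


def assign_symbol(frame, references, labels):
    """Symbol assignment: label of the reference histogram nearest to the frame.

    `references` may hold cluster element images (as in the abstract) or
    centroids; `labels[i]` is the cluster label of `references[i]`.
    """
    best = min(range(len(references)),
               key=lambda i: chi_square(frame, references[i]))
    return labels[best]


def assign_vector(frame, centroids):
    """Real-value assignment: chi-square distances to every cluster centroid."""
    return [chi_square(frame, c) for c in centroids]


def edit_distance(s, t):
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        curr = [i]
        for j, b in enumerate(t, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (a != b)))  # substitution
        prev = curr
    return prev[-1]


def euclidean(u, v):
    """Frame-level distance used inside DTW (an assumption of this sketch)."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def dtw(x, y, dist=euclidean):
    """Dynamic Time Warping cost between two sequences of real-valued vectors."""
    n, m = len(x), len(y)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # advance in x only
                                 cost[i][j - 1],      # advance in y only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Toy usage with a two-cluster codebook of two-bin histograms.
centroids = [[1.0, 0.0], [0.0, 1.0]]
frames = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]
symbols = [assign_symbol(f, centroids, [0, 1]) for f in frames]
vectors = [assign_vector(f, centroids) for f in frames]
```

Under these assumptions, an utterance could be classified by comparing its symbol sequence (via `edit_distance`) or its vector sequence (via `dtw`) against stored sequences and choosing the closest one.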

Original language: English
Title of host publication: An Edition of the Presented Papers from the 1st International Conference on Robot Intelligence Technology and Applications
Publisher: Springer Verlag
Pages: 1015-1022
Number of pages: 8
ISBN (Print): 9783642373732
DOIs
State: Published - 2013
Event: 1st International Conference on Robot Intelligence Technology and Applications, RiTA 2012 - Gwangju, Korea, Republic of
Duration: 16 Dec 2012 to 18 Dec 2012

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 208 AISC
ISSN (Print): 2194-5357

Conference

Conference: 1st International Conference on Robot Intelligence Technology and Applications, RiTA 2012
Country/Territory: Korea, Republic of
City: Gwangju
Period: 16/12/12 to 18/12/12

Keywords

  • Codebook
  • Dynamic Time Warping
  • Edit distance
  • pHog
  • Visual Speech Recognition
