Automatic detection of the pharyngeal phase in raw videos for the videofluoroscopic swallowing study using efficient data collection and 3D convolutional networks

Jong Taek Lee, Eunhee Park, Tae Du Jung

Research output: Contribution to journal › Article › peer-review


Abstract

Videofluoroscopic swallowing study (VFSS) is a standard diagnostic tool for dysphagia. To detect the presence of aspiration during a swallow, a manual search is commonly used to mark the time intervals of the pharyngeal phase on the corresponding VFSS image. In this study, we present a novel approach that uses 3D convolutional networks to detect the pharyngeal phase in raw VFSS videos without manual annotations. For efficient collection of training data, we propose a cascade framework that requires neither annotation of the time intervals of the swallowing process nor manual marking of anatomical positions for detection. For video classification, we applied the inflated 3D convolutional network (I3D), one of the state-of-the-art networks for action classification, as a baseline architecture. We also present a modified 3D convolutional network architecture derived from the baseline I3D architecture. The classification and detection performance of the two architectures were evaluated for comparison. The experimental results show that the proposed model outperformed the baseline I3D model when both models were trained from randomly initialized weights. We conclude that the proposed method greatly reduces the examination time of VFSS images with a low miss rate.
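The key property the abstract relies on is that a 3D convolution spans the time axis as well as the spatial axes, so a single filter response mixes information across frames and can pick up motion patterns such as a swallow. The following is a minimal illustrative sketch of that operation in plain NumPy; the function name, toy clip, and kernel are hypothetical and are not the paper's I3D architecture, which stacks many such layers with learned kernels.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid'-mode 3D convolution over a (T, H, W) video clip.

    Unlike a 2D convolution applied frame by frame, the kernel also
    slides along the time axis, so each output value aggregates a
    small spatiotemporal neighborhood -- the core operation that lets
    3D CNNs such as I3D respond to motion, not just appearance.
    """
    kt, kh, kw = kernel.shape
    T, H, W = clip.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(
                    clip[t:t + kt, i:i + kh, j:j + kw] * kernel
                )
    return out

# Toy example: an 8-frame, 16x16-pixel "video" and a 3x3x3
# spatiotemporal averaging kernel (illustrative values only).
clip = np.random.rand(8, 16, 16)
kernel = np.ones((3, 3, 3)) / 27.0
features = conv3d_valid(clip, kernel)
print(features.shape)  # (6, 14, 14): time and space both shrink by kt-1, kh-1, kw-1
```

In a full network the filter weights are learned, many filters run in parallel per layer, and strided or pooled layers shrink the temporal and spatial extents further, but the sliding spatiotemporal window shown here is the same.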

Original language: English
Article number: 3873
Journal: Sensors
Volume: 19
Issue number: 18
DOIs
State: Published - 2 Sep 2019

Keywords

  • 3D convolutional networks
  • Action classification
  • Action detection
  • Pharyngeal phase
  • Videofluoroscopic swallowing study
