OCR-Diff: A Two-Stage Deep Learning Framework for Optical Character Recognition Using Diffusion Model in Industrial Internet of Things

Chae Won Park, Vikas Palakonda, Sangseok Yun, Il Min Kim, Jae Mo Kang

Research output: Contribution to journal › Article › peer-review

3 Scopus citations

Abstract

Optical character recognition (OCR) is one of the key enabling technologies in the industrial Internet of Things (IIoT) for extracting and utilizing useful textual information, but it is technically challenging due to poor environmental conditions. To deal with such challenges, in this letter, we propose a novel two-stage deep learning framework for OCR using a generative diffusion model, namely, OCR-Diff. In the first stage, our customized conditional U-Net is pretrained jointly with a feature extractor with the aid of the forward diffusion process, such that the quality of a low-resolution text image is improved via the reverse diffusion process. In the second stage, the pretrained conditional U-Net and feature extractor are jointly fine-tuned so that an off-the-shelf text recognizer can precisely recognize the text in the image. Experimental results on the TextZoom dataset substantiate the superiority and effectiveness of the proposed scheme.
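
The abstract does not give the details of the forward diffusion process used in the first stage. As a rough illustration only, the sketch below shows the closed-form forward (noising) step commonly used in denoising diffusion probabilistic models, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The linear noise schedule, the step count of 1000, and all function names are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule, not the paper's)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) for s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def forward_diffuse(x0, t, alpha_bars, rng):
    """Sample x_t from q(x_t | x_0) for a flat list of pixel values.

    Returns the noised pixels and the Gaussian noise that was added,
    which is what a denoising network would typically be trained to predict.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * p + math.sqrt(1.0 - a) * e for p, e in zip(x0, noise)]
    return xt, noise


# Hypothetical usage: four pixel values scaled to [-1, 1], noised at the last step.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)
pixels = [0.0, 0.5, 1.0, -1.0]
x_t, eps = forward_diffuse(pixels, 999, alpha_bars, rng)
```

At small t the sample stays close to the original pixels, while at the final step alpha_bar_t is close to zero and the sample is almost pure noise. The reverse process, which the abstract says is conditioned on the low-resolution text image, would learn to undo this corruption.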

Original language: English
Pages (from-to): 25997-26000
Number of pages: 4
Journal: IEEE Internet of Things Journal
Volume: 11
Issue number: 15
State: Published - 2024

Keywords

  • Deep learning (DL)
  • generative diffusion model
  • industrial Internet of Things (IIoT)
  • low-resolution text image
  • optical character recognition (OCR)
  • text recognition
