Line-break prediction of hanmun text using recurrent neural networks

Dong Hoon Oh, Zahra Shah, Gil Jin Jang

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

1 Scopus citation

Abstract

Recurrent neural networks (RNNs) have been broadly applied to natural language processing (NLP) and machine translation problems as deep learning models for sequential data. Hanmun is the Korean term for Chinese characters, and in many cases Korean borrows only the Chinese characters themselves while the pronunciation is Korean. Hanmun texts also contain proper nouns and place names from traditional Korean that are no longer in use. Therefore, a model designed for Hanmun is needed, rather than one that analyzes the text as ordinary Chinese. In this paper, we propose a model for line-break prediction of Hanmun text using various types of RNNs. RNNs are suitable for analyzing Hanmun because the meaning and usage of a character vary according to the preceding words. The model segments Hanmun characters into beginning, middle, and ending words. Experimental results show that our approach achieves high performance in line-break prediction on Hanmun.
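The abstract frames line-break prediction as segmenting characters into beginning, middle, and ending words. A minimal sketch of how such per-character labels might be produced from line-broken text, and how a model's predicted labels might be turned back into lines, is given here in Python. The tag names `B`/`M`/`E`, the extra `S` tag for one-character lines, and the functions `encode_lines` and `decode_tags` are hypothetical illustrations, not the authors' code; the RNN that would predict the tags (for example a bi-directional LSTM, as the keywords suggest) is not shown.

```python
from typing import List, Tuple

# Hypothetical tag set (an assumption, not taken from the paper):
#   B = first character of a line, M = middle character,
#   E = last character of a line,  S = a line of only one character.
# The abstract names only beginning/middle/ending; S is added here so
# that one-character lines can still be decoded unambiguously.


def encode_lines(lines: List[str]) -> Tuple[str, List[str]]:
    """Flatten line-broken text into one character stream plus per-character tags."""
    chars: List[str] = []
    tags: List[str] = []
    for line in lines:
        if not line:
            continue  # empty lines carry no characters to tag
        if len(line) == 1:
            chars.append(line)
            tags.append("S")
            continue
        for i, ch in enumerate(line):
            chars.append(ch)
            if i == 0:
                tags.append("B")
            elif i == len(line) - 1:
                tags.append("E")
            else:
                tags.append("M")
    return "".join(chars), tags


def decode_tags(text: str, tags: List[str]) -> List[str]:
    """Rebuild lines from (predicted) tags by breaking after every E or S."""
    if len(text) != len(tags):
        raise ValueError("text and tags must have the same length")
    lines: List[str] = []
    current: List[str] = []
    for ch, tag in zip(text, tags):
        current.append(ch)
        if tag in ("E", "S"):
            lines.append("".join(current))
            current = []
    if current:  # trailing characters without a closing tag form a final line
        lines.append("".join(current))
    return lines


if __name__ == "__main__":
    stream, labels = encode_lines(["學而時習之", "不亦說乎"])
    print(stream, labels)
    print(decode_tags(stream, labels))
```

Because a break is placed after every `E` or `S`, predicting line breaks reduces to tagging each character of the unbroken stream, which is the kind of sequence-labeling task the abstract assigns to the RNN.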

Original language: English
Title of host publication: International Conference on Information and Communication Technology Convergence
Subtitle of host publication: ICT Convergence Technologies Leading the Fourth Industrial Revolution, ICTC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 720-724
Number of pages: 5
ISBN (Electronic): 9781509040315
State: Published - 12 Dec 2017
Event: 8th International Conference on Information and Communication Technology Convergence, ICTC 2017 - Jeju Island, Korea, Republic of
Duration: 18 Oct 2017 - 20 Oct 2017

Publication series

Name: International Conference on Information and Communication Technology Convergence: ICT Convergence Technologies Leading the Fourth Industrial Revolution, ICTC 2017
Volume: 2017-December

Conference

Conference: 8th International Conference on Information and Communication Technology Convergence, ICTC 2017
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 18/10/17 - 20/10/17

Keywords

  • Bi-directional LSTM
  • Hanmun
  • line break
  • long short term memory (LSTM)
  • recurrent neural network (RNN)
