Hybridized character-word embedding for Korean traditional document translation

Hosang Yu, Gil Jin Jang, Minho Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Translating traditional documents is laborious and time-consuming for human translators owing to the sheer volume of the material and the complexity of its grammatical patterns. Recently, neural network-based machine translation architectures such as the sequence-to-sequence (seq2seq) model have shown superior translation performance. However, the seq2seq model suffers from the out-of-vocabulary (OOV) problem when handling languages with very large and complex vocabularies, such as text written in Chinese characters, and this degrades its performance. To address the OOV problem, we propose a method that combines word embedding with character embedding, so that the character embedding supplements the information lost on unknown words. Experimental results show that the proposed method is efficient for translating old Korean archives written in Hanja into modern Korean documents in Hangul.
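The abstract does not say exactly how the two embeddings are combined. As a minimal sketch of the general idea, assuming a word's representation is its word vector concatenated with the average of its character vectors, the Python below shows why a hybrid embedding helps with OOV words: every unknown word collapses onto the same `<unk>` word vector, but its character part still differs from word to word. The class `HybridEmbedding`, the dimensions, the averaging rule, and the toy Hanja vocabulary are all hypothetical; the authors' model may compose characters differently (for example with a CNN or RNN), and real embeddings are learned rather than randomly fixed.

```python
import random

# Hypothetical sketch: dimensions, vocabularies and the combination rule
# (word vector concatenated with the mean of character vectors) are
# assumptions for illustration, not the method from the paper.
WORD_DIM = 4
CHAR_DIM = 3
UNK = "<unk>"


def init_table(tokens, dim, seed):
    """Map each token to a random vector (stand-in for learned embeddings)."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for tok in tokens}


class HybridEmbedding:
    def __init__(self, word_vocab, char_vocab, seed=0):
        self.word_table = init_table(list(word_vocab) + [UNK], WORD_DIM, seed)
        self.char_table = init_table(list(char_vocab) + [UNK], CHAR_DIM, seed + 1)

    def word_vector(self, word):
        # Every OOV word collapses onto the shared <unk> word vector.
        return self.word_table.get(word, self.word_table[UNK])

    def char_vector(self, word):
        # Average the character vectors; unknown characters map to <unk>.
        vecs = [self.char_table.get(ch, self.char_table[UNK]) for ch in word]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def embed(self, word):
        if not word:
            raise ValueError("cannot embed an empty word")
        # Concatenation keeps a word-specific character signal even for OOV words.
        return self.word_vector(word) + self.char_vector(word)


# Toy usage: "習之" and "時之" are not in the word vocabulary.
emb = HybridEmbedding(word_vocab=["學而", "時習"], char_vocab="學而時習之")
known = emb.embed("學而")
oov_a = emb.embed("習之")
oov_b = emb.embed("時之")
```

With word embedding alone, `oov_a` and `oov_b` would be indistinguishable; here they share only the `<unk>` word half and differ in the character half, which is the kind of supplementary signal the abstract attributes to character embedding.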

Original language: English
Title of host publication: Neural Information Processing - 25th International Conference, ICONIP 2018, Proceedings
Editors: Long Cheng, Seiichi Ozawa, Andrew Chi Sing Leung
Publisher: Springer Verlag
Pages: 82-89
Number of pages: 8
ISBN (Print): 9783030041816
DOIs
State: Published - 2018
Event: 25th International Conference on Neural Information Processing, ICONIP 2018 - Siem Reap, Cambodia
Duration: 13 Dec 2018 to 16 Dec 2018

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11303 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 25th International Conference on Neural Information Processing, ICONIP 2018
Country/Territory: Cambodia
City: Siem Reap
Period: 13/12/18 to 16/12/18

Keywords

  • Character-word embedding
  • Deep learning
  • Natural language processing
  • Neural machine translation
  • Seq2seq
