Korean Traditional Document Translation Using Transformer In Bidirectional-CRF

Jungi Lee, Jong Won Jang, Jangwon Lee, Gil Jin Jang, Min Ho Lee

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper proposes a solution to the out-of-vocabulary (OOV) problem in a transformer-based machine translation framework. The input is a traditional Korean document written in Chinese characters, and the output is a modern Korean paragraph written in the Korean alphabet. We used the word2vec algorithm to represent the symbolic characters as numeric vectors, which serve as input to the transformer. To address the OOV problem, we used a bidirectional LSTM with a CRF layer (Bi-LSTM + CRF). To ensure the validity of the dataset, we used the Annals of the Joseon Dynasty together with translations prepared by experts. A second dataset (the Diary dataset), collected at Kyungpook National University, is much smaller than the Annals of the Joseon Dynasty. In terms of BLEU score, a model trained on the Annals of the Joseon Dynasty and then fine-tuned on the Kyungpook National University data scored lower with the CRF applied than general machine translation did. When the model was trained only on the Kyungpook National University dataset, a slightly higher BLEU score was obtained.
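The OOV problem named in the abstract can be illustrated with a minimal character-level sketch. This is not the authors' pipeline: the function names, the `<unk>` token, and the toy strings are all assumptions made for illustration, and no word2vec, Bi-LSTM, or CRF component is involved. It only shows how a character missing from the training vocabulary collapses to a single unknown id, which is the information loss an OOV-handling step tries to reduce.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for unseen characters


def build_vocab(sentences, min_count=1):
    """Map each character seen at least min_count times to an integer id."""
    counts = Counter(ch for s in sentences for ch in s)
    vocab = {UNK: 0}
    # Sort by code point so the id assignment is deterministic.
    for ch, n in sorted(counts.items()):
        if n >= min_count:
            vocab[ch] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Replace every character with its id; unseen characters become UNK."""
    return [vocab.get(ch, vocab[UNK]) for ch in sentence]


def oov_rate(sentence, vocab):
    """Fraction of characters in the sentence that are not in the vocabulary."""
    if not sentence:
        return 0.0
    missing = sum(1 for ch in sentence if ch not in vocab)
    return missing / len(sentence)


# Toy strings for illustration only; they are not taken from either dataset.
train = ["王曰善", "王曰可"]
test = "王曰不可"

vocab = build_vocab(train)
ids = encode(test, vocab)    # the unseen character maps to id 0
rate = oov_rate(test, vocab)  # one of four characters is out of vocabulary
```

A small corpus such as the Diary dataset would be expected to leave more characters unseen than a large one, which is one plausible reason an explicit OOV step matters more there.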

Original language: English
Title of host publication: ICTC 2021 - 12th International Conference on ICT Convergence
Subtitle of host publication: Beyond the Pandemic Era with ICT Convergence Innovation
Publisher: IEEE Computer Society
Pages: 1738-1742
Number of pages: 5
ISBN (Electronic): 9781665423830
DOIs
State: Published - 2021
Event: 12th International Conference on Information and Communication Technology Convergence, ICTC 2021 - Jeju Island, Korea, Republic of
Duration: 20 Oct 2021 to 22 Oct 2021

Publication series

Name: International Conference on ICT Convergence
Volume: 2021-October
ISSN (Print): 2162-1233
ISSN (Electronic): 2162-1241

Conference

Conference: 12th International Conference on Information and Communication Technology Convergence, ICTC 2021
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 20/10/21 to 22/10/21

Keywords

  • CRF
  • deep learning
  • Hangul
  • neural machine translation
  • seq2seq
  • transformer
