Knowing the Words, Missing the Meaning: Evaluating LLMs’ Cultural Understanding Through Sino-Korean Words and Four-Character Idioms

Eunsong Lee, Hyein Do, Minsu Kim, Dongsuk Oh

Research output: Contribution to journal › Article › peer-review

Abstract

This study proposes a new benchmark that evaluates the cultural understanding and natural language processing capabilities of large language models (LLMs) using Sino-Korean words and four-character idioms, which are essential linguistic and cultural assets in Korea. Reflecting the official question types of the Korean Hanja Proficiency Test, we constructed four question categories (four-character idioms, synonyms, antonyms, and homophones) and systematically compared the performance of GPT-based and non-GPT LLMs. GPT-4o showed the highest accuracy and explanation quality. However, challenges remain in distinguishing the subtle nuances of individual characters and in adapting to uniquely Korean meanings as opposed to standard Chinese character interpretations. Our findings reveal a gap in LLMs’ understanding of Korea-specific Hanja culture and underscore the need for evaluation tools that reflect these cultural distinctions.

Original language: English
Article number: 7561
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 13
State: Published - Jul 2025

Keywords

  • Sino-Korean vocabulary
  • cross-lingual semantic shift
  • cultural contextual understanding
  • four-character idioms
  • large language models evaluation
