Evaluating Creativity: Can LLMs Be Good Evaluators in Creative Writing Tasks?

  • Kyungpook National University

Research output: Contribution to journal › Article › peer-review

14 Scopus citations

Abstract

The evaluation of creative writing has long been a complex and subjective process, made even more intriguing by the rise of advanced Artificial Intelligence (AI) tools like Large Language Models (LLMs). This study evaluates the potential of LLMs as reliable and consistent evaluators of creative texts, directly comparing their performance with traditional human evaluations. The analysis focuses on key creative criteria, including fluency, flexibility, elaboration, originality, usefulness, and specific creativity strategies. Results demonstrate that LLMs provide consistent and objective evaluations, achieving higher Inter-Annotator Agreement (IAA) than human evaluators. However, LLMs face limitations in recognizing nuanced, culturally specific, and context-dependent aspects of creativity. Conversely, human evaluators, despite lower consistency and higher subjectivity, exhibit strengths in capturing deeper contextual insights. These findings highlight the need for further refinement of LLMs to address the complexities of creative writing evaluation.
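
The abstract reports agreement as IAA but does not say which statistic was used. As a rough illustration only, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two raters. The function name `cohen_kappa` and the toy score lists are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who labelled the same items.

    Assumption: categorical labels (e.g. 1-5 scores treated as categories).
    This is an illustrative choice, not necessarily the paper's IAA metric.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Expected agreement by chance, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical toy scores for six texts (not data from the study).
rater_1 = [1, 2, 3, 1, 2, 3]
rater_2 = [1, 2, 3, 1, 2, 3]
kappa_identical = cohen_kappa(rater_1, rater_2)

# Two raters who agree exactly as often as chance predicts.
kappa_chance = cohen_kappa([1, 1, 2, 2], [1, 2, 1, 2])
```

A kappa near 1 indicates agreement well above chance and a value near 0 indicates agreement no better than chance. With more than two evaluators, a multi-rater statistic such as Fleiss' kappa or Krippendorff's alpha would be the usual substitute.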

Original language: English
Article number: 2971
Journal: Applied Sciences (Switzerland)
Volume: 15
Issue number: 6
State: Published - Mar 2025

Keywords

  • AI evaluation
  • creative writing evaluation
  • creativity
  • human evaluation
  • large language models (LLMs) evaluation
