Unsupervised Detection of LLM-Generated Text in Korean Using Syntactic and Semantic Cues
- Jeon, Heejeong;
- Park, Minsu;
- Choi, YunSeok;
- Park, Eunil
Abstract
As Large Language Models (LLMs) are increasingly used for content creation, detecting AI-generated text has become a critical challenge. Prior work has largely focused on English, leaving low-resource languages such as Korean underexplored. We propose an unsupervised detection framework that integrates two complementary signals: syntactic token cohesiveness (TOCSIN) and semantic regeneration similarity (SimLLM). To support evaluation, we construct a Korean pairwise dataset of 1,000 anchors with continuation- and regeneration-style generations and further assess performance across domains (news, research paper abstracts, essays) and model families (GPT-3.5 Turbo, GPT-4o, HyperCLOVA X, LLaMA-3-8B). Without any training, our ensemble achieves up to 0.963 F1 and 0.985 ROC-AUC, outperforming baselines. These results demonstrate that the combination of syntactic and semantic cues enables robust unsupervised detection in low-resource settings. Code available at https://github.com/dxlabskku/llm-detection-main.
- Title: Unsupervised Detection of LLM-Generated Text in Korean Using Syntactic and Semantic Cues
- Authors: Jeon, Heejeong; Park, Minsu; Choi, YunSeok; Park, Eunil
- Publication date: 2026
- Type: Conference Paper
- Published in: 19th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2026
- Pages: 1504-1518