- Yoo, Sanghyun
- Jeong, Doowon
Abstract
Keyword-based search is widely used in digital forensic investigations, yet its effectiveness depends strongly on investigator experience, leading to inconsistent results and missed evidence. Previous studies have explored machine learning and large language models (LLMs) to address this problem, but practical deployment is often constrained by confidentiality requirements and by the infrastructure cost of running and maintaining high-performance models locally. We propose a practical LLM-based keyword augmentation method that expands investigator-supplied seed keywords, whether single words or multi-word phrases, while restricting the LLM's inputs to non-sensitive case context. This enables rapid evidence triage based on file names without transmitting primary evidence content to external services. We validate the approach in three stages: (i) using 426 documents, we confirm through semantic similarity and keyword coverage analyses that file names correlate with document bodies, with clear separation from randomized file-name/content pairings; (ii) on a benchmark of 1,500 file names (500 relevant cases and 1,000 controls), prompt-only keyword generation with ChatGPT models achieves effective retrieval performance, with ChatGPT-4.1 achieving the best overall balance; and (iii) in a usability study with 20 digital forensic investigators, augmented keywords improve evidence detection, and junior investigators show statistically significant gains under the Wilcoxon signed-rank test. Overall, applying LLM-augmented keywords supports efficient triage in geographically distributed and large-scale investigations and reduces experience-related performance gaps.
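The abstract does not specify how augmented keywords are matched against file names, so the following is only a minimal, hypothetical sketch of title-driven selection under stated assumptions. It assumes case-insensitive substring matching on normalized file names, where separators such as `_`, `-` and `.` are collapsed to spaces so that multi-word keywords can match underscore- or hyphen-separated names. The LLM call is replaced by a hard-coded list (`AUGMENTED_KEYWORDS`). All file names, keywords, labels and helper names (`normalize`, `select_files`, `score`) are invented for illustration. The sketch also shows how precision, recall and F1 could be computed against a labeled set, in the spirit of the paper's 500-relevant/1,000-control benchmark but on a toy listing of seven files.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-in for the LLM augmentation step: in the paper, a ChatGPT
# model expands the investigator's seed keywords using only non-sensitive case
# context. Here the expansion is hard-coded so the sketch runs offline.
SEED_KEYWORDS = ["invoice"]
AUGMENTED_KEYWORDS = ["invoice", "receipt", "billing", "payment", "remittance", "wire transfer"]

# Toy file listing and ground-truth labels, invented for illustration.
FILES = [
    "Invoice_2024_03.pdf",
    "payment-schedule.xlsx",
    "holiday photos.zip",
    "Receipt scan 17.jpg",
    "Wire-Transfer_confirmation.pdf",
    "meeting_notes.docx",
    "payment_app_setup.exe",
]
RELEVANT = {
    "Invoice_2024_03.pdf",
    "payment-schedule.xlsx",
    "Receipt scan 17.jpg",
    "Wire-Transfer_confirmation.pdf",
}


def normalize(text: str) -> str:
    """Lower-case text and collapse runs of whitespace, '_', '.' and '-' into one space."""
    return re.sub(r"[\s_.\-]+", " ", text.lower()).strip()


def select_files(file_names: list[str], keywords: list[str]) -> list[str]:
    """Keep file names that contain at least one normalized keyword as a substring.

    Only the file name is inspected, never the file content.
    """
    terms = [normalize(k) for k in keywords]
    return [f for f in file_names if any(t in normalize(f) for t in terms)]


@dataclass
class Scores:
    precision: float
    recall: float
    f1: float


def score(selected: list[str], relevant: set[str]) -> Scores:
    """Precision, recall and F1 of a selection against known-relevant file names."""
    sel = set(selected)
    tp = len(sel & relevant)
    precision = tp / len(sel) if sel else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return Scores(precision, recall, f1)


if __name__ == "__main__":
    for label, kws in [("seed", SEED_KEYWORDS), ("augmented", AUGMENTED_KEYWORDS)]:
        s = score(select_files(FILES, kws), RELEVANT)
        print(f"{label}: P={s.precision:.2f} R={s.recall:.2f} F1={s.f1:.2f}")
    # seed: P=1.00 R=0.25 F1=0.40
    # augmented: P=0.80 R=1.00 F1=0.89
```

In this toy run the expanded keyword set recovers every relevant file, but the generic term "payment" also pulls in an unrelated installer. This illustrates the recall/precision trade-off that any keyword expansion has to balance.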
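The abstract reports that junior investigators' gains were significant under the Wilcoxon signed-rank test. As a hedged illustration of how such a paired comparison works (this is not the authors' analysis code), the sketch below computes the signed-rank sums and a two-sided p-value from the normal approximation using only the standard library. The function name `wilcoxon_signed_rank` and the paired scores are hypothetical; the numbers are invented and are not data from the study. Zero differences are dropped and tied absolute differences get their average rank, but no tie correction is applied to the variance. For groups as small as a single experience level, an exact p-value (for example from `scipy.stats.wilcoxon`) is generally preferable to this approximation.

```python
import math


def wilcoxon_signed_rank(before: list[float], after: list[float]) -> tuple[float, float, float, float]:
    """Two-sided Wilcoxon signed-rank test on paired samples (normal approximation).

    Returns (W_plus, W_minus, z, p), where W_plus sums the ranks of positive
    differences (after - before) and W_minus sums those of negative ones.
    """
    if len(before) != len(after):
        raise ValueError("samples must be paired")
    # Paired differences; zero differences carry no sign and are dropped.
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    # Rank |d| in ascending order, giving tied values the mean of the ranks they span.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum((r for r, d in zip(ranks, diffs) if d > 0), 0.0)
    w_minus = sum((r for r, d in zip(ranks, diffs) if d < 0), 0.0)
    # Under H0, W_plus has mean n(n+1)/4 and variance n(n+1)(2n+1)/24.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return w_plus, w_minus, z, p


if __name__ == "__main__":
    # Hypothetical per-investigator detection counts without and with augmented
    # keywords; invented for illustration, not data from the study.
    without_aug = [10, 12, 9, 14, 11, 8]
    with_aug = [13, 15, 9, 16, 15, 10]
    w_plus, w_minus, z, p = wilcoxon_signed_rank(without_aug, with_aug)
    print(f"W+={w_plus} W-={w_minus} z={z:.3f} p={p:.4f}")
```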
- Title: LLM-based keyword augmentation for title-driven evidence selection: A practical approach
- Publication date: 2026-03-12
- Type: Article; Early Access