Jailbreaking LLMs Through Cross-Cultural Prompts

Abstract

We examine how linguistic and cultural framing affects jailbreak success in three commercial LLMs (GPT-4, Claude 3, Gemini), using semantically equivalent prompts in direct, indirect, and metaphorical styles across four high-resource languages. Indirect prompts bypassed filters most effectively, and both framing and style significantly influenced alignment. GPT-4 was especially vulnerable to indirect framing, Claude 3 remained consistently robust, and Gemini showed high sensitivity to cultural and linguistic variation. Our findings highlight the need for alignment strategies that are resilient to diverse expression styles and cultural contexts.

Keywords

alignment; cross-cultural prompts; jailbreaking; large language models
Title
Jailbreaking LLMs Through Cross-Cultural Prompts
Authors
Kim, Damin; Hur, Minseok; Lee, Jeongin; Min, Moohong
DOI
10.1145/3746252.3760892
Publication Date
2025
Type
Proceedings Paper
Journal
CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Pages
4874-4878