Beyond Masking: Landmark-based Representation Learning and Knowledge-Distillation for Audio-Visual Deepfake Detection

Abstract

Audio-visual deepfake detection methods demonstrate strong performance on academic datasets but degrade significantly when applied to real-world content. To address the shortcomings of previous approaches, we exploit the dynamic information carried by facial landmarks. First, we propose Landmark-based Distillation (LBD), motivated by I-JEPA's representation learning approach. LBD uses KL-divergence to align facial landmark predictions from the visual and audio encoders, enforcing a focus on geometric facial features rather than spurious background information. Second, we introduce Multimodal Temporal Information Alignment (MTIA), which employs contrastive learning to enhance temporal consistency between audio and visual representations. We conduct experiments on academic datasets and on web-based deepfakes collected from diverse social media platforms, which serve as real-world examples. Our proposed landmark-guided distillation framework is computationally efficient and improves multimodal video deepfake detection performance across a diverse range of deepfakes compared to existing methods. The code is available at https://github.com/Ckck12/Beyond-Masking.
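
The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only shows the general shape of a KL-divergence alignment term and an InfoNCE-style contrastive term. Several things here are assumptions made for illustration: the function names `landmark_alignment_loss` and `temporal_contrastive_loss` are hypothetical, landmark predictions are treated as logits over a discretized set of bins, the KL direction is visual-to-audio, and the temperature value is arbitrary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def landmark_alignment_loss(visual_logits, audio_logits):
    """Hypothetical LBD-style term: KL between the landmark distributions
    predicted from the visual branch and from the audio branch.
    ASSUMPTION: predictions are logits over discretized landmark bins."""
    p = softmax(visual_logits)
    q = softmax(audio_logits)
    return kl_divergence(p, q)


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def temporal_contrastive_loss(audio_seq, visual_seq, temperature=0.1):
    """Hypothetical MTIA-style term: InfoNCE over time steps.
    The audio embedding at step t should be closest to the visual
    embedding at the same step t; other steps act as negatives.
    ASSUMPTION: both sequences have the same length and dimension."""
    a = [_normalize(v) for v in audio_seq]
    v = [_normalize(x) for x in visual_seq]
    loss = 0.0
    for t, a_t in enumerate(a):
        sims = [_dot(a_t, v_s) / temperature for v_s in v]
        m = max(sims)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_sum_exp - sims[t]  # negative log-softmax of the positive
    return loss / len(a)
```

Under these assumptions, identical landmark logits from both branches give an alignment loss of zero, and temporally aligned audio and visual sequences give a lower contrastive loss than misaligned ones. How the paper weights or combines such terms is not stated in the abstract.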

Keywords

deepfake detection; multimodal deepfakes; real-world deepfakes
Title
Beyond Masking: Landmark-based Representation Learning and Knowledge-Distillation for Audio-Visual Deepfake Detection
Authors
Park, Chan; Muneer, Muhammad Shahid; Woo, Simon S.
DOI
10.1145/3746252.3760853
Publication date
2025
Type
Proceedings Paper
Journal
CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
Pages
5084 ~ 5088