Details
- Park, Chan;
- Muneer, Muhammad Shahid;
- Woo, Simon S.
WEB OF SCIENCE: 0
SCOPUS: 0
Abstract
Audio-visual deepfake detection methods perform strongly on academic datasets but degrade significantly when applied to real-world content. To address the shortcomings of previous approaches, we exploit the dynamic information carried by facial landmarks. First, we propose Landmark-based Distillation (LBD), motivated by I-JEPA's representation learning approach. LBD uses KL-divergence to align the facial landmark predictions of the visual and audio encoders, enforcing a focus on geometric facial features rather than spurious background information. Second, we introduce Multimodal Temporal Information Alignment (MTIA), which employs contrastive learning to enhance temporal consistency between audio and visual representations. We conduct experiments on academic datasets and on web-based deepfakes collected from diverse social media platforms, which serve as real-world examples. Our proposed landmark-guided distillation framework is computationally efficient and improves multimodal video deepfake detection performance across a diverse range of deepfakes compared to existing methods. The code is available at https://github.com/Ckck12/Beyond-Masking.
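The two objectives named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes that each encoder emits a vector of landmark scores that is softmax-normalised before a KL term is taken, and that temporal alignment uses an InfoNCE-style loss in which the visual frame at the same time step is the positive. The function names, the direction of the KL term, and the temperature value are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def landmark_alignment_loss(visual_logits, audio_logits):
    """Assumed LBD-style term: KL between softmax-normalised landmark scores.

    Treating the visual branch as the reference distribution is an
    assumption made for this sketch only.
    """
    return kl_divergence(softmax(visual_logits), softmax(audio_logits))


def cosine(u, v, eps=1e-12):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def temporal_info_nce(audio_frames, visual_frames, temperature=0.1):
    """Assumed MTIA-style term: InfoNCE over time steps.

    For audio frame t, visual frame t is the positive and every other
    visual frame in the clip is a negative.
    """
    losses = []
    for t, audio_vec in enumerate(audio_frames):
        sims = [cosine(audio_vec, v) / temperature for v in visual_frames]
        probs = softmax(sims)
        losses.append(-math.log(probs[t] + 1e-12))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy landmark scores from the two encoders (made-up numbers).
    print(landmark_alignment_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))

    # Toy per-frame embeddings: synchronised vs. time-shifted video.
    audio = [[1.0, 0.0], [0.0, 1.0]]
    print(temporal_info_nce(audio, [[1.0, 0.0], [0.0, 1.0]]))
    print(temporal_info_nce(audio, [[0.0, 1.0], [1.0, 0.0]]))
```

In this toy setup the alignment term is zero when both encoders produce identical landmark scores, and the contrastive term is smaller for the synchronised clip than for the time-shifted one, which is the behaviour the abstract's description of LBD and MTIA suggests.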
Keywords
- Title
- Beyond Masking: Landmark-based Representation Learning and Knowledge-Distillation for Audio-Visual Deepfake Detection
- Authors
- Park, Chan; Muneer, Muhammad Shahid; Woo, Simon S.
- Publication date
- 2025
- Type
- Proceedings Paper
- Journal name
- CIKM 2025 - Proceedings of the 34th ACM International Conference on Information and Knowledge Management
- Pages
- 5084-5088