Prediction-Feedback DETR for Temporal Action Detection
  • Kim, Jihwan
  • Lee, Miso
  • Cho, Cheol-Ho
  • Lee, Jihyun
  • Heo, Jae-Pil
Citations
Web of Science: 2
Scopus: 0

Abstract

Temporal Action Detection (TAD) is fundamental yet challenging for real-world video applications. Leveraging the unique benefits of transformers, various DETR-based approaches have been adopted in TAD. However, it has recently been identified that attention collapse in self-attention degrades the performance of DETR for TAD. Building upon previous research, this paper newly addresses the attention collapse problem in cross-attention within DETR-based TAD methods. Moreover, our findings reveal that cross-attention exhibits patterns distinct from predictions, indicating a short-cut phenomenon. To resolve this, we propose a new framework, Prediction-Feedback DETR (Pred-DETR), which utilizes predictions to recover from the collapse and align the cross- and self-attention with predictions. Specifically, we devise novel prediction-feedback objectives using guidance from the relations of the predictions. As a result, Pred-DETR significantly alleviates the collapse and achieves state-of-the-art performance among DETR-based methods on various challenging benchmarks, including THUMOS14, ActivityNet-v1.3, HACS, and FineAction.

Title
Prediction-Feedback DETR for Temporal Action Detection
Authors
Kim, Jihwan; Lee, Miso; Cho, Cheol-Ho; Lee, Jihyun; Heo, Jae-Pil
DOI
10.1609/aaai.v39i4.32448
Publication date
2025
Type
Proceedings Paper
Journal
Thirty-Ninth AAAI Conference on Artificial Intelligence, AAAI-25, Vol. 39, No. 4
Pages
4266-4274