- Han, Jimyeong;
- Kim, Sangpil;
- Park, Hogun
Abstract
The field of Video Question Answering (VideoQA) addresses the challenge of answering questions about video content. Recent VideoQA models that leverage large language models (LLMs) transform frame features extracted by vision encoders so that the LLMs can better understand and use visual information. While vision language model (VLM) approaches that adopt LLMs have improved the understanding of individual video frames, they tend to overlook the multiple event concepts present in a video, such as human-object interactions, which arise from temporal changes in visual information. To leverage information from multiple events using LLMs, we propose the Multi-Event Localization Answering (MELA) framework, a novel method that detects multiple events within a video and uses them for keyframe localization and question answering. By analyzing the relationships between the events mentioned in the question and other events in the video, MELA identifies the set of essential events related to the question. The Multi-event Localizer in MELA then selects keyframes corresponding to these essential events from the relevant video segments. Afterward, the Event-aware Answerer determines the answer to the question from the selected keyframes and the detected event information. Incorporating event information significantly improves MELA's ability to interpret complex human-object interactions, yielding better performance than baseline techniques on the STAR VideoQA dataset in both fine-tuning and zero-shot settings. We also provide an in-depth analysis of our framework, including the impact of the Multi-event Localizer and Event-aware Answerer, a comparison with the baseline Localizer, and the effect of the event detector module. Copyright © 2025 held by the owner/author(s).
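The data flow described in the abstract (detected events, then essential events, then keyframes, then an answer) can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Event`, `essential_events`, `localize_keyframes`, `build_answerer_input`) is hypothetical, the word-overlap relevance rule and midpoint keyframe choice are stand-in assumptions for the learned components, and the final step only packages the inputs that an LLM-based answerer might receive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    label: str  # short description of a detected event (hypothetical format)
    start: int  # first frame index of the event segment (inclusive)
    end: int    # last frame index of the event segment (inclusive)


def essential_events(question, events):
    """Stand-in relevance rule (assumption): keep events whose label
    shares at least one word with the question."""
    q_words = set(question.lower().replace("?", "").split())
    return [e for e in events if q_words & set(e.label.lower().split())]


def localize_keyframes(events):
    """Stand-in localizer (assumption): take the midpoint frame of each
    essential event's segment."""
    return sorted({e.start + (e.end - e.start) // 2 for e in events})


def build_answerer_input(question, keyframes, events):
    """Placeholder for the answering step: bundle the question, the
    selected keyframe indices, and the event labels together."""
    return {
        "question": question,
        "keyframes": keyframes,
        "events": [e.label for e in events],
    }


# Made-up detections for a single video; not taken from the STAR dataset.
detected = [
    Event("person opens fridge", 0, 40),
    Event("person drinks water", 41, 90),
    Event("dog sleeps sofa", 10, 80),
]
question = "What did the person do after opening the fridge?"

selected = essential_events(question, detected)
frames = localize_keyframes(selected)
packet = build_answerer_input(question, frames, selected)
print(packet)
```

In this toy run the unrelated event is filtered out before any frames are chosen, which mirrors the ordering in the abstract: event selection comes first, and keyframes are drawn only from the segments of the selected events.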
- Title: MELA: Multi-Event Localization Answering Framework for Video Question Answering
- Authors: Han, Jimyeong; Kim, Sangpil; Park, Hogun
- Publication date: 2025-05
- Type: Proceedings Paper
- Journal: Proceedings of the ACM Symposium on Applied Computing
- Pages: 1282-1289