- Tran, Dai Quoc;
- Jeon, Yuntae;
- Son, Seongwoo;
- Kulinan, Almo Senja;
- Lee, Changjun;
- ... Park, Seunghee
Abstract
Construction safety reports are crucial for reducing the accident rate on complex and dangerous construction sites. Many deep learning-based computer vision techniques, such as object detection, image segmentation, and semantic segmentation, are used to recognize hazards. However, there is a lack of research on identifying hazards from a combined natural language processing and computer vision perspective. Visual question answering (VQA) and segmentation models can fill this gap. VQA is a vision-and-language model that can infer violated safety rules from an input image. In addition, segmentation models can automatically segment the objects that are relevant to the violated rules. Together, these make an end-to-end automatic safety report system possible. In this research, we propose a novel approach that combines VQA and a segmentation model in a construction safety report system. First, a large “scenario-questions” dataset with 200,000 images and 16 questions is created based on a public segmentation dataset. Then a vision-and-language transformer model is trained and validated. Next, the segmentation model is used for postprocessing to explain the reason for each hazard inference and to increase the baseline model accuracy. As a result, both the VQA and segmentation models showed robust validation accuracy, and the combination can be considered a robust approach for safety management and monitoring on construction sites. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
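The postprocessing step described in the abstract, where segmentation output is used to explain a VQA hazard inference, could look roughly like the sketch below. It is only an illustration under stated assumptions: the question texts, the `RULE_CLASSES` mapping, the class names, and the rule "keep a violation only if every relevant class is visible in the mask" are all hypothetical and are not taken from the paper. The VQA answers and the segmentation mask are passed in as plain Python data so that no particular model or library is implied.

```python
from dataclasses import dataclass

# Hypothetical mapping from a safety question to the segmentation classes
# that would explain a "yes" (violation) answer. Not taken from the paper.
RULE_CLASSES = {
    "Is a worker missing a hard hat?": {"worker"},
    "Is a worker close to an excavator?": {"worker", "excavator"},
}


@dataclass
class Finding:
    question: str
    violated: bool
    evidence: dict  # class name -> pixel count in the mask


def class_pixel_counts(mask, id_to_name):
    """Count pixels per class name in a 2D mask of integer class ids."""
    counts = {}
    for row in mask:
        for class_id in row:
            name = id_to_name.get(class_id)
            if name is not None:  # ids without a name (e.g. background) are skipped
                counts[name] = counts.get(name, 0) + 1
    return counts


def build_report(vqa_answers, mask, id_to_name):
    """Combine yes/no VQA answers with segmentation evidence into findings."""
    counts = class_pixel_counts(mask, id_to_name)
    findings = []
    for question, answer in vqa_answers.items():
        needed = RULE_CLASSES.get(question, set())
        evidence = {name: counts[name] for name in sorted(needed) if name in counts}
        # Assumed rule: a "yes" is kept only if every class the rule
        # depends on was actually segmented in the image.
        violated = answer == "yes" and len(evidence) == len(needed)
        findings.append(Finding(question, violated, evidence))
    return findings
```

Calling `build_report` with a dictionary of answers, a small nested-list mask, and an id-to-name dictionary returns one `Finding` per question; the `evidence` field is what a report could cite as the reason for the inference.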
Keywords
- Title
- Automatic Construction Safety Report Using Visual Question Answering and Segmentation Model
- Authors
- Tran, Dai Quoc; Jeon, Yuntae; Son, Seongwoo; Kulinan, Almo Senja; Lee, Changjun; Park, Seunghee
- Publication date
- 2025-08
- Type
- Proceedings Paper
- Volume
- 630 LNCE
- Pages
- 307-317