Classification of Sewer Defects Using Point Clouds Based on a Novel Sewer Vision Transformer With Cross-Modal In-Domain Knowledge
  • Jing, Shuju
  • Li, Xiangyang
  • Beyene, Daniel Asefa
  • Cha, Gichun
  • Park, Seunghee
Abstract

The high-precision geometric measurement capabilities of sensor-based point clouds provide significant advantages for sewer defect detection. To improve the classification of sewer defects, for which point-cloud data are valuable but scarce, this study proposes a cross-modal framework that combines self-supervised pre-training with supervised fine-tuning. The proposed Sewer Vision Transformer (Sewer-ViT) integrates key-edge sampling, neighborhood dilation learning, dual-domain feature fusion, and inverted bottleneck structures to reinforce defect feature embedding and inductive bias. These features are subsequently processed by a transformer encoder pre-trained with two-dimensional in-domain knowledge, and the latent representations are further optimized through weight fusion within a unified vector space, thereby improving classification performance. The method achieved average precision, recall, and F1-scores of 75.87%, 76.73%, and 75.44% on the overall test set, and 65.09%, 62.47%, and 62.58% on a real-world test set, respectively, surpassing existing approaches. These results highlight the practical potential of this method for sewer defect detection and point to a promising future for multi-modal fusion research.
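
The abstract reports "average" precision, recall, and F1 without stating the averaging scheme. On the overall test set the F1 (75.44%) is lower than both precision (75.87%) and recall (76.73%), which cannot happen when F1 is the harmonic mean of a single precision/recall pair, so the figures are plausibly per-class (macro) averages. The sketch below illustrates that effect under this assumption; the function name `macro_scores` and the class labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, F1) over all observed classes."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    precisions, recalls, f1s = [], [], []
    for c in classes:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical labels, for illustration only (not the paper's classes or data).
y_true = ["crack", "crack", "crack", "crack", "deposit", "deposit"]
y_pred = ["crack", "crack", "deposit", "deposit", "deposit", "deposit"]

p, r, f = macro_scores(y_true, y_pred)
print(round(p, 4), round(r, 4), round(f, 4))  # 0.75 0.75 0.6667
```

In this toy case each class has an F1 of 2/3, so the macro F1 sits below both the macro precision and the macro recall, the same ordering seen in the reported overall-test-set figures.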

Keywords

Cross-modal Learning; Point Clouds; Self-supervised Learning; Sewer-Defect Classification; Vision Transformer
Title
Classification of Sewer Defects Using Point Clouds Based on a Novel Sewer Vision Transformer With Cross-Modal In-Domain Knowledge
Authors
Jing, Shuju; Li, Xiangyang; Beyene, Daniel Asefa; Cha, Gichun; Park, Seunghee
DOI
10.1109/JSEN.2025.3609788
Publication Date
2025-11
Type
Article
Journal
IEEE Sensors Journal
Volume
25
Issue
21
Pages
40188-40202