Detailed Information

Cited 0 times in Web of Science · Cited 0 times in Scopus

Memory-efficient cross-modal attention for RGB-X segmentation and crowd counting

Full metadata record
DC Field: Value
dc.contributor.author: Zhang, Youjia
dc.contributor.author: Choi, Soyun
dc.contributor.author: Hong, Sungeun
dc.date.accessioned: 2025-02-04T02:30:25Z
dc.date.available: 2025-02-04T02:30:25Z
dc.date.issued: 2025-06
dc.identifier.issn: 0031-3203
dc.identifier.issn: 1873-5142
dc.identifier.uri: https://scholarx.skku.edu/handle/2021.sw.skku/120192
dc.description.abstract: In multimodal visual understanding, fusing RGB images with additional modalities such as depth or thermal data is essential for improving both accuracy and robustness. However, traditional approaches often rely on task-specific architectures that are difficult to generalize across different multimodal scenarios. To address this limitation, we propose the Cross-modal Spatio-Channel Attention (CSCA) module, designed to flexibly integrate diverse modalities into various model architectures while enhancing performance. CSCA employs spatial attention to capture interactions between modalities effectively, improving model adaptability. Additionally, we introduce a patch-based cross-modal interaction mechanism that optimizes the processing of spatial and channel features, reducing memory overhead while preserving critical spatial information. These refinements significantly simplify cross-modal interactions, increasing computational efficiency. Extensive experiments demonstrate that CSCA generalizes well across various multimodal combinations, achieving promising performance in crowd counting and image segmentation tasks, particularly in RGB-Depth, RGB-Thermal, and RGB-Polarization scenarios. Our approach provides a scalable and efficient solution for multimodal integration, with the potential for broader applications in future work. © 2025 Elsevier Ltd
dc.language: English
dc.language.iso: ENG
dc.publisher: Elsevier Ltd
dc.title: Memory-efficient cross-modal attention for RGB-X segmentation and crowd counting
dc.type: Article
dc.publisher.location: United Kingdom
dc.identifier.doi: 10.1016/j.patcog.2025.111376
dc.identifier.scopusid: 2-s2.0-85215868535
dc.identifier.wosid: 001410592600001
dc.identifier.bibliographicCitation: Pattern Recognition, v.162
dc.citation.title: Pattern Recognition
dc.citation.volume: 162
dc.type.docType: Article
dc.description.isOpenAccess: N
dc.description.journalRegisteredClass: scie
dc.description.journalRegisteredClass: scopus
dc.relation.journalResearchArea: Computer Science
dc.relation.journalResearchArea: Engineering
dc.relation.journalWebOfScienceCategory: Computer Science, Artificial Intelligence
dc.relation.journalWebOfScienceCategory: Engineering, Electrical & Electronic
dc.subject.keywordAuthor: Multimodal learning
dc.subject.keywordAuthor: Non-local attention
dc.subject.keywordAuthor: RGB-D semantic segmentation
dc.subject.keywordAuthor: RGB-D/T crowd counting
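The patch-based cross-modal interaction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function name `csca_fuse`, the patch size, the query/key/value assignment, and the residual fusion are all assumptions made for illustration. The point it shows is the memory argument — restricting attention to non-overlapping patches shrinks each attention matrix from HW × HW to (patch²) × (patch²).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def csca_fuse(rgb, x, patch=4):
    """Hypothetical sketch of patch-based cross-modal spatial attention.

    rgb, x: feature maps of shape (C, H, W) from the RGB and the auxiliary
    modality (depth/thermal/polarization). Attention is computed within each
    non-overlapping patch, so the attention matrix per patch has shape
    (patch*patch, patch*patch) instead of (H*W, H*W).
    """
    C, H, W = rgb.shape
    out = np.empty_like(rgb)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            q = rgb[:, i:i+patch, j:j+patch].reshape(C, -1)  # queries from RGB
            k = x[:, i:i+patch, j:j+patch].reshape(C, -1)    # keys from modality X
            v = k                                            # values from modality X
            # Spatial attention within the patch: (N, N) with N = patch*patch.
            attn = softmax(q.T @ k / np.sqrt(C), axis=-1)
            out[:, i:i+patch, j:j+patch] = (v @ attn.T).reshape(C, patch, patch)
    return rgb + out  # residual fusion of attended X features into the RGB stream
```

For a 64 × 64 feature map, full non-local attention needs a 4096 × 4096 matrix, while 4 × 4 patches need 256 matrices of size 16 × 16 — roughly a 256× reduction in attention memory, which is the kind of saving the abstract refers to.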
Files in This Item
There are no files associated with this item.
Appears in Collections
Computing and Informatics > Convergence > 1. Journal Articles


Items in ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.

Related Researcher

HONG, SUNGEUN
Computing and Informatics (Convergence)
