RA-Touch: Retrieval-Augmented Touch Understanding with Enriched Visual Data

Abstract

Visuo-tactile perception aims to understand an object's tactile properties. However, the field remains underexplored due to the high cost of data collection. We observe that visually distinct objects can exhibit similar surface textures or material properties. For example, a leather sofa and a leather jacket can share similar tactile properties. This suggests that tactile understanding can be guided by material cues in visual data, even without direct tactile supervision. In this paper, we introduce RA-Touch, a retrieval-augmented framework that improves visuo-tactile perception by leveraging visual data enriched with tactile semantics. We carefully recaption a large-scale visual dataset with tactile-focused descriptions, enabling the model to access tactile semantics typically absent from conventional visual datasets. A key challenge lies in effectively utilizing these tactile-aware external descriptions. RA-Touch addresses this by retrieving visual-textual representations aligned with tactile inputs and integrating them so that the model focuses on relevant textural and material properties. RA-Touch outperforms prior methods, demonstrating the potential of retrieval-based visual reuse for tactile understanding. Code is available at https://aim-skku.github.io/RA-Touch.
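To make the retrieve-then-integrate idea concrete, here is a minimal, generic sketch. It is not RA-Touch's actual architecture. It assumes that a tactile query and a bank of visual-textual entries have already been embedded into a shared vector space. It retrieves the top-k entries by cosine similarity, weights them with a temperature softmax, and adds the pooled context to the query as a residual. The function names (`retrieve_top_k`, `integrate`), the `temperature` parameter, and the residual fusion are hypothetical illustration choices.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def retrieve_top_k(query: Sequence[float],
                   bank: List[Sequence[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Hypothetical helper: (index, score) of the k bank entries most similar to query."""
    scored = [(i, cosine(query, emb)) for i, emb in enumerate(bank)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def integrate(query: Sequence[float],
              bank: List[Sequence[float]],
              k: int,
              temperature: float = 0.1) -> Tuple[List[float], List[int]]:
    """Hypothetical fusion: softmax-weighted pooling of retrieved entries, added to the query."""
    hits = retrieve_top_k(query, bank, k)
    # Temperature softmax over similarity scores (subtract max for numerical stability).
    top = max(score for _, score in hits)
    weights = [math.exp((score - top) / temperature) for _, score in hits]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Weighted sum of the retrieved embeddings forms the retrieved context.
    context = [0.0] * len(query)
    for (idx, _), w in zip(hits, weights):
        for d in range(len(query)):
            context[d] += w * bank[idx][d]
    # Residual fusion: the query keeps its own signal and gains retrieved context.
    fused = [q + c for q, c in zip(query, context)]
    return fused, [idx for idx, _ in hits]


if __name__ == "__main__":
    bank = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]  # toy visual-textual embeddings
    fused, indices = integrate([1.0, 0.0], bank, k=2)
    print(indices)  # [0, 2]
    print(fused)
```

In this toy bank, the query retrieves entries 0 and 2, which are the two closest in direction. The unrelated entry 1 contributes nothing. A learned model would typically replace the fixed softmax and residual sum with trainable attention, but the data flow is the same: score, select, weight, fuse.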

Keywords

multimodal learning; retrieval-augmented methods; vision-language models; visuo-tactile perception
Authors
Cho, Yoorhim; Kim, Hongyeob; Kim, Semin; Zhang, Youjia; Choi, YunSeok; Hong, Sungeun
DOI
10.1145/3746027.3755106
Publication Date
2025-10
Type
Proceedings Paper
Publication Title
MM 2025 - Proceedings of the 33rd ACM International Conference on Multimedia, Co-Located with MM 2025
Pages
1288-1297