VisTaNet: Attention Guided Deep Fusion for Surface Roughness Classification
Routray, Prasanna Kumar, Kanade, Aditya Sanjiv, Bhanushali, Jay, Muniyandi, Manivannan
arXiv.org Artificial Intelligence
Human texture perception is a weighted average of multi-sensory inputs: visual and tactile. The visual sensing mechanism extracts global features, and the tactile mechanism complements it by extracting local features. The lack of coupled visuotactile datasets in the literature makes it difficult to study multimodal fusion strategies analogous to human texture perception. This paper presents a visual dataset that augments an existing tactile dataset. We propose a novel deep fusion architecture that fuses visual and tactile data using four types of fusion strategies: summation, concatenation, max-pooling, and attention. Our model achieves a surface roughness classification accuracy of 97.22%, a significant improvement over tactile-only (SVM, 92.60%) and visual-only (FENet-50, 85.01%) architectures. Among the fusion techniques, the attention-guided architecture yields better classification accuracy than the others. Our study shows that, analogous to human texture perception, the proposed model chooses a weighted combination of the two modalities (visual and tactile), resulting in higher surface roughness classification accuracy; it maximizes the weight of the tactile modality where the visual modality fails, and vice versa.
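The four fusion strategies named in the abstract can be illustrated with a minimal NumPy sketch. This is not the VisTaNet architecture, which the abstract does not specify. It assumes each modality has already been reduced to a feature vector of the same length, and the function name `fuse`, the scoring vector `score_w`, and the dot-product scoring rule are hypothetical choices made only for illustration.

```python
import numpy as np

def fuse(visual, tactile, mode, score_w=None):
    """Fuse two same-length feature vectors (illustrative sketch only)."""
    v = np.asarray(visual, dtype=float)
    t = np.asarray(tactile, dtype=float)
    if mode == "sum":
        # Element-wise summation: output has the same length as the inputs.
        return v + t
    if mode == "concat":
        # Concatenation: output length is the sum of the input lengths.
        return np.concatenate([v, t])
    if mode == "max":
        # Element-wise max-pooling across the two modalities.
        return np.maximum(v, t)
    if mode == "attention":
        # Assumption: one scalar score per modality from a dot product with
        # a scoring vector (learned in a real model; all ones by default here).
        w = np.ones_like(v) if score_w is None else np.asarray(score_w, dtype=float)
        scores = np.array([v @ w, t @ w])
        scores = scores - scores.max()  # stabilise the softmax numerically
        alpha = np.exp(scores) / np.exp(scores).sum()
        # Weighted combination of the two modalities; weights sum to 1.
        return alpha[0] * v + alpha[1] * t
    raise ValueError(f"unknown fusion mode: {mode}")

visual_feat = [1.0, 2.0]
tactile_feat = [3.0, 0.0]
for mode in ("sum", "concat", "max", "attention"):
    print(mode, fuse(visual_feat, tactile_feat, mode))
```

In this sketch only the attention branch produces modality weights that depend on the input, which is the property the abstract attributes to the attention-guided variant: it can shift weight toward one modality when the other is uninformative.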
Sep-18-2022
- Country:
- North America > United States
- Nevada > Clark County > Las Vegas (0.04)
- Europe
- United Kingdom > England (0.04)
- Switzerland > Geneva
- Geneva (0.04)
- Genre:
- Research Report > New Finding (0.46)
- Technology: