Goto

Collaborating Authors

 correlation map










An approach based on class activation maps for investigating the effects of data augmentation on neural networks for image classification

Dorneles, Lucas M., Garcia, Luan Fonseca, Carbonera, Joel Luís

arXiv.org Artificial Intelligence

Neural networks have become increasingly popular in the last few years as an effective tool for the task of image classification due to the impressive performance they have achieved on this task. In image classification tasks, it is common to use data augmentation strategies to increase the robustness of trained networks to changes in the input images and to avoid overfitting. Although data augmentation is a widely adopted technique, the literature lacks a body of research analyzing the effects data augmentation methods have on the patterns learned by neural network models working on complex datasets. The primary objective of this work is to propose a methodology and set of metrics that may allow a quantitative approach to analyzing the effects of data augmentation in convolutional networks applied to image classification. An important tool used in the proposed approach lies in the concept of class activation maps for said models, which allow us to identify and measure the importance these models assign to each individual pixel in an image when executing the classification task. From these maps, we may then extract metrics over the similarities and differences between maps generated by these models trained on a given dataset with different data augmentation strategies. Experiments made using this methodology suggest that the effects of these data augmentation techniques not only can be analyzed in this way but also allow us to identify different impact profiles over the trained models.


XeMap: Contextual Referring in Large-Scale Remote Sensing Environments

Li, Yuxi, Si, Lu, Hou, Yujie, Liu, Chengaung, Li, Bin, Fang, Hongjian, Zhang, Jun

arXiv.org Artificial Intelligence

Advancements in remote sensing (RS) imagery have provided high-resolution detail and vast coverage, yet existing methods, such as image-level captioning/retrieval and object-level detection/segmentation, often fail to capture mid-scale semantic entities essential for interpreting large-scale scenes. To address this, we propose the conteXtual referring Map (XeMap) task, which focuses on contextual, fine-grained localization of text-referred regions in large-scale RS scenes. Unlike traditional approaches, XeMap enables precise mapping of mid-scale semantic entities that are often overlooked in image-level or object-level methods. To achieve this, we introduce XeMap-Network, a novel architecture designed to handle the complexities of pixel-level cross-modal contextual referring mapping in RS. The network includes a fusion layer that applies self- and cross-attention mechanisms to enhance the interaction between text and image embeddings. Furthermore, we propose a Hierarchical Multi-Scale Semantic Alignment (HMSA) module that aligns multiscale visual features with the text semantic vector, enabling precise multimodal matching across large-scale RS imagery. To support XeMap task, we provide a novel, annotated dataset, XeMap-set, specifically tailored for this task, overcoming the lack of XeMap datasets in RS imagery. XeMap-Network is evaluated in a zero-shot setting against state-of-the-art methods, demonstrating superior performance. This highlights its effectiveness in accurately mapping referring regions and providing valuable insights for interpreting large-scale RS environments.