Goto

Collaborating Authors

 mapping relationship


Robust Multimodal Sentiment Analysis of Image-Text Pairs by Distribution-Based Feature Recovery and Fusion

arXiv.org Artificial Intelligence

As posts on social media increase rapidly, analyzing the sentiments embedded in image-text pairs has become a popular research topic in recent years. Although existing works achieve impressive accomplishments in simultaneously harnessing image and text information, they lack the considerations of possible low-quality and missing modalities. In real-world applications, these issues might frequently occur, leading to urgent needs for models capable of predicting sentiment robustly. Therefore, we propose a Distribution-based feature Recovery and Fusion (DRF) method for robust multimodal sentiment analysis of image-text pairs. Specifically, we maintain a feature queue for each modality to approximate their feature distributions, through which we can simultaneously handle low-quality and missing modalities in a unified framework. For low-quality modalities, we reduce their contributions to the fusion by quantitatively estimating modality qualities based on the distributions. For missing modalities, we build inter-modal mapping relationships supervised by samples and distributions, thereby recovering the missing modalities from available ones. In experiments, two disruption strategies that corrupt and discard some modalities in samples are adopted to mimic the low-quality and missing modalities in various real-world scenarios. Through comprehensive experiments on three publicly available image-text datasets, we demonstrate the universal improvements of DRF compared to SOTA methods under both two strategies, validating its effectiveness in robust multimodal sentiment analysis.


Flexbee: A Grasping and Perching UAV Based on Soft Vector-Propulsion Nozzle

arXiv.org Artificial Intelligence

Abstract--The aim of this paper is to design a new type of grasping and perching unmanned aerial vehicle (UA V), Flexbee, characterized by its soft vector-propulsion nozzle (SVPN). Compared to previous UA Vs, Flexbee integrates flight, grasping, and perching functionalities into the four SVPNs, offering advantages such as decoupled position and attitude control, high structural reuse, and strong adaptability for grasping and perching. A dynamics model of Flexbee has been developed, and the nonlinear coupling issue of the moment has been resolved through lin-earization of the equivalent moment model. Hierarchical control strategy was employed to design the controllers for Flexbee's two operational modes. Finally, flight, grasping, and perching experiments were conducted to validate Flexbee's kinematic capabilities and the effectiveness of the control strategy. UL TI-ROTOR unmanned aerial vehicles (UA Vs), with their three-dimensional maneuverabilities, have demonstrated remarkable effectiveness in environments that are difficult for humans to reach [1]-[5]. As people's requirements for UA V endurance performance and adaptability to complex environments offer greater advantages, compared with large UA Vs, small UA Vs have the characteristics of small size, light weight, low cost, and high maneuverability, which play a greater advantage in complex environments [6]-[8].


RALAD: Bridging the Real-to-Sim Domain Gap in Autonomous Driving with Retrieval-Augmented Learning

arXiv.org Artificial Intelligence

In the pursuit of robust autonomous driving systems, models trained on real-world datasets often struggle to adapt to new environments, particularly when confronted with corner cases such as extreme weather conditions. Collecting these corner cases in the real world is non-trivial, which necessitates the use of simulators for validation. However,the high computational cost and the domain gap in data distribution have hindered the seamless transition between real and simulated driving scenarios. To tackle this challenge, we propose Retrieval-Augmented Learning for Autonomous Driving (RALAD), a novel framework designed to bridge the real-to-sim gap at a low cost. RALAD features three primary designs, including (1) domain adaptation via an enhanced Optimal Transport (OT) method that accounts for both individual and grouped image distances, (2) a simple and unified framework that can be applied to various models, and (3) efficient fine-tuning techniques that freeze the computationally expensive layers while maintaining robustness. Experimental results demonstrate that RALAD compensates for the performance degradation in simulated environments while maintaining accuracy in real-world scenarios across three different models. Taking Cross View as an example, the mIOU and mAP metrics in real-world scenarios remain stable before and after RALAD fine-tuning, while in simulated environments,the mIOU and mAP metrics are improved by 10.30% and 12.29%, respectively. Moreover, the re-training cost of our approach is reduced by approximately 88.1%. Our code is available at https://github.com/JiachengZuo/RALAD.git.


Cross-Dataset Generalization in Deep Learning

arXiv.org Artificial Intelligence

Deep learning has been extensively used in various fields, such as phase imaging, 3D imag ing reconstruction, phase unwrapping, and laser speckle reduction, particularly for complex problems that lack analytic models. Its data - driven nature allows for implicit construction of mathematical relationships within the network through training with abun dant data. However, a critical challenge in practical applications is the generalization issue, where a network trained on one dataset struggles to recognize an unknown target from a different dataset. In this study, we investigate imaging through scatteri ng media and discover that the mathematical relationship learned by the network is an approximation dependent on the training dataset, rather than the true mapping relationship of the model. W e demonstrate that enhancing the diversity of the training datas et can improve this approximation, thereby achieving generalization across different datasets, as the mapping relationship of a linear physical model is independent of inputs. This study elucidates the nature of generalization across different datasets and provides insights into the design of training datasets to ultimately address the generalization issue in various deep learning - based applications . Introduction The study of imaging through scattering media is a challenging and cutting - edge field. Scattering media are ubiquitous in everyday life, such as rough surfaces, clouds, fog, dust, water, and biological tissues. Image reconstruction through these media is p articularly important in areas such as transportation, military, and biomedicine .


Metacognition-Enhanced Few-Shot Prompting With Positive Reinforcement

arXiv.org Artificial Intelligence

Few-shot prompting elicits the remarkable abilities of large language models by equipping them with a few demonstration examples in the input. However, the traditional method of providing large language models with all demonstration input-output pairs at once may not effectively guide large language models to learn the specific input-output mapping relationship. In this paper, inspired by the regulatory and supportive role of metacognition in students' learning, we propose a novel metacognition-enhanced few-shot prompting, which guides large language models to reflect on their thought processes to comprehensively learn the given demonstration examples. Furthermore, considering that positive reinforcement can improve students' learning motivation, we introduce positive reinforcement into our metacognition-enhanced few-shot prompting to promote the few-shot learning of large language models by providing response-based positive feedback. The experimental results on two real-world datasets show that our metacognition-enhanced few-shot prompting with positive reinforcement surpasses traditional few-shot prompting in classification accuracy and macro F1.


Imaging through multimode fibres with physical prior

arXiv.org Artificial Intelligence

Imaging through perturbed multimode fibres based on deep learning has been widely researched. However, existing methods mainly use target-speckle pairs in different configurations. It is challenging to reconstruct targets without trained networks. In this paper, we propose a physics-assisted, unsupervised, learning-based fibre imaging scheme. The role of the physical prior is to simplify the mapping relationship between the speckle pattern and the target image, thereby reducing the computational complexity. The unsupervised network learns target features according to the optimized direction provided by the physical prior. Therefore, the reconstruction process of the online learning only requires a few speckle patterns and unpaired targets. The proposed scheme also increases the generalization ability of the learning-based method in perturbed multimode fibres. Our scheme has the potential to extend the application of multimode fibre imaging.


PoKE: Prior Knowledge Enhanced Emotional Support Conversation with Latent Variable

arXiv.org Artificial Intelligence

Emotional support conversation (ESC) task can utilize various support strategies to help people relieve emotional distress and overcome the problem they face, which has attracted much attention in these years. However, most state-of-the-art works rely heavily on external commonsense knowledge to infer the mental state of the user in every dialogue round. Although effective, they may suffer from significant human effort, knowledge update and domain change in a long run. Therefore, in this article, we focus on exploring the task itself without using any external knowledge. We find all existing works ignore two significant characteristics of ESC. (a) Abundant prior knowledge exists in historical conversations, such as the responses to similar cases and the general order of support strategies, which has a great reference value for current conversation. (b) There is a one-to-many mapping relationship between context and support strategy, i.e.multiple strategies are reasonable for a single context. It lays a better foundation for the diversity of generations. Taking into account these two key factors, we propose Prior Knowledge Enhanced emotional support model with latent variable, PoKE. The proposed model fully taps the potential of prior knowledge in terms of exemplars and strategy sequence and then utilizes a latent variable to model the one-to-many relationship of strategy. Furthermore, we introduce a memory schema to incorporate the encoded knowledge into decoder. Experiment results on benchmark dataset show that our PoKE outperforms existing baselines on both automatic evaluation and human evaluation. Compared with the model using external knowledge, PoKE still can make a slight improvement in some metrics. Further experiments prove that abundant prior knowledge is conducive to high-quality emotional support, and a well-learned latent variable is critical to the diversity of generations.


Linear Leaky-Integrate-and-Fire Neuron Model Based Spiking Neural Networks and Its Mapping Relationship to Deep Neural Networks

arXiv.org Artificial Intelligence

Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability. Previous works have shown that converting Artificial Neural Networks (ANNs) into SNNs is a practical and efficient approach for implementing an SNN. However, the basic principle and theoretical groundwork are lacking for training a non-accuracy-loss SNN. This paper establishes a precise mathematical mapping between the biological parameters of the Linear Leaky-Integrate-and-Fire model (LIF)/SNNs and the parameters of ReLU-AN/Deep Neural Networks (DNNs). Such mapping relationship is analytically proven under certain conditions and demonstrated by simulation and real data experiments. It can serve as the theoretical basis for the potential combination of the respective merits of the two categories of neural networks.


Multiple Information Sources Cooperative Learning

AAAI Conferences

Many applications are facing the problem of learning from an objective dataset, whereas information from other auxiliary sources may be beneficial but cannot be integrated into the objective dataset for learning. In this paper, we propose an omni-view learning approach to enable learning from multiple data collections. The theme is to organize heterogeneous data sources into a unified table with global data view. To achieve the omni-view learning goal, we consider that the objective dataset and the auxiliary datasets share some instance-level dependency structures. We then propose a relational k-means to cluster instances in each auxiliary dataset, such that clusters can help build new features to capture correlations between the objective and auxiliary datasets. Experimental results demonstrate that omni-view learning can help build models which outperform the ones learned from the objective dataset only. Comparisons with the co-training algorithm further assert that omni-view learning provides an alternative, yet effective, way for semi-supervised learning.