Information Fusion
Spatiotemporal Calibration of 3D Millimetre-Wavelength Radar-Camera Pairs
Wise, Emmett, Cheng, Qilong, Kelly, Jonathan
Autonomous vehicles (AVs) fuse data from multiple sensors and sensing modalities to impart a measure of robustness when operating in adverse conditions. Radars and cameras are popular choices for use in sensor fusion; although radar measurements are sparse in comparison to camera images, radar scans penetrate fog, rain, and snow. However, accurate sensor fusion depends upon knowledge of the spatial transform between the sensors and any temporal misalignment that exists in their measurement times. During the life cycle of an AV, these calibration parameters may change, so the ability to perform in-situ spatiotemporal calibration is essential to ensure reliable long-term operation. State-of-the-art 3D radar-camera spatiotemporal calibration algorithms require bespoke calibration targets that are not readily available in the field. In this paper, we describe an algorithm for targetless spatiotemporal calibration that does not require specialized infrastructure. Our approach leverages the ability of the radar unit to measure its own ego-velocity relative to a fixed, external reference frame. We analyze the identifiability of the spatiotemporal calibration problem and determine the motions necessary for calibration. Through a series of simulation studies, we characterize the sensitivity of our algorithm to measurement noise. Finally, we demonstrate accurate calibration for three real-world systems, including a handheld sensor rig and a vehicle-mounted sensor array. Our results show that we are able to match the performance of an existing, target-based method, while calibrating in arbitrary, infrastructure-free environments.
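The method relies on the radar measuring its own ego-velocity. A common way to get this from a single 3D radar scan is a least-squares fit to the Doppler radial speeds of static targets under the model r_i = -d_i · v. This is assumed here for illustration; the abstract does not say how the authors do it. The sketch below also assumes that every return comes from a static target (no outlier rejection). All helper names are hypothetical, and the sketch does not perform the spatiotemporal calibration itself.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix stored as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(a, b):
    """Solve the 3x3 system a @ x = b with Cramer's rule."""
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("line-of-sight directions do not span 3D space")
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]
        x.append(_det3(m) / d)
    return x


def estimate_ego_velocity(directions, radial_speeds):
    """Least-squares radar ego-velocity from Doppler returns of static targets.

    Assumes unit line-of-sight vectors d_i in the radar frame and the model
    r_i = -d_i . v, i.e. all returns come from static (inlier) targets.
    """
    # Normal equations (D^T D) v = D^T (-r), with D stacking the d_i as rows.
    ata = [[sum(d[i] * d[j] for d in directions) for j in range(3)]
           for i in range(3)]
    atb = [sum(-r * d[i] for d, r in zip(directions, radial_speeds))
           for i in range(3)]
    return _solve3(ata, atb)


def unit(v):
    """Normalise a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]


# Synthetic example: four static targets seen while moving at v_true.
v_true = [2.0, -0.5, 0.1]
dirs = [unit(p) for p in ([1.0, 0.2, 0.0], [1.0, -0.3, 0.1],
                          [0.8, 0.5, -0.2], [0.9, 0.0, 0.4])]
radial = [-sum(di * vi for di, vi in zip(d, v_true)) for d in dirs]
v_est = estimate_ego_velocity(dirs, radial)
```

A practical pipeline would first discard returns from moving objects, because they violate the static-target model the fit depends on.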
Datasets, Models, and Algorithms for Multi-Sensor, Multi-agent Autonomy Using AVstack
Hallyburton, R. Spencer, Pajic, Miroslav
Recent advancements in assured autonomy have brought autonomous vehicles (AVs) closer to fruition. Despite strong evidence that multi-sensor, multi-agent (MSMA) systems can yield substantial improvements in the safety and security of AVs, there exists no unified framework for developing and testing representative MSMA configurations. Using the recently released autonomy platform AVstack, this work proposes a new framework for datasets, models, and algorithms in MSMA autonomy. Instead of releasing a single dataset, we deploy a dataset generation pipeline capable of generating unlimited volumes of ground-truth-labeled MSMA perception data. The data derive from cameras (semantic segmentation, RGB, depth), LiDAR, and radar, and are sourced from ground vehicles and, for the first time, infrastructure platforms. Combining this pipeline for generating labeled MSMA data with AVstack's third-party integrations yields a model training framework for multi-sensor perception in vehicle and infrastructure applications. We release the framework and pretrained models as open source. Finally, the dataset and model training pipelines culminate in multi-agent case studies. While previous works used specific ego-centric multi-agent designs, our framework models the collaborative autonomy space as a network of noisy, time-correlated sensors. Within this environment, we quantify the impact of the network topology and data fusion pipeline on an agent's situational awareness.
Bayesian data fusion with shared priors
Wu, Peng, Imbiriba, Tales, Elvira, Victor, Closas, Pau
The integration of data and knowledge from several sources is known as data fusion. Data fusion becomes essential when data are only available in a distributed fashion or when different sensors are used to infer a quantity of interest. In Bayesian settings, a priori information about the unknown quantities is available and may be shared among the different distributed estimators. When the local estimates are fused, the prior knowledge used to construct the local posteriors may be overused unless the fusion node accounts and corrects for it. In this paper, we analyze the effects of shared priors in Bayesian data fusion. For several common fusion rules, our analysis characterizes performance as a function of the number of collaborating agents and of the type of prior. The analysis uses two divergences that are common in Bayesian inference, and the results are general enough to cover a broad class of distributions. These theoretical results are corroborated through experiments on a variety of estimation and classification problems, including linear and nonlinear models and federated learning schemes.
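The following minimal scalar-Gaussian sketch makes the over-counting concrete. It is an illustrative assumption, not the paper's general analysis, which uses divergences and broader families of distributions. Each agent combines the same prior with its own measurement. Multiplying the K local posteriors counts the prior K times, so dividing out K - 1 copies recovers the centralized posterior. Function names are hypothetical.

```python
def local_posterior(prior_mean, prior_var, y, noise_var):
    """One agent's scalar Gaussian posterior: shared prior x its own likelihood."""
    precision = 1.0 / prior_var + 1.0 / noise_var
    mean = (prior_mean / prior_var + y / noise_var) / precision
    return mean, 1.0 / precision


def fuse(posteriors, prior_mean, prior_var, correct_prior=True):
    """Fuse Gaussian local posteriors by multiplying them (information form).

    The plain product counts the shared prior once per agent; with
    correct_prior=True the K - 1 surplus copies are divided back out.
    """
    k = len(posteriors)
    info = sum(1.0 / var for _, var in posteriors)       # summed precisions
    info_mean = sum(m / var for m, var in posteriors)    # summed precision-weighted means
    if correct_prior:
        info -= (k - 1) / prior_var
        info_mean -= (k - 1) * prior_mean / prior_var
    return info_mean / info, 1.0 / info


# Two agents, prior N(0, 1), measurements y = 1 and y = 2 with unit noise.
posts = [local_posterior(0.0, 1.0, y, 1.0) for y in (1.0, 2.0)]
corrected = fuse(posts, 0.0, 1.0)                    # (1.0, 1/3): matches centralized
naive = fuse(posts, 0.0, 1.0, correct_prior=False)   # (0.75, 0.25): overconfident, pulled to prior
```

The naive product reports a smaller variance than the centralized posterior, and it shrinks the mean toward the prior. This is the overuse of prior knowledge that the abstract describes.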
Urban Region Representation Learning with Attentive Fusion
Sun, Fengze, Qi, Jianzhong, Chang, Yanchuan, Fan, Xiaoliang, Karunasekera, Shanika, Tanin, Egemen
The growing number of related urban data sources has brought forth new opportunities for learning urban region representations, i.e., embeddings. The embeddings describe latent features of urban regions and enable the discovery of similar regions for urban planning applications. Existing methods learn an embedding for a region from each type of region feature data and subsequently fuse all learned embeddings of the region into a unified region embedding. However, these studies often overlook the significance of the fusion process. Typical fusion methods rely on simple aggregation, such as summation and concatenation, thereby disregarding correlations within the fused region embeddings. To address this limitation, we propose a novel model named HAFusion. Our model is powered by a dual-feature attentive fusion module named DAFusion, which fuses embeddings from different region features to learn higher-order correlations between regions as well as between different types of region features. DAFusion is generic: it can be integrated into existing models to enhance their fusion process. Further, motivated by the effective fusion capability of attentive modules, we propose a hybrid attentive feature learning module named HALearning to enhance embedding learning from each individual type of region feature. Extensive experiments on three real-world datasets demonstrate that HAFusion outperforms state-of-the-art methods across three different prediction tasks. Using our learned region embeddings leads to consistent improvements in prediction accuracy of up to 31%.
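The sketch below contrasts simple aggregation with attentive fusion. It is not the DAFusion architecture. Instead, it applies a minimal single-head self-attention with no learned projections (an assumption made for brevity) to one region's per-feature-type embeddings, then mean-pools the result. The summation baseline that the abstract criticizes is shown alongside it. Function names and embedding values are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sum_fusion(embeddings):
    """Baseline: element-wise summation, blind to correlations between views."""
    return [sum(col) for col in zip(*embeddings)]


def attentive_fusion(embeddings):
    """Fuse one region's per-feature-type embeddings with self-attention.

    Simplification: queries, keys and values are the raw embeddings
    (no learned projections), using a single head.
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    weights, attended = [], []
    for q in embeddings:
        row = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in embeddings])
        weights.append(row)
        attended.append([sum(w * v[j] for w, v in zip(row, embeddings))
                         for j in range(d)])
    # Mean-pool the attended views into a single region embedding.
    fused = [sum(vec[j] for vec in attended) / len(attended) for j in range(d)]
    return fused, weights


# Hypothetical embeddings of one region from three feature types.
region = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1], [-0.5, 0.4, 0.7]]
fused, attn = attentive_fusion(region)
summed = sum_fusion(region)
```

In the attentive version, the weight each view receives depends on its similarity to the other views. Summation treats all views identically.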
A Comprehensive Review of Visual-Textual Sentiment Analysis from Social Media Networks
Al-Tameemi, Israa Khalaf Salman, Feizi-Derakhshi, Mohammad-Reza, Pashazadeh, Saeed, Asadpour, Mohammad
Social media networks have become a significant aspect of people's lives, serving as a platform for their ideas, opinions and emotions. Consequently, automated sentiment analysis (SA) is critical for recognising people's feelings in ways that other information sources cannot. Analysing these feelings supports various applications, including brand evaluation, YouTube film reviews and healthcare applications. As social media continues to develop, people post massive amounts of information in different forms, including text, photos, audio and video. Traditional SA algorithms have therefore become limited, as they do not consider the expressiveness of other modalities. By incorporating features from multiple sources, multimodal data streams offer new opportunities to improve on the results of text-based SA. Our study focuses on the emerging field of multimodal SA, which examines visual and textual data posted on social media networks, since many people are more likely to use such content to express themselves on these platforms. To serve as a resource for academics in this rapidly growing field, we present a comprehensive overview of textual and visual SA, including data pre-processing, feature extraction techniques, sentiment benchmark datasets, and the efficacy of multiple classification methodologies suited to each field. We also provide a brief introduction to the most frequently utilised data fusion strategies and a summary of existing research on visual-textual SA. Finally, we highlight the most significant challenges and investigate several important sentiment applications.
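Multimodal SA literature commonly discusses two fusion strategies: early (feature-level) fusion and late (decision-level) fusion. The sketch below contrasts them in a few lines. It assumes precomputed text and image feature vectors and hypothetical linear classifiers. Its only purpose is to show where the modalities are combined; it does not reproduce any surveyed model.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear_scores(features, class_weights):
    """One score per sentiment class: dot product with that class's weight row."""
    return [sum(f * w for f, w in zip(features, row)) for row in class_weights]


def early_fusion(text_feat, image_feat, class_weights):
    """Feature-level fusion: concatenate modalities, then classify once."""
    return softmax(linear_scores(text_feat + image_feat, class_weights))


def late_fusion(text_feat, image_feat, text_weights, image_weights, alpha=0.5):
    """Decision-level fusion: classify each modality, then blend probabilities."""
    p_text = softmax(linear_scores(text_feat, text_weights))
    p_image = softmax(linear_scores(image_feat, image_weights))
    return [alpha * a + (1 - alpha) * b for a, b in zip(p_text, p_image)]


# Hypothetical 2-D text and image features, three classes (neg, neu, pos).
text = [0.7, 0.1]
image = [0.2, 0.9]
w_joint = [[-1, 0, 0, -1], [0, 0.5, 0, 0.5], [1, 0, 1, 0]]
w_text = [[-1, 0], [0, 0.5], [1, 0]]
w_image = [[0, -1], [0, 0.5], [1, 0]]
p_early = early_fusion(text, image, w_joint)
p_late = late_fusion(text, image, w_text, w_image)
```

Early fusion lets a single classifier learn cross-modal interactions. Late fusion keeps the per-modality models independent, which makes it easier to handle a missing modality.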
Contrastive Learning-Based Spectral Knowledge Distillation for Multi-Modality and Missing Modality Scenarios in Semantic Segmentation
Sikdar, Aniruddh, Teotia, Jayant, Sundaram, Suresh
Improving the performance of semantic segmentation models using multispectral information is crucial, especially in low-light and adverse conditions. Multi-modal fusion techniques either learn cross-modality features to generate a fused image or rely on knowledge distillation, but they treat multimodal and missing-modality scenarios as distinct problems, which is suboptimal for multi-sensor models. To address this, we propose CSK-Net, a novel multi-modal fusion approach that combines a contrastive learning-based spectral knowledge distillation technique with an automatic mixed feature exchange mechanism for semantic segmentation of optical (EO) and infrared (IR) images. The distillation scheme extracts detailed textures from the optical images and distills them into the optical branch of CSK-Net. The model encoder uses shared convolution weights with separate batch normalization (BN) layers for each modality, capturing the multispectral information from different modalities of the same objects. A novel Gated Spectral Unit (GSU) and a mixed feature exchange strategy are proposed to increase the correlation of modality-shared information and reduce modality-specific information during distillation. Comprehensive experiments on three public benchmark datasets show that CSK-Net surpasses state-of-the-art models in multi-modal tasks and in missing-modality settings where only IR data are used for inference. In missing-modality scenarios, this performance gain comes at no additional computational cost compared with the baseline segmentation models.
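The encoder design described above pairs shared convolution weights with modality-specific batch normalization, and it can be sketched without a deep learning framework. In this toy version, a shared linear projection stands in for the shared convolution kernels. Batch statistics are computed per call (training mode, no running averages). The class and parameter names are hypothetical.

```python
import math


class SharedEncoderSeparateBN:
    """Toy encoder layer with weights shared across modalities but
    batch normalization kept separate per modality.

    A shared linear projection stands in for shared convolution kernels;
    statistics come from the current batch only (training mode).
    """

    def __init__(self, weights, modalities=("eo", "ir"), eps=1e-5):
        self.weights = weights                      # out_dim rows x in_dim columns, shared
        out_dim = len(weights)
        # Modality-specific affine BN parameters, initialised to identity.
        self.gamma = {m: [1.0] * out_dim for m in modalities}
        self.beta = {m: [0.0] * out_dim for m in modalities}
        self.eps = eps

    def forward(self, batch, modality):
        # Same projection for every modality.
        z = [[sum(w * x for w, x in zip(row, sample)) for row in self.weights]
             for sample in batch]
        out = [[0.0] * len(self.weights) for _ in batch]
        for c in range(len(self.weights)):
            col = [s[c] for s in z]
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col)
            for i, v in enumerate(col):
                normed = (v - mean) / math.sqrt(var + self.eps)
                out[i][c] = self.gamma[modality][c] * normed + self.beta[modality][c]
        return out


# Hypothetical EO and IR batches with very different intensity scales.
layer = SharedEncoderSeparateBN([[0.5, -0.2], [0.1, 0.3]])
eo_out = layer.forward([[10.0, 20.0], [12.0, 18.0], [11.0, 25.0]], "eo")
ir_out = layer.forward([[0.1, 0.3], [0.2, 0.1], [0.4, 0.2]], "ir")
```

Both modalities pass through identical weights. Each modality is then normalized with its own statistics, so the large difference in EO and IR intensity ranges does not leak into the shared features.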
Virtual Fusion with Contrastive Learning for Single Sensor-based Activity Recognition
Nguyen, Duc-Anh, Pham, Cuong, Le-Khac, Nhien-An
Various types of sensors can be used for Human Activity Recognition (HAR), each with different strengths and weaknesses. A single sensor may not fully observe the user's motions from its perspective, which leads to incorrect predictions. While sensor fusion provides more information for HAR, it has inherent drawbacks, such as user privacy and acceptance concerns and costly set-up, operation, and maintenance. To address this problem, we propose Virtual Fusion, a new method that exploits unlabeled data from multiple time-synchronized sensors during training but needs only one sensor for inference. Contrastive learning is adopted to exploit the correlation among sensors. Virtual Fusion achieves significantly better accuracy than training with the same single sensor, and in some cases it even surpasses actual fusion using multiple sensors at test time. We also extend this method to a more general version called Actual Fusion within Virtual Fusion (AFVF), which uses a subset of the training sensors during inference. Our method achieves state-of-the-art accuracy and F1-score on the UCI-HAR and PAMAP2 benchmark datasets. Implementation is available upon request.
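An InfoNCE-style contrastive loss can illustrate how the correlation between sensors is exploited. The exact loss is assumed here, because the abstract does not name it. Embeddings of the same time window from two synchronized sensors form a positive pair, and the other windows in the batch act as negatives. Encoders are omitted, so the inputs are taken to be already-computed embeddings, and the sensor names are hypothetical.

```python
import math


def _normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def info_nce(anchor, positive, temperature=0.1):
    """InfoNCE loss between embeddings from two time-synchronized sensors.

    anchor[i] and positive[i] describe the same time window; every other
    positive[j] in the batch serves as a negative for anchor[i].
    """
    a = [_normalize(v) for v in anchor]
    p = [_normalize(v) for v in positive]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]   # cross-entropy with target index i
    return total / len(a)


# Hypothetical embeddings of three windows from a wrist and an ankle sensor.
wrist = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
ankle_aligned = [[0.9, 0.1], [0.1, 0.9], [-0.9, 0.1]]
ankle_shuffled = [ankle_aligned[1], ankle_aligned[2], ankle_aligned[0]]
loss_aligned = info_nce(wrist, ankle_aligned)
loss_shuffled = info_nce(wrist, ankle_shuffled)
```

Minimizing such a loss pulls the two sensors' embeddings of the same moment together. This lets a single-sensor encoder benefit from the other sensor's view during training while needing only its own input at inference.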
Process Mining for Unstructured Data: Challenges and Research Directions
Koschmider, Agnes, Aleknonytė-Resch, Milda, Fonger, Frederik, Imenkamp, Christian, Lepsien, Arvid, Apaydin, Kaan, Harms, Maximilian, Janssen, Dominik, Langhammer, Dominic, Ziolkowski, Tobias, Zisgen, Yorck
The volume of data is continuously increasing, and the ability and need to analyze it efficiently have become even more crucial. Machine learning and data mining are suitable techniques and tools for processing and analyzing such data efficiently. Process mining complements both [Aa16]. It is a promising approach for finding additional patterns in data (e.g., causal effects or bottlenecks), thereby giving insights that techniques like machine learning or data mining cannot find directly. These process insights are derived from events tracked by information systems. The event data, structured within a log (i.e., an event log), then serve as input to any process mining algorithm. Process mining supports both analyses based solely on event logs and comparisons between (manually created or as-is) process models and an event log reflecting how the process was actually executed.
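The sketch below shows a concrete example of an algorithm that consumes an event log. It builds a directly-follows graph, which is a basic starting point for many process discovery techniques. It assumes each event is a (case_id, activity, timestamp) tuple with comparable timestamps. The log contents are invented for illustration.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity a is directly followed by activity b within a case.

    event_log: iterable of (case_id, activity, timestamp) tuples.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))
    dfg = Counter()
    for events in traces.values():
        events.sort()                                  # order each case by time
        activities = [activity for _, activity in events]
        dfg.update(zip(activities, activities[1:]))    # consecutive pairs
    return dfg


# Invented log with two cases of a simple approval process.
log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "approve", 5),
    ("c2", "register", 3), ("c2", "check", 4), ("c2", "reject", 6),
]
graph = directly_follows(log)
```

The resulting counts (register to check twice, check to approve once, check to reject once) already expose the branching point of the process. This is the kind of structural insight that is hard to obtain from generic data mining on unordered records.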
The Open Review-Based (ORB) dataset: Towards Automatic Assessment of Scientific Papers and Experiment Proposals in High-Energy Physics
Szumega, Jaroslaw, Bougueroua, Lamine, Gkotse, Blerina, Jouvelot, Pierre, Ravotti, Federico
As the Open Science approach gains importance in research, the move towards open reviews of scientific papers is making an impact on the scientific community. However, publicly available resources for research on this subject are scarce, as only a limited number of journals and conferences currently give interested parties access to their review process. In this paper, we introduce the new comprehensive Open Review-Based dataset (ORB); it includes a curated list of more than 36,000 scientific papers with more than 89,000 reviews and final decisions. We gather this information from two sources: the OpenReview.net and SciPost.org websites. Given the volatile nature of this domain, the software infrastructure that we introduce to supplement the ORB dataset is designed to accommodate additional resources in the future. The ORB deliverables include (1) Python code (interfaces and implementations) to translate document data and metadata into a structured, high-level representation, (2) an ETL (Extract, Transform, Load) process to facilitate automatic updates from defined sources, and (3) data files representing the structured data. The paper presents our data architecture and an overview of the collected data along with relevant statistics. For illustration purposes, we also discuss preliminary Natural-Language-Processing-based experiments that aim to predict (1) papers' acceptance from their textual embeddings and (2) grading statistics, also inferred from the embeddings. We believe ORB is a valuable resource for researchers interested in open science and open review, and our implementation eases the use of these data for further analysis and experimentation. We plan to update ORB as the field matures and to introduce new resources better suited to dedicated scientific domains such as High-Energy Physics.
PEOPLEx: PEdestrian Opportunistic Positioning LEveraging IMU, UWB, BLE and WiFi
Lajoie, Pierre-Yves, Baghi, Bobak Hamed, Herath, Sachini, Hogan, Francois, Liu, Xue, Dudek, Gregory
This paper advances the field of pedestrian localization by introducing a unifying framework for opportunistic positioning based on nonlinear factor graph optimization. While many existing approaches assume the constant availability of one or more sensing signals, our methodology uses IMU-based pedestrian inertial navigation as the backbone for sensor fusion, opportunistically integrating Ultra-Wideband (UWB), Bluetooth Low Energy (BLE), and WiFi signals when they are available in the environment. The proposed PEOPLEx framework incorporates sensing data as they become available and operates without any prior knowledge of the environment (e.g., anchor locations or radio frequency maps). Our contributions are twofold: 1) we introduce an opportunistic, real-time, multi-sensor pedestrian positioning framework that fuses the available sensor measurements; 2) we develop novel factors for adaptive scaling and coarse loop closures, significantly improving the precision of indoor positioning. Experimental validation confirms that our approach achieves accurate localization in real indoor scenarios using commercial smartphones.
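A linear, one-dimensional factor graph can illustrate the idea of a backbone plus opportunistic factors. This is a simplifying assumption: PEOPLEx uses nonlinear factors over full poses. Here, measured step displacements stand in for the inertial backbone. A hypothetical coarse loop-closure factor stands in for the opportunistic signals; it asserts that two time indices are at the same place, as might be detected from a repeated WiFi or BLE signature. For this Gaussian model, solving the weighted least-squares problem gives the MAP estimate.

```python
def solve_linear(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fuse_positions(steps, closures, step_sigma=0.1, closure_sigma=0.1, x0=0.0):
    """MAP estimate of 1D positions x_0..x_n from a linear factor graph.

    steps: n measured displacements (stand-in for the inertial backbone).
    closures: (i, j) pairs hypothetically detected as the same place,
              added only when such an opportunistic cue exists.
    """
    n = len(steps) + 1
    h = [[0.0] * n for _ in range(n)]   # information matrix J^T W J
    g = [0.0] * n                       # information vector J^T W z

    def add_factor(coeffs, z, sigma):
        # Residual sum(c * x) - z with weight 1 / sigma^2.
        w = 1.0 / sigma ** 2
        for i, ci in coeffs:
            g[i] += w * ci * z
            for j, cj in coeffs:
                h[i][j] += w * ci * cj

    add_factor([(0, 1.0)], x0, 1e-3)    # tight prior anchors the start
    for k, u in enumerate(steps):
        add_factor([(k + 1, 1.0), (k, -1.0)], u, step_sigma)
    for i, j in closures:
        add_factor([(j, 1.0), (i, -1.0)], 0.0, closure_sigma)
    return solve_linear(h, g)


# Out-and-back walk whose step estimates drift by +0.1 each way.
dead_reckoning_only = fuse_positions([1.1, 1.1, -0.9, -0.9], [])
with_closure = fuse_positions([1.1, 1.1, -0.9, -0.9], [(0, 4)])
```

Without the closure, the final position inherits the full 0.4 drift. With it, the error is spread across all factors of equal weight, and the end point moves to about 0.08. The factor is simply omitted whenever no opportunistic cue is available, and the graph still solves.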