Goto

Collaborating Authors

 Information Fusion


Distributed Multi-Object Tracking Under Limited Field of View Heterogeneous Sensors with Density Clustering

arXiv.org Artificial Intelligence

We consider the problem of tracking multiple, unknown, and time-varying numbers of objects using a distributed network of heterogeneous sensors. In an effort to derive a formulation for practical settings, we consider limited and unknown sensor field-of-views (FoVs), sensors with limited local computational resources and communication channel capacity. The resulting distributed multi-object tracking algorithm involves solving an NP-hard multidimensional assignment problem either optimally for small-size problems or sub-optimally for general practical problems. For general problems, we propose an efficient distributed multi-object tracking algorithm that performs track-to-track fusion using a clustering-based analysis of the state space transformed into a density space to mitigate the complexity of the assignment problem. The proposed algorithm can more efficiently group local track estimates for fusion than existing approaches. To ensure we achieve globally consistent identities for tracks across a network of nodes as objects move between FoVs, we develop a graph-based algorithm to achieve label consensus and minimise track segmentation. Numerical experiments with a synthetic and a real-world trajectory dataset demonstrate that our proposed method is significantly more computationally efficient than state-of-the-art solutions, achieving similar tracking accuracy and bandwidth requirements but with improved label consistency.


Less or More From Teacher: Exploiting Trilateral Geometry For Knowledge Distillation

arXiv.org Artificial Intelligence

Knowledge distillation aims to train a compact student network using soft supervision from a larger teacher network and hard supervision from ground truths. However, determining an optimal knowledge fusion ratio that balances these supervisory signals remains challenging. Prior methods generally resort to a constant or heuristic-based fusion ratio, which often falls short of a proper balance. In this study, we introduce a novel adaptive method for learning a sample-wise knowledge fusion ratio, exploiting both the correctness of teacher and student, as well as how well the student mimics the teacher on each sample. Our method naturally leads to the intra-sample trilateral geometric relations among the student prediction ($S$), teacher prediction ($T$), and ground truth ($G$). To counterbalance the impact of outliers, we further extend to the inter-sample relations, incorporating the teacher's global average prediction $\bar{T}$ for samples within the same class. A simple neural network then learns the implicit mapping from the intra- and inter-sample relations to an adaptive, sample-wise knowledge fusion ratio in a bilevel-optimization manner. Our approach provides a simple, practical, and adaptable solution for knowledge distillation that can be employed across various architectures and model sizes. Extensive experiments demonstrate consistent improvements over other loss re-weighting methods on image classification, attack detection, and click-through rate prediction.


LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition

arXiv.org Artificial Intelligence

Place recognition is one of the most crucial modules for autonomous vehicles to identify places that were previously visited in GPS-invalid environments. Sensor fusion is considered an effective method to overcome the weaknesses of individual sensors. In recent years, multimodal place recognition fusing information from multiple sensors has gathered increasing attention. However, most existing multimodal place recognition methods only use limited field-of-view camera images, which leads to an imbalance between features from different modalities and limits the effectiveness of sensor fusion. In this paper, we present a novel neural network named LCPR for robust multimodal place recognition, which fuses LiDAR point clouds with multi-view RGB images to generate discriminative and yaw-rotation invariant representations of the environment. A multi-scale attention-based fusion module is proposed to fully exploit the panoramic views from different modalities of the environment and their correlations. We evaluate our method on the nuScenes dataset, and the experimental results show that our method can effectively utilize multi-view camera and LiDAR data to improve the place recognition performance while maintaining strong robustness to viewpoint changes. Our open-source code and pre-trained models are available at https://github.com/ZhouZijie77/LCPR .


EvPlug: Learn a Plug-and-Play Module for Event and Image Fusion

arXiv.org Artificial Intelligence

Event cameras and RGB cameras exhibit complementary characteristics in imaging: the former possesses high dynamic range (HDR) and high temporal resolution, while the latter provides rich texture and color information. This makes the integration of event cameras into middle-and high-level RGB-based vision tasks highly promising. However, challenges arise in multi-modal fusion, data annotation, and model architecture design. In this paper, we propose EvPlug, which learns a plug-and-play event and image fusion module from the supervision of the existing RGB-based model. The learned fusion module integrates event streams with image features in the form of a plug-in, endowing the RGB-based model to be robust to HDR and fast motion scenes while enabling high temporal resolution inference. Our method only requires unlabeled event-image pairs (no pixel-wise alignment required) and does not alter the structure or weights of the RGB-based model. We demonstrate the superiority of EvPlug in several vision tasks such as object detection, semantic segmentation, and 3D hand pose estimation.


Federated Continual Learning via Knowledge Fusion: A Survey

arXiv.org Artificial Intelligence

Data privacy and silos are nontrivial and greatly challenging in many real-world applications. Federated learning is a decentralized approach to training models across multiple local clients without the exchange of raw data from client devices to global servers. However, existing works focus on a static data environment and ignore continual learning from streaming data with incremental tasks. Federated Continual Learning (FCL) is an emerging paradigm to address model learning in both federated and continual learning environments. The key objective of FCL is to fuse heterogeneous knowledge from different clients and retain knowledge of previous tasks while learning on new ones. In this work, we delineate federated learning and continual learning first and then discuss their integration, i.e., FCL, and particular FCL via knowledge fusion. In summary, our motivations are four-fold: we (1) raise a fundamental problem called ''spatial-temporal catastrophic forgetting'' and evaluate its impact on the performance using a well-known method called federated averaging (FedAvg), (2) integrate most of the existing FCL methods into two generic frameworks, namely synchronous FCL and asynchronous FCL, (3) categorize a large number of methods according to the mechanism involved in knowledge fusion, and finally (4) showcase an outlook on the future work of FCL.


Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers

arXiv.org Artificial Intelligence

Combining complementary sensor modalities is crucial to providing robust perception for safety-critical robotics applications such as autonomous driving (AD). Recent state-of-the-art camera-lidar fusion methods for AD rely on monocular depth estimation which is a notoriously difficult task compared to using depth information from the lidar directly. Here, we find that this approach does not leverage depth as expected and show that naively improving depth estimation does not lead to improvements in object detection performance and that, strikingly, removing depth estimation altogether does not degrade object detection performance. This suggests that relying on monocular depth could be an unnecessary architectural bottleneck during camera-lidar fusion. In this work, we introduce a novel fusion method that bypasses monocular depth estimation altogether and instead selects and fuses camera and lidar features in a bird's-eye-view grid using a simple attention mechanism. We show that our model can modulate its use of camera features based on the availability of lidar features and that it yields better 3D object detection on the nuScenes dataset than baselines relying on monocular depth estimation.


Lp-Norm Constrained One-Class Classifier Combination

arXiv.org Artificial Intelligence

Different realisations of this generic methodology may appear in accordance with the level where the fusion is practised, including data fusion, feature fusion, soft decision fusion, or hard decision fusion, etc. Classifier fusion, and in particular, a soft combination of the output scores of multiple learners has been established as a standard approach to improve classification performance in various learning scenarios [1]. The motivating principle behind adopting a classifier fusion approach is to leverage the collective ability of multiple models, presumed to be as independent as possible, to mitigate the shortcomings of a single model, thus improving the overall performance. In general, classifier fusion approaches are expected to yield better results by - reducing the risk of selecting an inaccurate individual learner; - minimising the chances of settling for a suboptimal solution when individual learners may be stuck in local optima; - allowing for a better exploration of the potential solution space; - potentially providing a better capacity to deal with imbalanced training data; - being more capable of adapting to dynamic scenarios where the representations and labels may change over time, and - helping to mitigate the curse of dimensionality and reducing the chances of overfitting [2]. Despite its appealing properties and its widespread application in multiclass classification scenarios where significant performance improvements have been observed [1], the one-class classifier fusion paradigm has not been explored widely. In a one-class classification (OCC) setting, one is interested in classifying an observation as normal/positive/target or as abnormal/negative/anomaly by mainly training on positive samples [3]. The prevalent application of OCC is often witnessed in scenarios where the accumulation of counterexamples is either highly demanding or simply infeasible [4], challenging binary/multi-class classification approaches.


To Fuse or Not to Fuse: Measuring Consistency in Multi-Sensor Fusion for Aerial Robots

arXiv.org Artificial Intelligence

Aerial vehicles are no longer limited to flying in open space: recent work has focused on aerial manipulation and up-close inspection. Such applications place stringent requirements on state estimation: the robot must combine state information from many sources, including onboard odometry and global positioning sensors. However, flying close to or in contact with structures is a degenerate case for many sensing modalities, and the robot's state estimation framework must intelligently choose which sensors are currently trustworthy. We evaluate a number of metrics to judge the reliability of sensing modalities in a multi-sensor fusion framework, then introduce a consensus-finding scheme that uses this metric to choose which sensors to fuse or not to fuse. Finally, we show that such a fusion framework is more robust and accurate than fusing all sensors all the time and demonstrate how such metrics can be informative in real-world experiments in indoor-outdoor flight and bridge inspection.


Building Flexible, Scalable, and Machine Learning-ready Multimodal Oncology Datasets

arXiv.org Artificial Intelligence

The advancements in data acquisition, storage, and processing techniques have resulted in the rapid growth of heterogeneous medical data. Integrating radiological scans, histopathology images, and molecular information with clinical data is essential for developing a holistic understanding of the disease and optimizing treatment. The need for integrating data from multiple sources is further pronounced in complex diseases such as cancer for enabling precision medicine and personalized treatments. This work proposes Multimodal Integration of Oncology Data System (MINDS) - a flexible, scalable, and cost-effective metadata framework for efficiently fusing disparate data from public sources such as the Cancer Research Data Commons (CRDC) into an interconnected, patient-centric framework. MINDS offers an interface for exploring relationships across data types and building cohorts for developing large-scale multimodal machine learning models. By harmonizing multimodal data, MINDS aims to potentially empower researchers with greater analytical ability to uncover diagnostic and prognostic insights and enable evidence-based personalized care. MINDS tracks granular end-to-end data provenance, ensuring reproducibility and transparency. The cloud-native architecture of MINDS can handle exponential data growth in a secure, cost-optimized manner while ensuring substantial storage optimization, replication avoidance, and dynamic access capabilities. Auto-scaling, access controls, and other mechanisms guarantee pipelines' scalability and security. MINDS overcomes the limitations of existing biomedical data silos via an interoperable metadata-driven approach that represents a pivotal step toward the future of oncology data integration.


Comparison of two data fusion approaches for land use classification

arXiv.org Artificial Intelligence

ABSTRACT: Accurate land use maps, describing the territory from an anthropic utilisation point of view, are useful tools for land management and planning. To produce them, the use of optical images alone remains limited. It is therefore necessary to make use of several heterogeneous sources, each carrying complementary or contradictory information due to their imperfections or their different specifications. This study compares two different approaches i.e. a pre-classification and a post-classification fusion approach for combining several sources of spatial data in the context of land use classification. The approaches are applied on authoritative land use data located in the Gers department in the south-west of France. Pre-classification fusion, while not explicitly modeling imperfections, has the best final results, reaching an overall accuracy of 97% and a macro-mean F1 score of 88%. 1. INTRODUCTION At the feature level, Fonte et al. (2018) identified building functions using Land Use (LU) describes the socio-economic human activity of a rule based classifications of OpenStreetMap (OSM), Facebook an area (e.g. Land al. (2022) identified building functions from images, POI and Use and Land Cover (LULC) maps are very useful for understanding, building footprint from Gaode map (authoritative database) and monitoring, planning and predicting the evolution of distance to OSM roads using a XGBoost classifier.