Goto

Collaborating Authors

 Information Fusion


One-Versus-Others Attention: Scalable Multimodal Integration

arXiv.org Artificial Intelligence

Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.


A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling

arXiv.org Machine Learning

Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in the exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and utilize it to quantify the value of the future samples and navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximize the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.


Learning Type Inference for Enhanced Dataflow Analysis

arXiv.org Artificial Intelligence

Statically analyzing dynamically-typed code is a challenging endeavor, as even seemingly trivial tasks such as determining the targets of procedure calls are non-trivial without knowing the types of objects at compile time. Addressing this challenge, gradual typing is increasingly added to dynamically-typed languages, a prominent example being TypeScript that introduces static typing to JavaScript. Gradual typing improves the developer's ability to verify program behavior, contributing to robust, secure and debuggable programs. In practice, however, users only sparsely annotate types directly. At the same time, conventional type inference faces performance-related challenges as program size grows. Statistical techniques based on machine learning offer faster inference, but although recent approaches demonstrate overall improved accuracy, they still perform significantly worse on user-defined types than on the most common built-in types. Limiting their real-world usefulness even more, they rarely integrate with user-facing applications. We propose CodeTIDAL5, a Transformer-based model trained to reliably predict type annotations. For effective result retrieval and re-integration, we extract usage slices from a program's code property graph. Comparing our approach against recent neural type inference systems, our model outperforms the current state-of-the-art by 7.85% on the ManyTypes4TypeScript benchmark, achieving 71.27% accuracy overall. Furthermore, we present JoernTI, an integration of our approach into Joern, an open source static analysis tool, and demonstrate that the analysis benefits from the additional type information. As our model allows for fast inference times even on commodity CPUs, making our system available through Joern leads to high accessibility and facilitates security research.


Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

arXiv.org Artificial Intelligence

Navigating complex indoor environments requires a deep understanding of the space the robotic agent is acting into to correctly inform the navigation process of the agent towards the goal location. In recent learning-based navigation approaches, the scene understanding and navigation abilities of the agent are achieved simultaneously by collecting the required experience in simulation. Unfortunately, even if simulators represent an efficient tool to train navigation policies, the resulting models often fail when transferred into the real world. One possible solution is to provide the navigation model with mid-level visual representations containing important domain-invariant properties of the scene. But, what are the best representations that facilitate the transfer of a model to the real-world? How can they be combined? In this work we address these issues by proposing a benchmark of Deep Learning architectures to combine a range of mid-level visual representations, to perform a PointGoal navigation task following a Reinforcement Learning setup. All the proposed navigation models have been trained with the Habitat simulator on a synthetic office environment and have been tested on the same real-world environment using a real robotic platform. To efficiently assess their performance in a real context, a validation tool has been proposed to generate realistic navigation episodes inside the simulator. Our experiments showed that navigation models can benefit from the multi-modal input and that our validation tool can provide good estimation of the expected navigation performance in the real world, while saving time and resources. The acquired synthetic and real 3D models of the environment, together with the code of our validation tool built on top of Habitat, are publicly available at the following link: https://iplab.dmi.unict.it/EmbodiedVN/


Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption

arXiv.org Machine Learning

The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.


MEM: Multi-Modal Elevation Mapping for Robotics and Learning

arXiv.org Artificial Intelligence

Elevation maps are commonly used to represent the environment of mobile robots and are instrumental for locomotion and navigation tasks. However, pure geometric information is insufficient for many field applications that require appearance or semantic information, which limits their applicability to other platforms or domains. In this work, we extend a 2.5D robot-centric elevation mapping framework by fusing multi-modal information from multiple sources into a popular map representation. The framework allows inputting data contained in point clouds or images in a unified manner. To manage the different nature of the data, we also present a set of fusion algorithms that can be selected based on the information type and user requirements. Our system is designed to run on the GPU, making it real-time capable for various robotic and learning tasks. We demonstrate the capabilities of our framework by deploying it on multiple robots with varying sensor configurations and showcasing a range of applications that utilize multi-modal layers, including line detection, human detection, and colorization.


Ensemble Learning for Fusion of Multiview Vision with Occlusion and Missing Information: Framework and Evaluations with Real-World Data and Applications in Driver Hand Activity Recognition

arXiv.org Artificial Intelligence

Multi-sensor frameworks provide opportunities for ensemble learning and sensor fusion to make use of redundancy and supplemental information, helpful in real-world safety applications such as continuous driver state monitoring which necessitate predictions even in cases where information may be intermittently missing. We define this problem of intermittent instances of missing information (by occlusion, noise, or sensor failure) and design a learning framework around these data gaps, proposing and analyzing an imputation scheme to handle missing information. We apply these ideas to tasks in camera-based hand activity classification for robust safety during autonomous driving. We show that a late-fusion approach between parallel convolutional neural networks can outperform even the best-placed single camera model in estimating the hands' held objects and positions when validated on within-group subjects, and that our multi-camera framework performs best on average in cross-group validation, and that the fusion approach outperforms ensemble weighted majority and model combination schemes.


MINS: Efficient and Robust Multisensor-aided Inertial Navigation System

arXiv.org Artificial Intelligence

Robust multisensor fusion of multi-modal measurements such as IMUs, wheel encoders, cameras, LiDARs, and GPS holds great potential due to its innate ability to improve resilience to sensor failures and measurement outliers, thereby enabling robust autonomy. To the best of our knowledge, this work is among the first to develop a consistent tightly-coupled Multisensor-aided Inertial Navigation System (MINS) that is capable of fusing the most common navigation sensors in an efficient filtering framework, by addressing the particular challenges of computational complexity, sensor asynchronicity, and intra-sensor calibration. In particular, we propose a consistent high-order on-manifold interpolation scheme to enable efficient asynchronous sensor fusion and state management strategy (i.e. dynamic cloning). The proposed dynamic cloning leverages motion-induced information to adaptively select interpolation orders to control computational complexity while minimizing trajectory representation errors. We perform online intrinsic and extrinsic (spatiotemporal) calibration of all onboard sensors to compensate for poor prior calibration and/or degraded calibration varying over time. Additionally, we develop an initialization method with only proprioceptive measurements of IMU and wheel encoders, instead of exteroceptive sensors, which is shown to be less affected by the environment and more robust in highly dynamic scenarios. We extensively validate the proposed MINS in simulations and large-scale challenging real-world datasets, outperforming the existing state-of-the-art methods, in terms of localization accuracy, consistency, and computation efficiency. We have also open-sourced our algorithm, simulator, and evaluation toolbox for the benefit of the community: https://github.com/rpng/mins.


Nonlinear Heterogeneous Bayesian Decentralized Data Fusion

arXiv.org Artificial Intelligence

The factor graph decentralized data fusion (FG-DDF) framework was developed for the analysis and exploitation of conditional independence in {heterogeneous Bayesian decentralized fusion problems, in which robots update and fuse pdfs over different, but overlapping subsets of random states. This allows robots to efficiently use smaller probabilistic models and sparse message passing to accurately and scalably fuse relevant local parts of a larger global joint state pdf while accounting for data dependencies between robots. Whereas prior work required limiting assumptions about network connectivity and model linearity, this paper relaxes these to explore the applicability and robustness of FG-DDF in more general settings. We develop a new heterogeneous fusion rule which generalizes the homogeneous covariance intersection algorithm for such cases and test it in multi-robot tracking and localization scenarios with non-linear motion/observation models under communication dropouts. Simulation and hardware experiments show that, in practice, the FG-DDF continues to provide consistent filtered estimates under these more practical operating conditions, while reducing computation and communication costs by more than 99\%, thus enabling the design of scalable real-world multi-robot systems.


Smart City Digital Twin Framework for Real-Time Multi-Data Integration and Wide Public Distribution

arXiv.org Artificial Intelligence

Digital Twins are digital replica of real entities and are becoming fundamental tools to monitor and control the status of entities, predict their future evolutions, and simulate alternative scenarios to understand the impact of changes. Thanks to the large deployment of sensors, with the increasing information it is possible to build accurate reproductions of urban environments including structural data and real-time information. Such solutions help city councils and decision makers to face challenges in urban development and improve the citizen quality of life, by ana-lysing the actual conditions, evaluating in advance through simulations and what-if analysis the outcomes of infrastructural or political chang-es, or predicting the effects of humans and/or of natural events. Snap4City Smart City Digital Twin framework is capable to respond to the requirements identified in the literature and by the international forums. Differently from other solutions, the proposed architecture provides an integrated solution for data gathering, indexing, computing and information distribution offered by the Snap4City IoT platform, therefore realizing a continuously updated Digital Twin. 3D building models, road networks, IoT devices, WoT Entities, point of interests, routes, paths, etc., as well as results from data analytical processes for traffic density reconstruction, pollutant dispersion, predictions of any kind, what-if analysis, etc., are all integrated into an accessible web interface, to support the citizens participation in the city decision processes. What-If analysis to let the user performs simulations and observe possible outcomes. As case of study, the Digital Twin of the city of Florence (Italy) is presented. Snap4City platform, is released as open-source, and made available through GitHub and as docker compose.