Information Fusion
Joint and individual variation explained (JIVE) for integrated analysis of multiple data types
Lock, Eric F., Hoadley, Katherine A., Marron, J. S., Nobel, Andrew B.
Research in several fields now requires the analysis of data sets in which multiple high-dimensional types of data are available for a common set of objects. In particular, The Cancer Genome Atlas (TCGA) includes data from several diverse genomic technologies on the same cancerous tumor samples. In this paper we introduce Joint and Individual Variation Explained (JIVE), a general decomposition of variation for the integrated analysis of such data sets. The decomposition consists of three terms: a low-rank approximation capturing joint variation across data types, low-rank approximations for structured variation individual to each data type, and residual noise. JIVE quantifies the amount of joint variation between data types, reduces the dimensionality of the data and provides new directions for the visual exploration of joint and individual structures. The proposed method represents an extension of Principal Component Analysis and has clear advantages over popular two-block methods such as Canonical Correlation Analysis and Partial Least Squares. A JIVE analysis of gene expression and miRNA data on Glioblastoma Multiforme tumor samples reveals gene-miRNA associations and provides better characterization of tumor types. Data and software are available at https://genome.unc.edu/jive/
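The three-term decomposition described above (joint structure, individual structure, residual) can be illustrated with a small NumPy sketch. This is a simplified alternating scheme under stated assumptions, not the authors' released software: the function names `low_rank` and `jive_sketch`, the fixed iteration count, and the requirement that the ranks be supplied by hand are all hypothetical choices made for illustration.

```python
import numpy as np

def low_rank(M, r):
    """Best rank-r approximation of M via truncated SVD."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def jive_sketch(blocks, joint_rank, indiv_ranks, n_iter=50):
    """Split stacked data X into joint (J), individual (A) and residual (E) parts.

    blocks: list of (p_i x n) arrays whose n columns are the same objects.
    Ranks are assumed known here; choosing them is outside this sketch.
    """
    rows = np.cumsum([0] + [b.shape[0] for b in blocks])
    X = np.vstack(blocks).astype(float)
    A = np.zeros_like(X)
    for _ in range(n_iter):
        # Joint term: low-rank fit to the stacked data minus individual structure.
        J = low_rank(X - A, joint_rank)
        # Projector onto the orthogonal complement of the joint row space.
        _, _, Vt = np.linalg.svd(J, full_matrices=False)
        V = Vt[:joint_rank].T
        P = np.eye(X.shape[1]) - V @ V.T
        # Individual terms: per-block low-rank fit, kept orthogonal to J.
        for i, r in enumerate(indiv_ranks):
            sl = slice(rows[i], rows[i + 1])
            A[sl] = low_rank((X[sl] - J[sl]) @ P, r)
    E = X - J - A  # residual noise
    return J, A, E

# Usage with synthetic data: two data types measured on the same 30 objects.
rng = np.random.default_rng(0)
blocks = [rng.normal(size=(10, 30)), rng.normal(size=(8, 30))]
J, A, E = jive_sketch(blocks, joint_rank=2, indiv_ranks=[1, 1], n_iter=5)
```

By construction the three returned matrices sum back to the stacked data, and the individual terms are orthogonal to the joint row space, which is the property that lets joint and individual variation be read separately.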
Sequential testing over multiple stages and performance analysis of data fusion
The JIEDDO Analytic Decision Engine (JADE) is a flexible software toolkit for studying the performance of sensor configurations for the detection of person-borne explosive compounds and other threat substances. JADE is designed to enable performance and tradeoff analyses between different, user-specified scenarios with given sensor placements and data fusion networks. JADE contains fundamental physics-based models of several sensor technologies of interest, such as nonlinear acoustic and radar-based detectors, along with a data fusion system that we focus on in this paper. The fusion system consists of a static component that combines the decisions of individual sensors at a fixed point in time, and a dynamic, time-dependent component that in turn fuses the outputs of the static structure at different times. The static component is based on a probabilistic graphical model, or Bayesian network, and accepts probability matrices from the physics-based sensor models as inputs (the details of which are abstracted from the fusion system). Its outputs are fed into the dynamic fusion framework, which is based on sequential hypothesis testing and produces performance metrics for the entire, fused sensor configuration. The purpose of the system is to determine the performance of a given fusion structure, as opposed to doing fusion on actual measurements.
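As a rough illustration of sequential hypothesis testing over multiple stages, the sketch below uses Wald's sequential probability ratio test on a stream of binary stage decisions. It is not JADE's implementation: the function names, the per-stage detection and false-alarm probabilities (`pd`, `pf`), and the error targets `alpha` and `beta` are assumptions chosen for the example.

```python
import math

def decision_llr(decision, pd, pf):
    """Log-likelihood ratio of one binary stage decision.

    pd: assumed probability of an alarm when a threat is present.
    pf: assumed probability of an alarm when no threat is present.
    """
    if decision:
        return math.log(pd / pf)
    return math.log((1 - pd) / (1 - pf))

def sprt(llrs, alpha=0.01, beta=0.01):
    """Wald's SPRT: accumulate log-likelihood ratios until a threshold is crossed.

    alpha: target false-alarm rate; beta: target missed-detection rate.
    Returns (verdict, number of stages used).
    """
    upper = math.log((1 - beta) / alpha)   # crossing it accepts "threat"
    lower = math.log(beta / (1 - alpha))   # crossing it accepts "no threat"
    total = 0.0
    for k, llr in enumerate(llrs, start=1):
        total += llr
        if total >= upper:
            return "threat", k
        if total <= lower:
            return "no threat", k
    return "undecided", len(llrs)

# Usage: five consecutive alarms from a stage with pd = 0.9 and pf = 0.1.
stages = [decision_llr(1, pd=0.9, pf=0.1) for _ in range(5)]
verdict, used = sprt(stages)
```

The number of stages consumed before a verdict is itself a performance metric, which is why a sequential test suits the analysis of a time-dependent fusion layer.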
A Fusion Algorithm for Solving Bayesian Decision Problems
This paper proposes a new method for solving Bayesian decision problems. The method consists of representing a Bayesian decision problem as a valuation-based system and applying a fusion algorithm for solving it. The fusion algorithm is a hybrid of local computational methods for computation of marginals of joint probability distributions and the local computational methods for discrete optimization problems.
Possibilistic Assumption based Truth Maintenance System, Validation in a Data Fusion Application
Monai, Francesco Fulvio, Chehire, Thomas
Data fusion allows the elaboration and evaluation of a situation synthesized from low-level information provided by different kinds of sensors. The fusion of the collected data results in fewer, higher-level pieces of information that are more easily assessed by a human operator and that assist the operator effectively in the decision process. In this paper we present the suitability and the advantages of using a Possibilistic Assumption based Truth Maintenance System (Π-ATMS) in a military data fusion application. We first describe the problem, the needed knowledge representation formalisms and problem solving paradigms. Then we remind the reader of the basic concepts of ATMSs, Possibilistic Logic and Π-ATMSs. Finally we detail the solution to the given data fusion problem and conclude with the results and a comparison with a non-possibilistic solution.
QuerioCity: Accessing the Information of a City
Lopez, Vanessa (IBM Smarter Cities) | Kotoulas, Spyros (IBM Smarter Cities) | Sbodio, Marco Luca (IBM Smarter Cities) | Stephenson, Martin (IBM Smarter Cities) | Lloyd, Raymond (IBM Smarter Cities) | Gkoulalas-Divanis, Aris (IBM Smarter Cities) | Aonghusa, Pol Mac (IBM Smarter Cities)
QuerioCity aims at creating an ecosystem for managing and accessing the information of a city, with a particular focus on transforming, integrating and querying heterogeneous semistructured data in an open environment. This raises unique challenges in terms of: - Fitness-for-use. The users of the system are not data integration experts and are not qualified to use industry data integration tools. Furthermore, they are not able to query data using structured query languages. The domain of the information is very broad and open.
An Approach to Model Interest for Planetary Rover through Dezert-Smarandache Theory
Ceriotti, Matteo, Vasile, Massimiliano, Giardini, Giovanni, Massari, Mauro
In this paper, we propose an approach for assigning an interest level to the goals of a planetary rover. Assigning an interest level to goals allows the rover to transform and reallocate the goals autonomously. The interest level is defined by fusing payload and navigation information. The fusion yields an "interest map" that quantifies the level of interest of each area around the rover. In this way the planner can choose the most interesting scientific objectives to be analyzed, with limited human intervention, and reallocate its goals autonomously. The Dezert-Smarandache Theory of Plausible and Paradoxical Reasoning was used for information fusion: this theory allows dealing with vague and conflicting data. In particular, it allows us to model directly the behavior of the scientists who have to evaluate the relevance of a particular set of goals. The paper shows an application of the proposed approach to the generation of a reliable interest map.
PAC-Bayesian Majority Vote for Late Classifier Fusion
Morvant, Emilie, Habrard, Amaury, Ayache, Stéphane
A lot of attention has been devoted to multimedia indexing over the past few years. The literature usually considers two kinds of fusion schemes: early fusion and late fusion. In this paper we focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent, elegant and well-founded quadratic program named MinCq, which comes from the PAC-Bayes theory in machine learning. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while making use of the voters' diversity. We provide evidence that this method is naturally adapted to the late fusion procedure. We propose an extension of MinCq that adds an order-preserving pairwise loss for ranking, which helps to improve the Mean Average Precision measure. We confirm the good behavior of the MinCq-based fusion approaches with experiments on a real image benchmark.
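Late fusion as a weighted vote over per-modality scores can be sketched as follows. This example only shows the combination step: the weights are chosen by hand for illustration and are not learned by MinCq, and the function name `late_fusion` and the score range of [-1, 1] are assumptions.

```python
import numpy as np

def late_fusion(scores, weights):
    """Combine real-valued voter scores at the decision level.

    scores: array of shape (n_voters, n_samples), assumed to lie in [-1, 1].
    weights: one non-negative weight per voter (hand-picked in this sketch).
    Returns the fused scores and the resulting labels in {-1, +1}.
    """
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()      # normalise to a distribution over voters
    fused = weights @ scores               # weighted majority vote per sample
    labels = np.where(fused >= 0, 1, -1)
    return fused, labels

# Usage: three modality classifiers scoring two samples.
scores = [[0.8, -0.2],
          [-0.4, -0.6],
          [0.2, 0.9]]
fused, labels = late_fusion(scores, weights=[0.5, 0.25, 0.25])
```

The fused scores can also be sorted to rank samples, which is the quantity a ranking loss aimed at Mean Average Precision would act on.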
Machine Learning and Sensor Fusion for Estimating Continuous Energy Expenditure
Vyas, Nisarg (BodyMedia, Inc.) | Farringdon, Jonathan (BodyMedia Inc.) | Andre, David (Cerebellum Capital, Inc.) | Stivoric, John Ivo (BodyMedia)
In this article we provide insight into the BodyMedia FIT armband system -- a wearable multi-sensor technology that continuously monitors physiological events related to energy expenditure for weight management using machine learning and data modeling methods. Since becoming commercially available in 2001, more than half a million users have used the system to track their physiological parameters and to achieve their individual health goals including weight-loss. We describe several challenges that arise in applying machine learning techniques to the health care domain and present various solutions utilized in the armband system. We demonstrate how machine learning and multi-sensor data fusion techniques are critical to the system's success.
Decentralized Data Fusion and Active Sensing with Mobile Sensors for Modeling and Predicting Spatiotemporal Traffic Phenomena
Chen, Jie, Low, Kian Hsiang, Tan, Colin Keng-Yan, Oran, Ali, Jaillet, Patrick, Dolan, John M., Sukhatme, Gaurav S.
The problem of modeling and predicting spatiotemporal traffic phenomena over an urban road network is important to many traffic applications such as detecting and forecasting congestion hotspots. This paper presents a decentralized data fusion and active sensing (D2FAS) algorithm for mobile sensors to actively explore the road network to gather and assimilate the most informative data for predicting the traffic phenomenon. We analyze the time and communication complexity of D2FAS and demonstrate that it can scale well with a large number of observations and sensors. We provide a theoretical guarantee on its predictive performance to be equivalent to that of a sophisticated centralized sparse approximation for the Gaussian process (GP) model: The computation of such a sparse approximate GP model can thus be parallelized and distributed among the mobile sensors (in a Google-like MapReduce paradigm), thereby achieving efficient and scalable prediction. We also theoretically guarantee its active sensing performance that improves under various practical environmental conditions. Empirical evaluation on real-world urban road network data shows that our D2FAS algorithm is significantly more time-efficient and scalable than state-of-the-art centralized algorithms while achieving comparable predictive performance.