Information Fusion


Combining Experimental and Observational Data for Identification of Long-Term Causal Effects

arXiv.org Machine Learning

We consider the task of estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The observational data is assumed to be confounded, and hence, without further assumptions, this dataset alone cannot be used for causal inference. Moreover, only a short-term version of the primary outcome variable of interest is observed in the experimental data, and hence this dataset alone cannot be used for causal inference either. In a recent work, Athey et al. (2020) proposed a method for systematically combining such data to identify the downstream causal effect of interest. Their approach is based on the assumptions of internal and external validity of the experimental data, and an extra novel assumption called latent unconfoundedness. In this paper, we first review their proposed approach and discuss the latent unconfoundedness assumption. Then we propose two alternative approaches for data fusion for the purpose of estimating the average treatment effect as well as the effect of treatment on the treated. Our first proposed approach is based on assuming equi-confounding bias for the short-term and long-term outcomes. Our second proposed approach is based on the proximal causal inference framework, in which we assume the existence of an extra variable in the system which is a proxy of the latent confounder of the treatment-outcome relation.
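
The equi-confounding idea in the abstract above can be illustrated with a toy calculation. The sketch below is not the paper's estimator. It assumes one simple additive reading of the assumption: the confounding bias of the naive treated-versus-control contrast is the same for the short-term outcome and the long-term outcome in the observational data, and the experimental short-term effect carries over to the observational population. All function and variable names (`naive_diff`, `equi_confounding_long_term_effect`, `s_exp`, `a_exp`, `s_obs`, `y_obs`, `a_obs`) are hypothetical.

```python
import numpy as np


def naive_diff(outcome, treated):
    """Difference in mean outcome between treated and control units."""
    outcome = np.asarray(outcome, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    return outcome[treated].mean() - outcome[~treated].mean()


def equi_confounding_long_term_effect(s_exp, a_exp, s_obs, y_obs, a_obs):
    """Toy long-term effect estimate under an assumed additive equal-bias condition.

    s_exp, a_exp: short-term outcome and randomized treatment (experimental data).
    s_obs, y_obs, a_obs: short-term outcome, long-term outcome and treatment
    (observational, possibly confounded data).
    """
    # Randomization makes the experimental contrast an unbiased short-term effect.
    short_effect = naive_diff(s_exp, a_exp)
    # Confounding bias of the naive short-term contrast in the observational data.
    short_bias = naive_diff(s_obs, a_obs) - short_effect
    # Assumption: the long-term contrast carries the same bias, so subtract it.
    return naive_diff(y_obs, a_obs) - short_bias


if __name__ == "__main__":
    estimate = equi_confounding_long_term_effect(
        s_exp=[1, 1, 3, 3], a_exp=[0, 0, 1, 1],
        s_obs=[1, 1, 5, 5], y_obs=[3, 3, 10, 10], a_obs=[0, 0, 1, 1],
    )
    print(estimate)
```

In this toy, the observational short-term contrast overstates the experimental one, and the same amount is subtracted from the observational long-term contrast. Whether such a bias-transfer condition is plausible depends on the application.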


Senior Data Integrations Engineer, IT

#artificialintelligence

Here at Anaplan, we have reinvented how companies see, plan, and run their businesses. Our platform allows our customers to uncover new insights, connect their strategy to their plans, and work in ways they had not previously thought possible. We're growing fast, constantly innovating, and couldn't be prouder to help our customers move forward with confidence in a sophisticated and changing world. We are looking for forward-thinking people who bend over backward to put customers first. Individuals who thrive on challenge and are ready to grasp the opportunity of a lifetime.


Survey and Systematization of 3D Object Detection Models and Methods

arXiv.org Artificial Intelligence

This paper offers a comprehensive survey of recent developments in 3D object detection, covering the full pipeline from input data, through data representation and feature extraction, to the actual detection modules. We include basic concepts, focus our survey on a broad spectrum of approaches from the last ten years, and propose a systematization that offers a practical framework for comparing those approaches at the method level.


Why most machine learning projects stumble

#artificialintelligence

Despite widespread interest in machine learning (ML), relatively few projects leave the proof-of-concept phase and enter production. In fact, in a 2020 report, Capgemini found that roughly 85% of all ML projects grind to a halt across Capgemini's client organizations, despite successful preliminary models and ample support from executive leaders. Further, the study found, only half of the world's leading AI-powered enterprises successfully roll out artificial intelligence projects, including ML models, and this number drops substantially among organizations without dedicated ML teams. In recent years, AI solutions have attracted the interest of executive leadership across industries. Machine-learning models, perhaps the leading subset of AI, have particularly interested enterprises racing to digitize in the modern market because of their ability to automatically "learn" and update.


Spatiotemporal Analysis Using Riemannian Composition of Diffusion Operators

arXiv.org Machine Learning

Multivariate time series have become abundant in recent years, as many data-acquisition systems record information through multiple sensors simultaneously. In this paper, we assume the variables pertain to some geometry and present an operator-based approach for spatiotemporal analysis. Our approach combines three components that are often considered separately: (i) manifold learning for building operators representing the geometry of the variables, (ii) Riemannian geometry of symmetric positive-definite matrices for multiscale composition of operators corresponding to different time samples, and (iii) spectral analysis of the composite operators for extracting different dynamic modes. We propose a method that is analogous to classical wavelet analysis, which we term Riemannian multi-resolution analysis (RMRA). We provide some theoretical results on the spectral analysis of the composite operators, and we demonstrate the proposed method on simulations and on real data.
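
As background for component (ii) above, the sketch below computes a point on the geodesic between two symmetric positive-definite (SPD) matrices under the standard affine-invariant metric, A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}. This is a textbook construction, not the RMRA method itself. The abstract does not say exactly how the paper composes its operators, so treat the choice of this formula as an assumption. The helper names `spd_power` and `spd_geodesic` are hypothetical.

```python
import numpy as np


def spd_power(m, p):
    """Raise a symmetric positive-definite matrix to a real power p."""
    w, v = np.linalg.eigh(m)      # m = v @ diag(w) @ v.T with w > 0
    return (v * w**p) @ v.T       # v @ diag(w**p) @ v.T


def spd_geodesic(a, b, t=0.5):
    """Point at parameter t on the affine-invariant geodesic from a to b."""
    a_half = spd_power(a, 0.5)
    a_neg_half = spd_power(a, -0.5)
    inner = a_neg_half @ b @ a_neg_half
    inner = (inner + inner.T) / 2  # symmetrize against round-off
    return a_half @ spd_power(inner, t) @ a_half


if __name__ == "__main__":
    a = np.diag([1.0, 4.0])
    b = np.diag([4.0, 9.0])
    # For commuting matrices the midpoint is the entrywise geometric mean
    # of the eigenvalues, here diag(2, 6).
    print(spd_geodesic(a, b, t=0.5))
```

Setting `t=0` returns the first matrix and `t=1` the second. Intermediate values interpolate along the curved SPD manifold rather than along a straight line in matrix space.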


ETL Tool Apache Hop Graduates Incubator

#artificialintelligence

Apache Hop, a metadata-driven data orchestration tool used to design and build pipelines, today emerged from incubator status and was named a Top-Level Project at the Apache Software Foundation, clearing the way for more intensive production use. Apache Hop, which stands for Hop Orchestration Platform, is a Java-based product designed to help data professionals manage a variety of data and metadata orchestration and integration needs. The software sports a visual design environment that allows users to create ETL pipelines, as well as an execution engine that can run by itself or be embedded into Spark, Flink, Google Dataflow, or AWS EMR via Apache Beam. "Hop is entirely metadata driven," the Apache Hop website states. "Every object type in Hop describes how data is read, manipulated or written, or how workflows and pipelines need to be orchestrated. Metadata is what drives Hop internally as well. Hop uses a kernel architecture with a robust engine. Plugins add functionality to the engine through their own metadata."


Coupled Support Tensor Machine Classification for Multimodal Neuroimaging Data

arXiv.org Machine Learning

Multimodal data arise in various applications where information about the same phenomenon is acquired from multiple sensors and across different imaging modalities. Learning from multimodal data is of great interest in machine learning and statistics research as this offers the possibility of capturing complementary information among modalities. Multimodal modeling helps to explain the interdependence between heterogeneous data sources, discovers new insights that may not be available from a single modality, and improves decision-making. Recently, coupled matrix-tensor factorization has been introduced for multimodal data fusion to jointly estimate latent factors and identify complex interdependence among the latent factors. However, most of the prior work on coupled matrix-tensor factors focuses on unsupervised learning and there is little work on supervised learning using the jointly estimated latent factors. This paper considers the multimodal tensor data classification problem. A Coupled Support Tensor Machine (C-STM) built upon the latent factors jointly estimated from the Advanced Coupled Matrix Tensor Factorization (ACMTF) is proposed. C-STM combines individual and shared latent factors with multiple kernels and estimates a maximal-margin classifier for coupled matrix tensor data. The classification risk of C-STM is shown to converge to the optimal Bayes risk, making it a statistically consistent rule. C-STM is validated through simulation studies as well as a simultaneous EEG-fMRI analysis. The empirical evidence shows that C-STM can utilize information from multiple sources and provide a better classification performance than traditional single-mode classifiers.


Data Harmonisation for Information Fusion in Digital Healthcare: A State-of-the-Art Systematic Review, Meta-Analysis and Future Research Directions

arXiv.org Artificial Intelligence

Removing the bias and variance of multicentre data has always been a challenge in large-scale digital healthcare studies, which require the ability to integrate clinical features extracted from data acquired by different scanners and protocols to improve stability and robustness. Previous studies have described various computational approaches to fuse single-modality multicentre datasets. However, these surveys rarely focused on evaluation metrics and lacked a checklist for computational data harmonisation studies. In this systematic review, we summarise the computational data harmonisation approaches for multi-modality data in the digital healthcare field, including harmonisation strategies and evaluation metrics based on different theories. In addition, a comprehensive checklist that summarises common practices for data harmonisation studies is proposed to guide researchers to report their research findings more effectively. Last but not least, flowcharts presenting possible ways for methodology and metric selection are proposed, and the limitations of different methods are surveyed for future research.


Data Integration Engineer

#artificialintelligence

We are seeking a collaborative, curious, and enthusiastic Software Engineer to help build our scalable file processing engine ("Hydra") and the customer-specific processes that use this engine. You would have the opportunity to work with APIs, backend services, our deployment pipeline, AWS, Kubernetes, and more. As part of the Extended Integrations team, you would also have the opportunity to work on customer-specific integrations, which would expose you to real world problems and allow you to make an immediate impact. We use a number of pretty neat technologies including node, mongo, and typescript, and we deploy roughly every two weeks – so nothing sits on the shelf very long. Clarabridge helps hundreds of the world's leading brands understand and improve their customer experience.


Data Fusion with Latent Map Gaussian Processes

arXiv.org Machine Learning

Multi-fidelity modeling and calibration are data fusion tasks that ubiquitously arise in engineering design. In this paper, we introduce a novel approach based on latent-map Gaussian processes (LMGPs) that enables efficient and accurate data fusion. In our approach, we convert data fusion into a latent space learning problem where the relations among different data sources are automatically learned. This conversion endows our approach with attractive advantages such as increased accuracy, reduced costs, flexibility to jointly fuse any number of data sources, and ability to visualize correlations between data sources. This visualization allows the user to detect model form errors or determine the optimum strategy for high-fidelity emulation by fitting LMGP only to the subset of the data sources that are well-correlated. We also develop a new kernel function that enables LMGPs to not only build a probabilistic multi-fidelity surrogate but also estimate calibration parameters with high accuracy and consistency. The implementation and use of our approach are considerably simpler and less prone to numerical issues compared to existing technologies. We demonstrate the benefits of LMGP-based data fusion by comparing its performance against competing methods on a wide range of examples.