 Information Fusion


Automated migration from on-premise Hadoop to Databricks Delta Lake using StreamAnalytix

#artificialintelligence

Most enterprises are undertaking a digital transformation initiative. Data and analytics modernization is an integral part of this journey. On-premise legacy systems like Hadoop clusters and data warehouses limit innovation and growth due to their old architectures. New cloud-based platforms are becoming an inevitable consideration for many such enterprises. However, they are seeking to reduce the risk and complexity of manual migration from their conventional ETL tools and data lakes to a modern, future state.


Kalman Filter, Sensor Fusion, and Constrained Regression: Equivalences and Insights

Neural Information Processing Systems

The Kalman filter (KF) is one of the most widely used tools for data assimilation and sequential estimation. In this work, we show that the state estimates from the KF in a standard linear dynamical system setting are equivalent to those given by the KF in a transformed system, with infinite process noise (i.e., a "flat prior") and an augmented measurement space. This reformulation, which we refer to as augmented measurement sensor fusion (SF), is conceptually interesting, because the transformed system here is seemingly static (as there is effectively no process model), but we can still capture the state dynamics inherent to the KF by folding the process model into the measurement space. Further, this reformulation of the KF turns out to be useful in settings in which past states are observed eventually (at some lag). Here, when the measurement noise covariance is estimated by the empirical covariance, we show that the state predictions from SF are equivalent to those from a regression of past states on past measurements, subject to particular linear constraints (reflecting the relationships encoded in the measurement map).
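A scalar toy case may help make the "fold the process model into the measurement space" idea concrete. The sketch below is an illustration under simplifying assumptions, not the paper's construction: it treats the KF prediction as one extra pseudo-measurement of the state, combines it with the real measurement by inverse-variance weighting with no prior at all, and compares the result against the usual KF update. The function names and the numbers are hypothetical.

```python
def kf_predict(x, p, a, q):
    """Propagate mean x and variance p through x_next = a * x + noise(var q)."""
    return a * x, a * a * p + q


def kf_update(x_pred, p_pred, z, h, r):
    """Standard scalar Kalman update for a measurement z = h * x + noise(var r)."""
    gain = p_pred * h / (h * h * p_pred + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


def fused_update(x_pred, p_pred, z, h, r):
    """Flat prior on x, two 'measurements': the prediction x_pred (variance
    p_pred, map 1) and the sensor reading z (variance r, map h). The estimate
    is the weighted least-squares solution."""
    information = 1.0 / p_pred + h * h / r
    x_new = (x_pred / p_pred + h * z / r) / information
    return x_new, 1.0 / information


# Hypothetical numbers, chosen only for illustration.
x_pred, p_pred = kf_predict(x=1.0, p=2.0, a=0.9, q=0.3)
print(kf_update(x_pred, p_pred, z=3.0, h=1.5, r=0.5))
print(fused_update(x_pred, p_pred, z=3.0, h=1.5, r=0.5))
```

Both calls agree up to floating-point rounding, since the two formulas are algebraically identical in this scalar setting.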


On Single Source Robustness in Deep Fusion Models

Neural Information Processing Systems

Algorithms that fuse multiple input sources benefit from both complementary and shared information. Shared information may provide robustness against faulty or noisy inputs, which is indispensable for safety-critical applications like self-driving cars. We investigate learning fusion algorithms that are robust against noise added to a single source. We first demonstrate that robustness against single source noise is not guaranteed in a linear fusion model. Motivated by this discovery, two possible approaches are proposed to increase robustness: a carefully designed loss with corresponding training algorithms for deep fusion models, and a simple convolutional fusion layer that has a structural advantage in dealing with noise.
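The claim about linear fusion can be illustrated with a toy example. This is a minimal sketch and not the paper's analysis: it assumes two sources that both observe the same scalar and a fixed equal-weight linear fusion, then measures how far the fused output moves when noise is added to the first source only. The function names, weights, and values are hypothetical.

```python
def linear_fusion(sources, weights):
    """Fuse scalar source readings with fixed linear weights."""
    return sum(w * s for w, s in zip(weights, sources))


def output_shift(delta, weights=(0.5, 0.5), clean=(1.0, 1.0)):
    """Change in the fused output when only the first source is corrupted by delta."""
    corrupted = (clean[0] + delta,) + tuple(clean[1:])
    return linear_fusion(corrupted, weights) - linear_fusion(clean, weights)


for delta in (0.0, 1.0, 10.0, 100.0):
    print(delta, output_shift(delta))
```

The shift equals the first weight times the corruption, so it grows without bound as the noise on that one source grows, even though the second source remains clean.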


PPDM Association on Twitter

#artificialintelligence

For those who may not yet have heard, given the current situation, the 2020 Houston Professional Petroleum Data Expo will be undergoing some changes. We are working on re-booking the physical event, but are pleased to offer a virtual conference.


Artificial Intelligence in Energy Market 2020-2024 | Demand for Data Integration and Visual Analytics to Boost Growth | Technavio

#artificialintelligence

Technavio is a leading global technology research and advisory company. Their research and analysis focus on emerging market trends and provide actionable insights to help businesses identify market opportunities and develop effective strategies to optimize their market positions. With over 500 specialized analysts, Technavio's report library consists of more than 17,000 reports and counting, covering 800 technologies and spanning 50 countries. Their client base consists of enterprises of all sizes, including more than 100 Fortune 500 companies. This growing client base relies on Technavio's comprehensive coverage, extensive research, and actionable market insights to identify opportunities in existing and potential markets and assess their competitive positions within changing market scenarios.


Health State Estimation

arXiv.org Artificial Intelligence

Life's most valuable asset is health. Continuously understanding the state of our health and modeling how it evolves is essential if we wish to improve it. Given that people live with more data about their lives today than at any other time in history, the challenge rests in interweaving this data with the growing body of knowledge to compute and model the health state of an individual continually. This dissertation presents an approach to build a personal model and dynamically estimate the health state of an individual by fusing multi-modal data and domain knowledge. The system is stitched together from four essential abstraction elements: 1. the events in our life, 2. the layers of our biological systems (from molecular to an organism), 3. the functional utilities that arise from biological underpinnings, and 4. how we interact with these utilities in the reality of daily life. Connecting these four elements via graph network blocks forms the backbone by which we instantiate a digital twin of an individual. Edges and nodes in this graph structure are then regularly updated with learning techniques as data is continuously digested. Experiments demonstrate the use of dense and heterogeneous real-world data from a variety of personal and environmental sensors to monitor individual cardiovascular health state. State estimation and individual modeling are the fundamental basis for departing from disease-oriented approaches toward a total health continuum paradigm. Precision in predicting health requires understanding state trajectory. By encasing this estimation within a navigational approach, a systematic guidance framework can plan actions to transition a current state towards a desired one. This work concludes by presenting this framework of combining the health state and personal graph model to perpetually plan and assist us in living life towards our goals.


Decentralized Poisson Multi-Bernoulli Filtering for Vehicle Tracking

arXiv.org Artificial Intelligence

A decentralized Poisson multi-Bernoulli filter is proposed to track multiple vehicles using multiple high-resolution sensors. Independent filters estimate the vehicles' presence, state, and shape using a Gaussian process extent model; a decentralized filter is realized through fusion of the filters' posterior densities. An efficient implementation is achieved by parametric state representation, utilization of single hypothesis tracks, and fusion of vehicle information based on a fusion mapping. Numerical results demonstrate the performance.


10 Popular Python-Based ETL Tools To Learn - Analytics India Magazine

#artificialintelligence

ETL stands for Extract, Transform, Load, a crucial procedure in the process of data preparation. With the help of ETL, one can easily access data from various interfaces. This means it can collect and migrate data from various data structures across various platforms. In this article, we list 10 top Python-based ETL tools. Apache Airflow is a Python-based workflow automation tool, which can be used to author workflows as Directed Acyclic Graphs (DAGs) of tasks.
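The idea of an ETL workflow expressed as a DAG of tasks can be sketched with the Python standard library alone. The example below deliberately does not use Airflow or its API; it only mimics the concept, using `graphlib.TopologicalSorter` (Python 3.9+) to run tasks in dependency order. The task names, the sample CSV data, and the table layout are all hypothetical.

```python
import csv
import io
import sqlite3
from graphlib import TopologicalSorter

# Hypothetical input standing in for a real source system.
RAW_CSV = "name,amount\nalice,10\nbob,5\nalice,7\n"


def extract(ctx):
    """Extract: parse raw CSV text into a list of row dictionaries."""
    ctx["rows"] = list(csv.DictReader(io.StringIO(RAW_CSV)))


def transform(ctx):
    """Transform: sum the amounts per name."""
    totals = {}
    for row in ctx["rows"]:
        totals[row["name"]] = totals.get(row["name"], 0) + int(row["amount"])
    ctx["totals"] = totals


def load(ctx):
    """Load: write the aggregated totals into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE totals (name TEXT PRIMARY KEY, amount INTEGER)")
    conn.executemany("INSERT INTO totals VALUES (?, ?)", ctx["totals"].items())
    conn.commit()
    ctx["conn"] = conn


TASKS = {"extract": extract, "transform": transform, "load": load}
# Each task maps to the set of tasks that must finish before it starts.
DEPENDS_ON = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_pipeline():
    """Run every task once, in an order that respects the dependency graph."""
    ctx = {}
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        TASKS[name](ctx)
    return ctx


if __name__ == "__main__":
    result = run_pipeline()
    query = "SELECT name, amount FROM totals ORDER BY name"
    print(result["conn"].execute(query).fetchall())
```

A real workflow tool adds scheduling, retries, and monitoring on top of this ordering step; the sketch shows only the dependency-driven execution.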


Six Leading Trends in Big Data

#artificialintelligence

Fremont, CA: The digital economy today is powered by big data. Generated in abundance by both individuals and enterprises, this data is stored in large data centers, some of which cover hundreds of thousands of square feet. Technology vendors are implementing pre-enriched machine-readable data, specific to given industries, to speed time-to-market for custom-built AI tools. These kits are intended to help data scientists and AI engineers and include the data necessary to speed up the creation of AI models. Big data vendors have had to take up the issues of data governance, security, and management, which had taken a back seat to accessibility and speed.


Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise

Neural Information Processing Systems

Modern machine learning-based approaches to computer vision require very large databases of labeled images. Some contemporary vision systems already require on the order of millions of images for training (e.g., Omron face detector). While the collection of these large databases is becoming a bottleneck, new Internet-based services that allow labelers from around the world to be easily hired and managed provide a promising solution. However, using these services to label large databases brings with it new theoretical and practical challenges: (1) the labelers may have wide-ranging levels of expertise which are unknown a priori, and in some cases may be adversarial; (2) images may vary in their level of difficulty; and (3) multiple labels for the same image must be combined to provide an estimate of the actual label of the image. Probabilistic approaches provide a principled way to approach these problems. In this paper we present a probabilistic model and use it to simultaneously infer the label of each image, the expertise of each labeler, and the difficulty of each image.
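The joint inference described above can be illustrated with a much simpler stand-in model. The sketch below is not the paper's model: it assumes binary labels, gives each labeler a single accuracy parameter, ignores image difficulty entirely, and fits everything with a basic expectation-maximization loop (the choice of EM and the starting accuracy of 0.7 are assumptions of this sketch). The function name and the vote matrix are hypothetical.

```python
import math


def infer_labels(votes, n_iter=20, eps=1e-6):
    """votes[i][j] is the 0/1 label that labeler i gave to image j.

    Returns (inferred labels, estimated accuracy per labeler)."""
    n_labelers, n_images = len(votes), len(votes[0])
    acc = [0.7] * n_labelers   # initial guess: everyone is somewhat reliable
    post = [0.5] * n_images    # posterior P(true label of image j is 1)
    for _ in range(n_iter):
        # E-step: update each image's posterior from accuracy-weighted votes.
        for j in range(n_images):
            log_odds = 0.0
            for i in range(n_labelers):
                weight = math.log(acc[i] / (1.0 - acc[i]))
                log_odds += weight if votes[i][j] == 1 else -weight
            post[j] = 1.0 / (1.0 + math.exp(-log_odds))
        # M-step: a labeler's accuracy is the expected share of correct votes.
        for i in range(n_labelers):
            agree = sum(
                post[j] if votes[i][j] == 1 else 1.0 - post[j]
                for j in range(n_images)
            )
            acc[i] = min(max(agree / n_images, eps), 1.0 - eps)
    labels = [1 if p > 0.5 else 0 for p in post]
    return labels, acc


# Hypothetical votes: two consistent labelers, one with a single slip,
# and one whose votes match the others on only half of the images.
votes = [
    [1, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 0],
]
labels, accuracy = infer_labels(votes)
print(labels)
print([round(a, 3) for a in accuracy])
```

In this toy run the votes of the consistent labelers end up weighted more heavily than those of the erratic one, which is the sense in which some votes "count more" than others.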