Information Fusion


Top 10 Data Warehouse Automation Tools Of 2021

#artificialintelligence

With data warehouse automation, one can achieve near-term automation of the full lifecycle of a data warehouse, from source code analysis through comprehensive documentation to operationalizing the warehouse. Data warehouse automation is an excellent way to cut costs and boost bottom-line margins. Having the right data warehouse automation tools in place therefore makes it easier for companies to achieve their objectives. On that note, here are the top 10 data warehouse automation tools of 2021. Teradata, headquartered in San Diego, California, is an internationally renowned provider of database products and services. Teradata DWH is widely used for insights, analytics, and decision making.


Asynchronous Collaborative Localization by Integrating Spatiotemporal Graph Learning with Model-Based Estimation

arXiv.org Artificial Intelligence

Collaborative localization is an essential capability for a team of robots such as connected vehicles to collaboratively estimate object locations from multiple perspectives with reliant cooperation. To enable collaborative localization, four key challenges must be addressed, including modeling complex relationships between observed objects, fusing observations from an arbitrary number of collaborating robots, quantifying localization uncertainty, and addressing latency of robot communications. In this paper, we introduce a novel approach that integrates uncertainty-aware spatiotemporal graph learning and model-based state estimation for a team of robots to collaboratively localize objects. Specifically, we introduce a new uncertainty-aware graph learning model that learns spatiotemporal graphs to represent historical motions of the objects observed by each robot over time and provides uncertainties in object localization. Moreover, we propose a novel method for integrated learning and model-based state estimation, which fuses asynchronous observations obtained from an arbitrary number of robots for collaborative localization. We evaluate our approach in two collaborative object localization scenarios in simulations and on real robots. Experimental results show that our approach outperforms previous methods and achieves state-of-the-art performance on asynchronous collaborative localization.
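
As a rough illustration of what fusing uncertain observations from several robots can look like, the sketch below combines 2D position estimates by inverse-variance weighting. This is a generic textbook technique used here for illustration, not the graph-learning and state-estimation method of the paper; it assumes independent, isotropic Gaussian errors and ignores communication latency. The function name and the sample values are hypothetical.

```python
def fuse_position_estimates(estimates):
    """Fuse 2D position estimates by inverse-variance weighting.

    `estimates` is a list of ((x, y), variance) pairs, one per robot.
    Assumption: errors are independent and isotropic, so one scalar
    variance per estimate is enough. Returns ((x, y), fused_variance).
    """
    if not estimates:
        raise ValueError("at least one estimate is required")

    # More certain observations (smaller variance) receive larger weights.
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)

    fused_x = sum(w * pos[0] for w, (pos, _) in zip(weights, estimates)) / total
    fused_y = sum(w * pos[1] for w, (pos, _) in zip(weights, estimates)) / total

    # The fused variance is never larger than the smallest input variance.
    return (fused_x, fused_y), 1.0 / total


if __name__ == "__main__":
    # Hypothetical observations of the same object from two robots.
    observations = [((10.0, 4.0), 1.0), ((12.0, 6.0), 4.0)]
    print(fuse_position_estimates(observations))
```

The fused position lands closer to the first robot's estimate because that robot reported the lower variance.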


The Powerful Use of AI in the Energy Sector: Intelligent Forecasting

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) techniques continue to broaden across governmental and public sectors, such as power and energy, which serve as critical infrastructures for most societal operations. However, due to the requirements of reliability, accountability, and explainability, it is risky to directly apply AI-based methods to power systems because society cannot afford cascading failures and large-scale blackouts, which easily cost billions of dollars. To meet society's requirements, this paper proposes a methodology to develop, deploy, and evaluate AI systems in the energy sector by: (1) understanding the power system measurements with physics, (2) designing AI algorithms to forecast the need, (3) developing robust and accountable AI methods, and (4) creating reliable measures to evaluate the performance of the AI model. The goal is to provide a high level of confidence to energy utility users. For illustration purposes, the paper uses power system event forecasting (PEF) as an example, which carefully analyzes synchrophasor patterns measured by the Phasor Measurement Units (PMUs). Such a physical understanding leads to a data-driven framework that reduces the dimensionality with physics and forecasts the event with high credibility. Specifically, for dimensionality reduction, machine learning arranges physical information from different dimensions, resulting in efficient information extraction. For event forecasting, the supervised learning model fuses the results of different models to increase the confidence. Finally, comprehensive experiments demonstrate the high accuracy, efficiency, and reliability as compared to other state-of-the-art machine learning methods.


Certifiable Artificial Intelligence Through Data Fusion

arXiv.org Artificial Intelligence

This paper reviews and proposes concerns in adopting, fielding, and maintaining artificial intelligence (AI) systems. While the AI community has made rapid progress, there are challenges in certifying AI systems. Using procedures from design and operational test and evaluation, there are opportunities towards determining performance bounds to manage expectations of intended use. A notional use case is presented with image data fusion to support AI object recognition certifiability considering precision versus distance.


ETL and ELT: A Guide and Market Analysis - KDnuggets

#artificialintelligence

ETL (Extract-Transform-Load) is the most widespread approach to data integration, the practice of consolidating data from disparate source systems with the aim of improving access to data. The story is still the same: businesses have a sea of data at their disposal, and making sense of this data fuels business performance. ETL plays a central role in this quest: it is the process of turning raw, messy data into clean, fresh, and reliable data from which business insights can be derived. This article seeks to bring clarity on how this process is conducted, how ETL tools have evolved, and the best tools available for your organization today. Today, organizations collect data from many different business source systems: cloud applications, CRM systems, files, etc.
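
To make the three ETL stages concrete, here is a minimal sketch using only the Python standard library. It is an illustration under stated assumptions, not a description of any tool covered in the article: the CSV content, the `customers` table, and the cleaning rules (trim and lowercase emails, drop rows without an email, remove duplicates) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system; the columns are illustrative.
RAW_CSV = """customer_id,email,signup_date
1, Alice@Example.com ,2021-01-05
2,bob@example.com,2021-02-17
2,bob@example.com,2021-02-17
3,,2021-03-09
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize emails, drop rows without one, remove duplicates."""
    seen = set()
    clean = []
    for row in rows:
        email = row["email"].strip().lower()
        if not email:
            continue  # messy row: no email to key on
        key = (row["customer_id"], email)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        clean.append((int(row["customer_id"]), email, row["signup_date"]))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(customer_id INTEGER, email TEXT, signup_date TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl():
    """Run the pipeline against an in-memory SQLite database and query it."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute(
        "SELECT customer_id, email FROM customers ORDER BY customer_id"
    ).fetchall()


if __name__ == "__main__":
    print(run_etl())
```

Keeping the three stages as separate functions mirrors the acronym and makes each stage testable on its own; in an ELT variant, the `transform` step would instead run inside the target database after loading.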


Data Integration Guide

#artificialintelligence

This guide is a one-stop introduction to data integration. Learn how to make data-driven decisions a reality for your organization! According to the World Economic Forum, at the beginning of 2020, the number of bytes in the digital universe was 40 times bigger than the number of stars in the observable universe. With data volume and usage growing, data integration is becoming an increasingly central topic. Data integration is mainly about exchanging data across multiple systems and tools.


How to Transform Your Data in Snowflake - KDnuggets

#artificialintelligence

Data transformation is integral to the analytics workflow and process. With analytics data coming from an ever growing array of disparate data sources, data transformation models the data to make it more understandable and consumable by the analytics and business teams. However, data transformation is the biggest bottleneck in the analytics workflow. According to IDC, analytics teams only spend 45% of their time performing analysis, with the remaining time spent searching for and preparing data. Additionally, a survey by TDWI cites a "lack of skilled personnel to model data" (36% of respondents) as the top challenge in cloud data integration.


Mechanistic Interpretation of Machine Learning Inference: A Fuzzy Feature Importance Fusion Approach

arXiv.org Artificial Intelligence

With the widespread use of machine learning to support decision-making, it is increasingly important to verify and understand the reasons why a particular output is produced. Although post-training feature importance approaches assist this interpretation, there is an overall lack of consensus regarding how feature importance should be quantified, making explanations of model predictions unreliable. In addition, many of these explanations depend on the specific machine learning approach employed and on the subset of data used when calculating feature importance. A possible solution to improve the reliability of explanations is to combine results from multiple feature importance quantifiers from different machine learning approaches coupled with re-sampling. Current state-of-the-art ensemble feature importance fusion uses crisp techniques to fuse results from different approaches. There is, however, significant loss of information as these approaches are not context-aware and reduce several quantifiers to a single crisp output. More importantly, their representation of 'importance' as coefficients is misleading and incomprehensible to end-users and decision makers. Here we show how the use of fuzzy data fusion methods can overcome some of the important limitations of crisp fusion methods.
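
The limitation of crisp fusion described above can be seen in a small sketch. The abstract does not say which crisp techniques are used, so the arithmetic mean of normalized scores below is an assumption chosen for illustration; the function names, feature names, and scores are hypothetical.

```python
def normalize(scores):
    """Scale absolute importance scores so they sum to 1."""
    total = sum(abs(v) for v in scores.values())
    if total == 0:
        raise ValueError("importance scores must not all be zero")
    return {name: abs(v) / total for name, v in scores.items()}


def crisp_mean_fusion(quantifier_scores):
    """Fuse several feature-importance dictionaries into one crisp score each.

    Assumption: every quantifier reports the same set of features.
    Each quantifier is normalized first so that different scales are comparable.
    """
    normalized = [normalize(scores) for scores in quantifier_scores]
    features = normalized[0].keys()
    return {
        name: sum(n[name] for n in normalized) / len(normalized)
        for name in features
    }


if __name__ == "__main__":
    # Hypothetical outputs of two importance quantifiers on two different models.
    quantifier_a = {"age": 2.0, "income": 6.0, "tenure": 2.0}
    quantifier_b = {"age": 0.5, "income": 0.25, "tenure": 0.25}
    print(crisp_mean_fusion([quantifier_a, quantifier_b]))
```

In this example the two quantifiers disagree on whether `age` or `income` matters most, yet the fused output reports a single number per feature and keeps no trace of that disagreement, which is the kind of information loss the abstract attributes to crisp fusion.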


Senior Product Manager, Data Science & Analytics

#artificialintelligence


Cross-Domain Data Integration for Named Entity Disambiguation in Biomedical Text

arXiv.org Artificial Intelligence

Named entity disambiguation (NED), which involves mapping textual mentions to structured entities, is particularly challenging in the medical domain due to the presence of rare entities. Existing approaches are limited by the presence of coarse-grained structural resources in biomedical knowledge bases as well as the use of training datasets that provide low coverage over uncommon resources. In this work, we address these issues by proposing a cross-domain data integration method that transfers structural knowledge from a general text knowledge base to the medical domain. We utilize our integration scheme to augment structural resources and generate a large biomedical NED dataset for pretraining. Our pretrained model with injected structural knowledge achieves state-of-the-art performance on two benchmark medical NED datasets: MedMentions and BC5CDR. Furthermore, we improve disambiguation of rare entities by up to 57 accuracy points.