 Information Fusion


100%OFF

#artificialintelligence

DBT (data build tool) helps data teams work like software engineers: they transform data and control the flow of execution to ship trusted data faster. It follows a load-first approach: the data is first loaded as is into the target, and SQL (run through DBT) is then used to transform it. DBT materializes your SQL SELECT statements as tables and views and manages the order in which the SQL is executed. Intended audience: ETL developers, DBAs, BI developers, decision-makers considering DBT, SQL programmers, data analysts, and data engineers.
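The load-then-transform flow described above can be illustrated with a minimal sketch. It uses Python's built-in `sqlite3` module rather than DBT itself, and the table name `raw_orders`, the view name `customer_totals`, and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Hypothetical raw rows, loaded "as is" (amounts are still stored as text).
raw_orders = [
    (1, "alice", "10.50"),
    (2, "bob", "4.25"),
    (3, "alice", "7.00"),
]

conn = sqlite3.connect(":memory:")

# Load step: copy the source rows into the target without reshaping them.
conn.execute("CREATE TABLE raw_orders (id INTEGER, customer TEXT, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform step: a SELECT materialized as a view inside the target database.
conn.execute(
    """
    CREATE VIEW customer_totals AS
    SELECT customer, SUM(CAST(amount AS REAL)) AS total_amount
    FROM raw_orders
    GROUP BY customer
    """
)

totals = conn.execute(
    "SELECT customer, total_amount FROM customer_totals ORDER BY customer"
).fetchall()
print(totals)
```

In this sketch the transformation lives entirely in SQL inside the target database, which is the part of the workflow the summary attributes to DBT; a real DBT project would additionally manage dependencies between many such SELECT statements.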


Top Stories, Dec 20 - Jan 2: 3 Tools to Track and Visualize the Execution of Your Python Code - KDnuggets

#artificialintelligence

Also: 6 Predictive Models Every Beginner Data Scientist Should Master; The Best ETL Tools in 2021; Write Clean Python Code Using Pipes; Three R Libraries Every Data Scientist Should Know (Even if You Use Python)


Reliability Estimation of an Advanced Nuclear Fuel using Coupled Active Learning, Multifidelity Modeling, and Subset Simulation

arXiv.org Machine Learning

Tristructural isotropic (TRISO)-coated particle fuel is a robust nuclear fuel and determining its reliability is critical for the success of advanced nuclear technologies. However, TRISO failure probabilities are small and the associated computational models are expensive. We used coupled active learning, multifidelity modeling, and subset simulation to estimate the failure probabilities of TRISO fuels using several 1D and 2D models. With multifidelity modeling, we replaced expensive high-fidelity (HF) model evaluations with information fusion from two low-fidelity (LF) models. For the 1D TRISO models, we considered three multifidelity modeling strategies: only Kriging, Kriging LF prediction plus Kriging correction, and deep neural network (DNN) LF prediction plus Kriging correction. While the results across these multifidelity modeling strategies compared satisfactorily, strategies employing information fusion from two LF models consistently called the HF model least often. Next, for the 2D TRISO model, we considered two multifidelity modeling strategies: DNN LF prediction plus Kriging correction (data-driven) and 1D TRISO LF prediction plus Kriging correction (physics-based). The physics-based strategy, as expected, consistently required the fewest calls to the HF model. However, the data-driven strategy had a lower overall simulation time since the DNN predictions are instantaneous, and the 1D TRISO model requires a non-negligible simulation time.


Supervised Homogeneity Fusion: a Combinatorial Approach

arXiv.org Machine Learning

Fusing regression coefficients into homogeneous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and yields sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide an MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion exhibits superiority over its competitors in terms of grouping accuracy.


Informed Multi-context Entity Alignment

arXiv.org Artificial Intelligence

Entity alignment is a crucial step in integrating knowledge graphs (KGs) from multiple sources. Previous attempts at entity alignment have explored different KG structures, such as neighborhood-based and path-based contexts, to learn entity embeddings, but they are limited in capturing the multi-context features. Moreover, most approaches directly utilize the embedding similarity to determine entity alignment without considering the global interaction among entities and relations. In this work, we propose an Informed Multi-context Entity Alignment (IMEA) model to address these issues. In particular, we introduce Transformer to flexibly capture the relation, path, and neighborhood contexts, and design holistic reasoning to estimate alignment probabilities based on both embedding similarity and the relation/entity functionality. The alignment evidence obtained from holistic reasoning is further injected back into the Transformer via the proposed soft label editing to inform embedding learning. Experimental results on several benchmark datasets demonstrate the superiority of our IMEA model compared with existing state-of-the-art entity alignment methods.


Applying data science in the life insurance industry -- a perspective from a qualified actuary

#artificialintelligence

To summarise, this use case presents a way for actuaries to automatically classify free-text claims cause data into pre-defined categories for further analysis. Ultimately, with the help of BERT, computers are able to understand human language. In this instance, computers are able to understand and compare medical terms or descriptions of a claims event, which can be messy at times. The alternative, manual filtering in Excel, is not practical, especially for a large number of claims. As mentioned previously, Excel has been the primary ETL tool for most life insurance actuaries.


Confidence-Aware Multi-Teacher Knowledge Distillation

arXiv.org Artificial Intelligence

Knowledge distillation was initially introduced to utilize additional supervision from a single teacher model for training the student model. To boost student performance, some recent variants attempt to exploit diverse knowledge sources from multiple teachers. However, existing studies mainly integrate knowledge from diverse sources by averaging over multiple teacher predictions or combining them using various other label-free strategies, which may mislead the student in the presence of low-quality teacher predictions. To tackle this problem, we propose Confidence-Aware Multi-teacher Knowledge Distillation (CA-MKD), which adaptively assigns sample-wise reliability to each teacher prediction with the help of ground-truth labels, so that teacher predictions close to the one-hot labels are assigned large weights. In addition, CA-MKD incorporates intermediate layers to further improve student performance. Extensive experiments show that our CA-MKD consistently outperforms all compared state-of-the-art methods across various teacher-student architectures.
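The idea of giving larger weights to teacher predictions that lie close to the one-hot label can be sketched as follows. This is an illustration under stated assumptions, not the weighting formula from the paper: the weights here are a softmax over negative cross-entropy values, and the function names `cross_entropy`, `teacher_weights`, and `fused_target` are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy between a predicted distribution and a one-hot label."""
    return -math.log(max(probs[label], 1e-12))


def teacher_weights(teacher_probs, label):
    """Per-sample weights over teachers; lower cross-entropy gives a larger weight.

    Assumption: weights are a softmax over negative cross-entropy values.
    """
    losses = [cross_entropy(p, label) for p in teacher_probs]
    scores = [math.exp(-loss) for loss in losses]
    total = sum(scores)
    return [s / total for s in scores]


def fused_target(teacher_probs, label):
    """Weighted average of the teacher distributions for one sample."""
    weights = teacher_weights(teacher_probs, label)
    num_classes = len(teacher_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, teacher_probs))
        for c in range(num_classes)
    ]


# Two hypothetical teachers predicting over three classes; the true class is 0.
teachers = [[0.9, 0.05, 0.05], [0.4, 0.5, 0.1]]
label = 0
weights = teacher_weights(teachers, label)
target = fused_target(teachers, label)
print(weights, target)
```

With plain averaging both teachers would count equally; in this sketch the first teacher, whose prediction is closer to the ground-truth label, dominates the fused target for this sample.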


Res2NetFuse: A Fusion Method for Infrared and Visible Images

arXiv.org Artificial Intelligence

This paper presents a novel Res2Net-based fusion framework for infrared and visible images. The proposed fusion model has three parts: an encoder, a fusion layer and a decoder. The Res2Net-based encoder extracts multi-scale features from the source images, and the paper introduces a new training strategy that trains this encoder using only a single image. Then, a new fusion strategy is developed based on the attention model. Finally, the fused image is reconstructed by the decoder. The proposed approach is also analyzed in detail. Experiments comparing our method with existing methods show that it achieves state-of-the-art fusion performance in both objective and subjective assessment.


Cloud turns data transformation on its head

#artificialintelligence

The traditional data transformation procedure of extract, transform and load (ETL) is rapidly being turned on its head in a modern twist enabled by cloud technologies. The cloud's lower costs, its flexibility and scalability, and the huge processing capability of cloud data warehouses have driven a major change: the ability to load all data into the cloud before transforming it. This trend means that ETL itself has been transformed into extract, load and transform, or ELT. ELT offers several advantages, including retention of data granularity, reduced need for expensive software engineers and significantly reduced project turnaround times. Data is vital for organizations, which use it to understand their customers, identify new opportunities and support decision-makers with mission-critical and up-to-date information.


Talend: Healthy Data, Healthy Business - Modern Cloud ETL

#artificialintelligence

Talend Data Fabric offers a single suite of cloud apps for data integration and data integrity to help enterprises collect, govern, transform, and share data.