Goto

Collaborating Authors

 Directed Networks


Robust Variational Autoencoders for Outlier Detection in Mixed-Type Data

arXiv.org Machine Learning

We focus on the problem of unsupervised cell outlier detection in mixed type tabular datasets. Traditional methods for outlier detection are concerned only on detecting which rows in the dataset are outliers. However, identifying which cells in the dataset corrupt a specific row is an important problem in practice, especially in high-dimensional tables. We introduce the Robust Variational Autoencoder (RVAE), a deep generative model that learns the joint distribution of the clean data while identifying the outlier cells in the dataset. RVAE learns the probability of each cell in the dataset being an outlier, balancing the contributions of the different likelihood models in the row outlier score, making the method suitable for outlier detection in mixed type datasets. We show experimentally that the RVAE performs better than several state of the art methods in cell outlier detection for tabular datasets, while providing comparable or better results for row outlier detection.


A Causal Bayesian Networks Viewpoint on Fairness

arXiv.org Machine Learning

We offer a graphical interpretation of unfairness in a dataset as the presence of an unfair causal path in the causal Bayesian network representing the data-generation mechanism. We use this viewpoint to revisit the recent debate surrounding the COMPAS pretrial risk assessment tool and, more generally, to point out that fairness evaluation on a model requires careful considerations on the patterns of unfairness underlying the training data. We show that causal Bayesian networks provide us with a powerful tool to measure unfairness in a dataset and to design fair models in complex unfairness scenarios.


Sequential online prediction in the presence of outliers and change points: an instant temporal structure learning approach

arXiv.org Machine Learning

In this paper, we consider sequential online prediction (SOP) for streaming data in the presence of outliers and change points. We propose an INstant TEmporal structure Learning (INTEL) algorithm to address this problem.Our INTEL algorithm is developed based on a full consideration to the duality between online prediction and anomaly detection. We first employ a mixture of weighted GP models (WGPs) to cover the expected possible temporal structures of the data. Then, on the basis of the rich modeling capacity of this WGP mixture, we develop an efficient technique to instantly learn (capture) the temporal structure of the data that follows a regime shift. This instant learning is achieved only by adjusting one hyper-parameter value of the mixture model. A weighted generalization of the product of experts (POE) model is used for fusing predictions yielded from multiple GP models. An outlier is declared once a real observation seriously deviates from the fused prediction. If a certain number of outliers are consecutively declared, then a change point is declared. Extensive experiments are performed using a diverse of real datasets. Results show that the proposed algorithm is significantly better than benchmark methods for SOP in the presence of outliers and change points.


Reflection on modern methods: when worlds collide--prediction, machine learning and causal inference

#artificialintelligence

Causal inference requires theory and prior knowledge to structure analyses, and is not usually thought of as an arena for the application of prediction modelling. However, contemporary causal inference methods, premised on counterfactual or potential outcomes approaches, often include processing steps before the final estimation step. The purposes of this paper are: (i) to overview the recent emergence of prediction underpinning steps in contemporary causal inference methods as a useful perspective on contemporary causal inference methods, and (ii) explore the role of machine learning (as one approach to'best prediction') in causal inference. Causal inference methods covered include propensity scores, inverse probability of treatment weights (IPTWs), G computation and targeted maximum likelihood estimation (TMLE). Machine learning has been used more for propensity scores and TMLE, and there is potential for increased use in G computation and estimation of IPTWs.


Estimation and Feature Selection in Mixtures of Generalized Linear Experts Models

arXiv.org Machine Learning

Mixtures-of-Experts (MoE) are conditional mixture models that have shown their performance in modeling heterogeneity in data in many statistical learning approaches for prediction, including regression and classification, as well as for clustering. Their estimation in high-dimensional problems is still however challenging. We consider the problem of parameter estimation and feature selection in MoE models with different generalized linear experts models, and propose a regularized maximum likelihood estimation that efficiently encourages sparse solutions for heterogeneous data with high-dimensional predictors. The developed proximal-Newton EM algorithm includes proximal Newton-type procedures to update the model parameter by monotonically maximizing the objective function and allows to perform efficient estimation and feature selection. An experimental study shows the good performance of the algorithms in terms of recovering the actual sparse solutions, parameter estimation, and clustering of heterogeneous regression data, compared to the main state-of-the art competitors.


Learning Neural Networks with Adaptive Regularization

arXiv.org Machine Learning

Although deep neural networks have been widely applied in various domains [19, 25, 27], usually its parameters are learned via the principle of maximum likelihood, hence its success crucially hinges on the availability of large scale datasets. When training rich models on small datasets, explicit regularization techniques are crucial to alleviate overfitting. Previous works have explored various regularization [39] and data augmentation [19, 38] techniques to learn diversified representations. In this paper, we look into an alternative direction by proposing an adaptive and data-dependent regularization method to encourage neurons of the same layer to share statistical strength. The goal of our method is to prevent overfitting when training (large) networks on small dataset. Our key insight stems from the famous argument by Efron [8] in the literature of the empirical Bayes method: It is beneficial to learn from the experience of others. From an algorithmic perspective, we argue that the connection weights of neurons in the same layer (row/column vectors of the weight matrix) will be correlated with each other through the backpropagation learning. Hence, by learning the correlations of the weight matrix, a neuron can "borrow statistical strength" from other neurons in the same layer.


Bayesian Synthesis of Probabilistic Programs for Automatic Data Modeling

arXiv.org Artificial Intelligence

We present new techniques for automatically constructing probabilistic programs for data analysis, interpretation, and prediction. These techniques work with probabilistic domain-specific data modeling languages that capture key properties of a broad class of data generating processes, using Bayesian inference to synthesize probabilistic programs in these modeling languages given observed data. We provide a precise formulation of Bayesian synthesis for automatic data modeling that identifies sufficient conditions for the resulting synthesis procedure to be sound. We also derive a general class of synthesis algorithms for domain-specific languages specified by probabilistic context-free grammars and establish the soundness of our approach for these languages. We apply the techniques to automatically synthesize probabilistic programs for time series data and multivariate tabular data. We show how to analyze the structure of the synthesized programs to compute, for key qualitative properties of interest, the probability that the underlying data generating process exhibits each of these properties. Second, we translate probabilistic programs in the domain-specific language into probabilistic programs in Venture, a general-purpose probabilistic programming system. The translated Venture programs are then executed to obtain predictions of new time series data and new multivariate data records. Experimental results show that our techniques can accurately infer qualitative structure in multiple real-world data sets and outperform standard data analysis methods in forecasting and predicting new data.


The Use of Gaussian Processes in System Identification

arXiv.org Machine Learning

Gaussian processes are used in machine learning to learn input-output mappings from observed data. Gaussian process regression is based on imposing a Gaussian process prior on the unknown regressor function and statistically conditioning it on the observed data. In system identification, Gaussian processes are used to form time series prediction models such as non-linear finite-impulse response (NFIR) models as well as non-linear autoregressive (NARX) models. Gaussian process state-space models (GPSS) can be used to learn the dynamic and measurement models for a state-space representation of the input-output data. Temporal and spatio-temporal Gaussian processes can be directly used to form regressor on the data in the time domain. The aim of this article is to briefly outline the main directions in system identification methods using Gaussian processes.


Learning an Urban Air Mobility Encounter Model from Expert Preferences

arXiv.org Artificial Intelligence

Airspace models have played an important role in the development and evaluation of aircraft collision avoidance systems for both manned and unmanned aircraft. As Urban Air Mobility (UAM) systems are being developed, we need new encounter models that are representative of their operational environment. Developing such models is challenging due to the lack of data on UAM behavior in the airspace. While previous encounter models for other aircraft types rely on large datasets to produce realistic trajectories, this paper presents an approach to encounter modeling that instead relies on expert knowledge. In particular, recent advances in preference-based learning are extended to tune an encounter model from expert preferences. The model takes the form of a stochastic policy for a Markov decision process (MDP) in which the reward function is learned from pairwise queries of a domain expert. We evaluate the performance of two querying methods that seek to maximize the information obtained from each query. Ultimately, we demonstrate a method for generating realistic encounter trajectories with only a few minutes of an expert's time.


New superomniphobic glass soars high on butterfly wings using machine learning: Engineers develop new superclear, supertransparent, stain-resistant, anti-fogging nanostructured glass based on butterfly wing

#artificialintelligence

The team recently published a paper detailing their findings: "Creating Glasswing-Butterfly Inspired Durable Antifogging Omniphobic Supertransmissive, Superclear Nanostructured Glass Through Bayesian Learning and Optimization" in Materials Horizons (doi:10.1039/C9MH00589G). They recently presented this work at the ICML conference in the "Climate Change: How Can AI Help?" workshop. The nanostructured glass has random nanostructures, like the glasswing butterfly wing, that are smaller than the wavelengths of visible light. This allows the glass to have a very high transparency of 99.5% when the random nanostructures are on both sides of the glass. This high transparency can reduce the brightness and power demands on displays that could, for example, extend battery life.