Performance Analysis
Failures detection at directional drilling using real-time analogues search
Gurina, Ekaterina, Klyuchnikov, Nikita, Zaytsev, Alexey, Romanenkova, Evgenya, Antipova, Ksenia, Simon, Igor, Makarov, Victor, Koroteev, Dmitry
One of the main challenges in the construction of oil and gas wells is the need to detect and avoid abnormal situations, which can lead to accidents. Accidents have some indicators that help to find them during the drilling process. In this article, we present a data-driven model trained on historical data from drilling accidents that can detect different types of accidents using real-time signals. The results show that using the time-series comparison, based on aggregated statistics and gradient boosting classification, it is possible to detect an anomaly and identify its type by comparing current measurements while drilling with the stored ones from the database of accidents.
Globally-Aware Multiple Instance Classifier for Breast Cancer Screening
Shen, Yiqiu, Wu, Nan, Phang, Jason, Park, Jungkyu, Kim, Gene, Moy, Linda, Cho, Kyunghyun, Geras, Krzysztof J.
Deep learning models designed for visual classification tasks on natural images have become prevalent in medical image analysis. However, medical images differ from typical natural images in many ways, such as significantly higher resolutions and smaller regions of interest. Moreover, both the global structure and local details play important roles in medical image analysis tasks. To address these unique properties of medical images, we propose a neural network that is able to classify breast cancer lesions utilizing information from both a global saliency map and multiple local patches. The proposed model outperforms the ResNet-based baseline and achieves radiologist-level performance in the interpretation of screening mammography. Although our model is trained only with image-level labels, it is able to generate pixel-level saliency maps that provide localization of possible malignant findings.
Likelihood Ratios for Out-of-Distribution Detection
Ren, Jie, Liu, Peter J., Fertig, Emily, Snoek, Jasper, Poplin, Ryan, DePristo, Mark A., Dillon, Joshua V., Lakshminarayanan, Balaji
Discriminative neural networks offer little or no performance guarantees when deployed on data not generated by the same process as the training distribution. On such out-of-distribution (OOD) inputs, the prediction may not only be erroneous, but confidently so, limiting the safe deployment of classifiers in real-world applications. One such challenging application is bacteria identification based on genomic sequences, which holds the promise of early detection of diseases, but requires a model that can output low confidence predictions on OOD genomic sequences from new bacteria that were not present in the training data. We introduce a genomics dataset for OOD detection that allows other researchers to benchmark progress on this important problem. We investigate deep generative model based approaches for OOD detection and observe that the likelihood score is heavily affected by population level background statistics. We propose a likelihood ratio method for deep generative models which effectively corrects for these confounding background statistics. We benchmark the OOD detection performance of the proposed method against existing approaches on the genomics dataset and show that our method achieves state-of-the-art performance. We demonstrate the generality of the proposed method by showing that it significantly improves OOD detection when applied to deep generative models of images.
A General $\mathcal{O}(n^2)$ Hyper-Parameter Optimization for Gaussian Process Regression with Cross-Validation and Non-linearly Constrained ADMM
Xu, Linning, Yin, Feng, Zhang, Jiawei, Luo, Zhi-Quan, Cui, Shuguang
Hyper-parameter optimization remains as the core issue of Gaussian process (GP) for machine learning nowadays. The benchmark method using maximum likelihood (ML) estimation and gradient descent (GD) is impractical for processing big data due to its $O(n^3)$ complexity. Many sophisticated global or local approximation models, for instance, sparse GP, distributed GP, have been proposed to address such complexity issue. In this paper, we propose two novel and general-purpose GP hyper-parameter training schemes (GPCV-ADMM) by replacing ML with cross-validation (CV) as the fitting criterion and replacing GD with a non-linearly constrained alternating direction method of multipliers (ADMM) as the optimization method. The proposed schemes are of $O(n^2)$ complexity for any covariance matrix without special structure. We conduct various experiments based on both synthetic and real data sets, wherein the proposed schemes show excellent performance in terms of convergence, hyper-parameter estimation accuracy, and computational time in comparison with the traditional ML based routines given in the GPML toolbox.
Class-Conditional Compression and Disentanglement: Bridging the Gap between Neural Networks and Naive Bayes Classifiers
Amjad, Rana Ali, Geiger, Bernhard C.
In this draft, which reports on work in progress, we 1) adapt the information bottleneck functional by replacing the compression term by class-conditional compression, 2) relax this functional using a variational bound related to class-conditional disentanglement, 3) consider this functional as a training objective for stochastic neural networks, and 4) show that the latent representations are learned such that they can be used in a naive Bayes classifier. We continue by suggesting a series of experiments along the lines of Nonlinear Information Bottleneck [Kolchinsky et al., 2018], Deep Variational Information Bottleneck [Alemi et al., 2017], and Information Dropout [Achille and Soatto, 2018]. We furthermore suggest a neural network where the decoder architecture is a parameterized naive Bayes decoder. We consider a classification task with a feature random variable (RV) X on R and a class RV Y on the finite set Y of classes. We further consider stochastic feed-forward neural networks (NNs).
A systematic framework for natural perturbations from videos
Shankar, Vaishaal, Dave, Achal, Roelofs, Rebecca, Ramanan, Deva, Recht, Benjamin, Schmidt, Ludwig
We introduce a systematic framework for quantifying the robustness of classifiers to naturally occurring perturbations of images found in videos. As part of this framework, we construct Imagenet-Video-Robust, a human-expert--reviewed dataset of 22,178 images grouped into 1,109 sets of perceptually similar images derived from frames in the ImageNet Video Object Detection dataset. We evaluate a diverse array of classifiers trained on ImageNet, including models trained for robustness, and show a median classification accuracy drop of 16%. Additionally, we evaluate the Faster R-CNN and R-FCN models for detection, and show that natural perturbations induce both classification as well as localization errors, leading to a median drop in detection mAP of 14 points. Our analysis shows that natural perturbations in the real world are heavily problematic for current CNNs, posing a significant challenge to their deployment in safety-critical environments that require reliable, low-latency predictions.
CCMI : Classifier based Conditional Mutual Information Estimation
Mukherjee, Sudipto, Asnani, Himanshu, Kannan, Sreeram
Conditional Mutual Information (CMI) is a measure of conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection and time-series analysis. While k-nearest neighbor (kNN) based estimators as well as kernel-based methods have been widely used for CMI estimation, they suffer severely from the curse of dimensionality. In this paper, we leverage advances in classifiers and generative models to design methods for CMI estimation. Specifically, we introduce an estimator for KL-Divergence based on the likelihood ratio by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator. Finally, as an application of accurate CMI estimation, we use our best estimator for conditional independence testing and achieve superior performance than the state-of-the-art tester on both simulated and real data-sets.
ASP-based Discovery of Semi-Markovian Causal Models under Weaker Assumptions
Zhalama, null, Zhang, Jiji, Eberhardt, Frederick, Mayer, Wolfgang, Li, Mark Junjie
In recent years the possibility of relaxing the so-called Faithfulness assumption in automated causal discovery has been investigated. The investigation showed (1) that the Faithfulness assumption can be weakened in various ways that in an important sense preserve its power, and (2) that weakening of Faithfulness may help to speed up methods based on Answer Set Programming. However, this line of work has so far only considered the discovery of causal models without latent variables. In this paper, we study weakenings of Faithfulness for constraint-based discovery of semi-Markovian causal models, which accommodate the possibility of latent variables, and show that both (1) and (2) remain the case in this more realistic setting.
Glossary
The intent of this glossary is to provide clear definitions of the technical terms specific to deep artificial neural networks. It is a work in progress. An activation, or activation function, for a neural network is defined as the mapping of the input to the output via a non-linear transform function at each "node", which is simply a locus of computation within the net. Each layer in a neural net consists of many nodes, and the number of nodes in a layer is known as its width. Activation algorithms are the gates that determine, at each node in the net, whether and to what extent to transmit the signal the node has received from the previous layer. A combination of weights (coefficients) and biases work on the input data from the previous layer to determine whether that signal surpasses a given treshhold and is deemed significant. Those weights and biases are slowly updated as the neural net minimizes its error; i.e. the level of nodes' activation change in the course of learning. These activation functions allow neural networks to make complex boundary decisions for features at various levels of abstraction. Adadelta is an updater, or learning algorithm, related to gradient descent. Unlike SGD, which applies the same learning rate to all parameters of the network, Adadelta adapts the learning rate per parameter. Adagrad, short for adaptive gradient, is an updater or learning algorithm that adjust the learning rate for each parameter in the net by monitoring the squared gradients in the course of learning. It is a substitute for SGD, and can be useful when processing sparse data. Affine is a fancy word for a fully connected layer in a neural network. "Fully connected" means that all the nodes of one layer connect to all the nodes of the subsequent layer. A restricted Boltzmann machine, for example, is a fully connected layer. Convolutional networks use affine layers interspersed with both their namesake convolutional layers (which create feature maps based on convolutions) and downsampling layers, which throw out a lot of data and only keep the maximum value. "Affine" derives from the Latin affinis, which means bordering or connected with. Each connection, in an affine layer, is a passage whereby input is multiplied by a weight and added to a bias before it accumulates with all other inputs at a given node, the sum of which is then passed through an activation function: e.g.
Assessing Disparate Impacts of Personalized Interventions: Identifiability and Bounds
Personalized interventions in social services, education, and healthcare leverage individual-level causal effect predictions in order to give the best treatment to each individual or to prioritize program interventions for the individuals most likely to benefit. While the sensitivity of these domains compels us to evaluate the fairness of such policies, we show that actually auditing their disparate impacts per standard observational metrics, such as true positive rates, is impossible since ground truths are unknown. Whether our data is experimental or observational, an individual's actual outcome under an intervention different than that received can never be known, only predicted based on features. We prove how we can nonetheless point-identify these quantities under the additional assumption of monotone treatment response, which may be reasonable in many applications. We further provide a sensitivity analysis for this assumption by means of sharp partial-identification bounds under violations of monotonicity of varying strengths. We show how to use our results to audit personalized interventions using partially-identified ROC and xROC curves and demonstrate this in a case study of a French job training dataset.