Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection

arXiv.org Machine Learning

Existing MAP inference algorithms for determinantal point processes (DPPs) need to calculate determinants or conduct eigenvalue decomposition generally at the scale of the full kernel, which presents a great challenge for real-world applications. In this paper, we introduce a class of DPPs, called BwDPPs, that are characterized by an almost block diagonal kernel matrix and thus can allow efficient block-wise MAP inference. Furthermore, BwDPPs are successfully applied to address the difficulty of selecting change-points in the problem of change-point detection (CPD), which results in a new BwDPP-based CPD method, named BwDppCpd. In BwDppCpd, a preliminary set of change-point candidates is first created based on existing well-studied metrics. Then, these change-point candidates are treated as DPP items, and DPP-based subset selection is conducted to give the final estimate of the change-points that favours both quality and diversity. The effectiveness of BwDppCpd is demonstrated through extensive experiments on five real-world datasets.


Stanford AI detects even the smallest earthquakes from seismic data

#artificialintelligence

Microearthquakes -- low-intensity earthquakes that register 2.0 or less magnitude on the moment magnitude scale -- rarely cause property damage. And as a result of background noise, small events, and false positives, they're not always picked up by seismic monitoring systems. A possible solution is described in a new paper from the Department of Geophysics at Stanford University, where scientists have developed an AI system -- dubbed Cnn-Rnn Earthquake Detector, or CRED -- that can isolate and identify a range of seismic signals from historical and continuous data. It builds on the work of Harvard and Google, which in August created an AI model capable of predicting the location of aftershocks up to one year after a major earthquake. The researchers' system consists of neural network layers -- interconnected processing nodes that loosely mimic the function of neurons in the brain -- of two types: convolutional neural networks and recurrent neural networks.


A novel multiclassSVM based framework to classify lithology from well logs: a real-world application

arXiv.org Machine Learning

Support vector machines (SVMs) have been recognized as a potential tool for supervised classification analyses in different domains of research. In essence, SVM is a binary classifier. Therefore, in case of a multiclass problem, the problem is divided into a series of binary problems which are solved by binary classifiers, and finally the classification results are combined following either the one-against-one or one-against-all strategies. In this paper, an attempt has been made to classify lithology using a multiclass SVM based framework using well logs as predictor variables. Here, the lithology is classified into four classes such as sand, shaly sand, sandy shale and shale based on the relative values of sand and shale fractions as suggested by an expert geologist. The available dataset consisting well logs (gamma ray, neutron porosity, density, and P-sonic) and class information from four closely spaced wells from an onshore hydrocarbon field is divided into training and testing sets. We have used one-against-all strategy to combine the results of multiple binary classifiers. The reported results established the superiority of multiclass SVM compared to other classifiers in terms of classification accuracy. The selection of kernel function and associated parameters has also been investigated here. It can be envisaged from the results achieved in this study that the proposed framework based on multiclass SVM can further be used to solve classification problems. In future research endeavor, seismic attributes can be introduced in the framework to classify the lithology throughout a study area from seismic inputs.


Cost-sensitive detection with variational autoencoders for environmental acoustic sensing

arXiv.org Machine Learning

Environmental acoustic sensing involves the retrieval and processing of audio signals to better understand our surroundings. While large-scale acoustic data make manual analysis infeasible, they provide a suitable playground for machine learning approaches. Most existing machine learning techniques developed for environmental acoustic sensing do not provide flexible control of the trade-off between the false positive rate and the false negative rate. This paper presents a cost-sensitive classification paradigm, in which the hyper-parameters of classifiers and the structure of variational autoencoders are selected in a principled Neyman-Pearson framework. We examine the performance of the proposed approach using a dataset from the HumBug project which aims to detect the presence of mosquitoes using sound collected by simple embedded devices.


On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests

arXiv.org Machine Learning

Nonparametric two sample or homogeneity testing is a decision theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. The literature is old and rich, with a wide variety of statistics having being intelligently designed and analyzed, both for the unidimensional and the multivariate setting. Our contribution is to tie together many of these tests, drawing connections between seemingly very different statistics. In this work, our central object is the Wasserstein distance, as we form a chain of connections from univariate methods like the Kolmogorov-Smirnov test, PP/QQ plots and ROC/ODC curves, to multivariate tests involving energy statistics and kernel based maximum mean discrepancy. Some connections proceed through the construction of a \textit{smoothed} Wasserstein distance, and others through the pursuit of a "distribution-free" Wasserstein test. Some observations in this chain are implicit in the literature, while others seem to have not been noticed thus far. Given nonparametric two sample testing's classical and continued importance, we aim to provide useful connections for theorists and practitioners familiar with one subset of methods but not others.