Sparse Pseudo-input Local Kriging for Large Non-stationary Spatial Datasets with Exogenous Variables

arXiv.org Machine Learning

Gaussian process (GP) regression is a powerful tool for building predictive models for spatial systems. However, it does not scale efficiently to large datasets. In particular, for high-dimensional spatial datasets, i.e., spatial datasets that contain exogenous variables, the performance of GP regression deteriorates further. This paper presents Sparse Pseudo-input Local Kriging (SPLK), which approximates the full GP for spatial datasets with exogenous variables. SPLK employs orthogonal cuts that decompose the domain into smaller subdomains and then applies a sparse approximation of the full GP in each subdomain. We obtain continuity of the global predictor by imposing continuity constraints on the boundaries of neighboring subdomains. The domain decomposition scheme applies an independent covariance structure in each region, and as a result, SPLK captures heterogeneous covariance structures. SPLK achieves computational efficiency by utilizing a sparse approximation in each subdomain, which allows SPLK to accommodate large subdomains that contain many data points and possess a homogeneous covariance structure. We apply the proposed method to real and simulated datasets and conclude that the combination of orthogonal cuts and sparse approximation makes the proposed method an efficient algorithm for large, high-dimensional spatial datasets.
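
A minimal sketch of the pseudo-input idea that SPLK applies within each subdomain: a subset-of-regressors style GP predictor built from M inducing points. The random choice of pseudo-inputs, the RBF kernel, and the noise level below are illustrative assumptions, not the paper's estimation procedure.

    import numpy as np

    def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
        # Squared-exponential covariance between the rows of A and B.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return variance * np.exp(-0.5 * d2 / lengthscale**2)

    def sparse_gp_predict(X, y, Xstar, M=30, noise=0.1):
        # Pseudo-inputs Z: here simply a random subset of the training locations.
        Z = X[np.random.choice(len(X), M, replace=False)]
        Kmm = rbf_kernel(Z, Z)
        Kmn = rbf_kernel(Z, X)
        Kms = rbf_kernel(Z, Xstar)
        # Subset-of-regressors predictive mean:
        #   k(x*, Z) (sigma^2 Kmm + Kmn Knm)^{-1} Kmn y
        A = noise**2 * Kmm + Kmn @ Kmn.T
        return Kms.T @ np.linalg.solve(A, Kmn @ y)

    # Toy usage: 2-D spatial coordinates plus one exogenous variable.
    X = np.random.rand(2000, 3)
    y = np.sin(4 * X[:, 0]) + 0.5 * X[:, 2] + 0.05 * np.random.randn(2000)
    print(sparse_gp_predict(X, y, np.random.rand(5, 3)))

SPLK additionally ties such local predictors together through continuity constraints across subdomain boundaries, which this sketch omits.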


Direct Estimation of the Derivative of Quadratic Mutual Information with Application in Supervised Dimension Reduction

arXiv.org Machine Learning

A typical goal of supervised dimension reduction is to find a low-dimensional subspace of the input space such that the projected input variables preserve maximal information about the output variables. The dependence-maximization approach solves the supervised dimension reduction problem by maximizing a statistical dependence measure between the projected input variables and the output variables. A well-known dependence measure is mutual information (MI), which is based on the Kullback-Leibler (KL) divergence. However, the KL divergence is known to be sensitive to outliers. Quadratic MI (QMI), on the other hand, is a variant of MI based on the $L_2$ distance, which is more robust against outliers than the KL divergence, and a computationally efficient method to estimate QMI from data, called least-squares QMI (LSQMI), has been proposed recently. For these reasons, developing a supervised dimension reduction method based on LSQMI is promising. However, it is not QMI itself but the derivative of QMI that is needed for subspace search in supervised dimension reduction, and the derivative of an accurate QMI estimator is not necessarily a good estimator of the derivative of QMI. In this paper, we propose to directly estimate the derivative of QMI without estimating QMI itself. We show that direct estimation of the derivative of QMI is more accurate than differentiating the estimated QMI. Finally, we develop a supervised dimension reduction algorithm that efficiently uses the proposed derivative estimator, and demonstrate through experiments that the proposed method is more robust against outliers than existing methods.
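
For reference, and using generic notation rather than the paper's exact symbols, QMI replaces the KL divergence underlying ordinary MI with the $L_2$ distance between the joint density and the product of the marginals:

$$\mathrm{QMI}(Z,Y) = \iint \bigl(p(z,y) - p(z)\,p(y)\bigr)^2 \,\mathrm{d}z\,\mathrm{d}y, \qquad \mathrm{MI}(Z,Y) = \iint p(z,y)\,\log\frac{p(z,y)}{p(z)\,p(y)} \,\mathrm{d}z\,\mathrm{d}y .$$

In the dependence-maximization setting, one searches over projection matrices $W$ (e.g., with orthonormal rows) to maximize $\mathrm{QMI}(Wx, y)$, which is where the derivative of QMI, rather than QMI itself, enters the subspace search.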


A MAP approach for $\ell_q$-norm regularized sparse parameter estimation using the EM algorithm

arXiv.org Machine Learning

In this paper, Bayesian parameter estimation through the Maximum A Posteriori (MAP) criterion is revisited under the prism of the Expectation-Maximization (EM) algorithm. By incorporating a sparsity-promoting penalty term into the cost function of the estimation problem through the use of an appropriate prior distribution, we show how the EM algorithm can be used to efficiently solve the corresponding optimization problem. To this end, we rely on variance-mean Gaussian mixtures (VMGM) to describe the prior distribution, and we incorporate many of the attractive features of these mixtures into our estimation problem. The corresponding MAP estimation problem is expressed entirely in terms of the EM algorithm, which allows for handling nonlinearities and hidden variables that are difficult to treat with traditional methods. For comparison purposes, we also develop a coordinate descent algorithm for the $\ell_q$-norm penalized problem and present performance results via simulations.
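
A minimal statement of the optimization problem the abstract describes, in generic notation (the paper's likelihood and parameterization may differ): with observations $y$, parameters $\theta$, and a sparsity-promoting prior $p(\theta) \propto \exp(-\lambda \|\theta\|_q^q)$, the MAP estimate is

$$\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid y) = \arg\min_{\theta}\, \Bigl\{ -\log p(y \mid \theta) + \lambda \|\theta\|_q^q \Bigr\}, \qquad \|\theta\|_q^q = \sum_i |\theta_i|^q .$$

The EM construction then treats the mixing variables of the variance-mean Gaussian mixture representation of this prior as the hidden variables; under such a representation, each M-step typically reduces to a weighted, smooth problem.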


Non-isometric Curve to Surface Matching with Incomplete Data for Functional Calibration

arXiv.org Machine Learning

Calibration refers to the process of adjusting features of a computational model that are not observed in the physical process so that the model matches the real process. We propose a framework for calibration when the unobserved features, i.e., the calibration parameters, do not assume a single value but are functionally dependent on other inputs. We demonstrate that this problem is one of curve-to-surface matching in which the matched curve does not have the same length as the original curve; we therefore perform non-isometric matching of a curve to a surface. Since in practical applications we do not observe a continuous curve but only a sample of data points, we use a graph-theoretic approach to solve this matching of incomplete data. We define a graph structure in which the nodes are selected from the incomplete surface and the edge weights are determined by the response values of the curve and surface. We show that the problem of non-isometric incomplete curve-to-surface matching is a shortest-path problem in a directed acyclic graph. We apply the proposed method, graph-theoretic non-isometric matching, to real and synthetic data and demonstrate that it improves the prediction accuracy of functional calibration.
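
A minimal sketch of the graph-theoretic core (shortest path in a directed acyclic graph via dynamic programming over a topological order); the node indexing and edge weights below are generic placeholders, not the paper's specific matching graph:

    from collections import defaultdict

    def dag_shortest_path(n, edges, source):
        # edges: list of (u, v, weight) with u < v, so 0..n-1 is already a
        # topological order (as when nodes are indexed along the curve).
        adj = defaultdict(list)
        for u, v, w in edges:
            adj[u].append((v, w))
        dist = [float("inf")] * n
        pred = [None] * n
        dist[source] = 0.0
        for u in range(n):                  # relax edges in topological order
            if dist[u] == float("inf"):
                continue
            for v, w in adj[u]:
                if dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    pred[v] = u
        return dist, pred

    # Toy usage: 5 candidate surface points, edge weights from response mismatch.
    dist, pred = dag_shortest_path(5, [(0, 1, 0.2), (0, 2, 0.5), (1, 3, 0.1),
                                       (2, 3, 0.4), (3, 4, 0.3)], source=0)
    print(dist)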


Ontology Building vs Data Harvesting and Cleaning for Smart-city Services

arXiv.org Artificial Intelligence

Presently, a very large number of public and private data sets are available from local governments. In most cases, they are not semantically interoperable, and a huge human effort is needed to create integrated ontologies and a knowledge base for a smart city. Smart-city ontologies are not yet standardized, and a lot of research work is needed to identify models that can easily support data reconciliation, the management of complexity, and reasoning. In this paper, a system is proposed for data ingestion and reconciliation of smart-city related aspects, such as the road graph, services available on the roads, and traffic sensors. The system allows managing a big volume of data coming from a variety of sources, considering both static and dynamic data. These data are mapped to a smart-city ontology and stored in an RDF store, where they are available to applications via SPARQL queries to provide new services to users. The paper presents the process adopted to produce the ontology and the knowledge base, and the mechanisms adopted for verification, reconciliation, and validation. Some examples of possible usage of the resulting coherent knowledge base are also offered and are accessible from the RDF store.
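
As a minimal illustration of the final step (applications querying the RDF store), the following Python sketch uses rdflib; the file name, namespace URI, and class/property names are placeholders, not the project's actual vocabulary:

    from rdflib import Graph

    g = Graph()
    g.parse("city_knowledge_base.ttl", format="turtle")   # hypothetical RDF dump

    # Hypothetical query: traffic sensors and the road elements they are placed on.
    q = """
    PREFIX sc: <http://example.org/smartcity#>
    SELECT ?sensor ?road
    WHERE {
      ?sensor a sc:TrafficSensor ;
              sc:placedOn ?road .
    }
    """
    for row in g.query(q):
        print(row.sensor, row.road)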


A Deep-structured Conditional Random Field Model for Object Silhouette Tracking

arXiv.org Machine Learning

In this work, we introduce a deep-structured conditional random field (DS-CRF) model for state-based object silhouette tracking. The proposed DS-CRF model consists of a series of state layers, where each state layer spatially characterizes the object silhouette at a particular point in time. The interactions between adjacent state layers are established by inter-layer connectivity that is dynamically determined based on inter-frame optical flow. By incorporating both spatial and temporal context in a dynamic fashion within such a deep-structured probabilistic graphical model, the proposed DS-CRF model allows us to develop a framework that can accurately and efficiently track object silhouettes that change greatly over time, as well as under different situations such as occlusion and multiple targets within the scene. Experimental results on video surveillance datasets containing different scenarios, such as occlusion and multiple targets, show that the proposed DS-CRF approach provides strong object silhouette tracking performance compared to baseline methods such as mean-shift tracking, as well as state-of-the-art methods such as context tracking and boosted particle filtering.
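
Since the inter-layer connectivity is driven by inter-frame optical flow, the sketch below shows one standard way to compute such a dense flow field with OpenCV (Farneback's method with generic parameters; the file names are placeholders, and this is not the paper's exact pipeline):

    import cv2
    import numpy as np

    prev = cv2.cvtColor(cv2.imread("frame_t.png"), cv2.COLOR_BGR2GRAY)
    curr = cv2.cvtColor(cv2.imread("frame_t_plus_1.png"), cv2.COLOR_BGR2GRAY)

    # Dense flow: flow[y, x] = (dx, dy) displacement of each pixel from t to t+1,
    # which can be used to decide which state nodes at time t connect to which at t+1.
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
    dx, dy = flow[..., 0], flow[..., 1]
    print(np.mean(np.hypot(dx, dy)))   # average motion magnitude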


Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing

arXiv.org Machine Learning

Nonparametric two-sample testing is a decision-theoretic problem that involves identifying differences between two random variables without making parametric assumptions about their underlying distributions. We refer to the two most common settings as mean-difference alternatives (MDA), for testing differences only in first moments, and general-difference alternatives (GDA), for testing any difference in distributions. A large number of test statistics have been proposed for both settings. This paper connects three classes of statistics: high-dimensional variants of Hotelling's t-test, statistics based on Reproducing Kernel Hilbert Spaces, and energy statistics based on pairwise distances. We ask: how much statistical power do popular kernel and distance based tests for GDA have when the unknown distributions differ in their means, compared to specialized tests for MDA? We formally characterize the power of popular tests for GDA, like the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED), in the high-dimensional MDA regime. Some practically important properties include: (a) eED and gMMD have asymptotically equal power; furthermore, they enjoy a free lunch because, while they are additionally consistent for GDA, they also have the same power as specialized high-dimensional t-test variants for MDA. All these tests are asymptotically optimal (including matching constants) under MDA for spherical covariances, according to simple lower bounds; (b) the power of gMMD is independent of the kernel bandwidth, as long as it is larger than the choice made by the median heuristic; (c) there is a clear and smooth computation-statistics tradeoff for linear-time, subquadratic-time, and quadratic-time versions of these tests, with more computation resulting in higher power.
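
A minimal sketch of the quadratic-time gMMD statistic with the median-heuristic bandwidth (the generic unbiased U-statistic estimator, not the authors' linear- or subquadratic-time variants; sample sizes and the mean shift in the toy usage are illustrative):

    import numpy as np
    from scipy.spatial.distance import cdist, pdist

    def gaussian_mmd2(X, Y, bandwidth=None):
        # Unbiased quadratic-time estimate of MMD^2 with a Gaussian kernel.
        if bandwidth is None:
            # Median heuristic: median pairwise distance over the pooled sample.
            bandwidth = np.median(pdist(np.vstack([X, Y])))
        k = lambda A, B: np.exp(-cdist(A, B, "sqeuclidean") / (2 * bandwidth**2))
        Kxx, Kyy, Kxy = k(X, X), k(Y, Y), k(X, Y)
        n, m = len(X), len(Y)
        return ((Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
                + (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
                - 2 * Kxy.mean())

    # Toy mean-difference alternative in d = 100 dimensions.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 100))
    Y = rng.normal(0.1, 1.0, size=(200, 100))
    print(gaussian_mmd2(X, Y))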


Asynchronous stochastic convex optimization

arXiv.org Machine Learning

We show that, asymptotically, completely asynchronous stochastic gradient procedures achieve optimal (even to constant factors) convergence rates for the solution of convex optimization problems under nearly the same conditions required for asymptotic optimality of standard stochastic gradient procedures. Roughly, the noise inherent to the stochastic approximation scheme dominates any noise from asynchrony. We also give empirical evidence of the strong performance of asynchronous, parallel stochastic optimization schemes, showing that the robustness inherent to stochastic approximation problems allows substantially faster parallel and asynchronous solution methods.
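
A minimal sketch of the kind of procedure being analyzed: several workers apply stochastic gradient updates to a shared iterate without any locking (a least-squares objective and Python threads, chosen purely for illustration; the paper's results are not tied to this implementation):

    import numpy as np
    import threading

    rng = np.random.default_rng(0)
    d, n = 20, 10000
    A = rng.normal(size=(n, d))
    x_true = rng.normal(size=d)
    b = A @ x_true + 0.01 * rng.normal(size=n)

    x = np.zeros(d)      # shared iterate, updated by all workers without locks
    step = 1e-3

    def worker(num_steps, seed):
        local = np.random.default_rng(seed)
        for _ in range(num_steps):
            i = local.integers(n)
            g = (A[i] @ x - b[i]) * A[i]   # stochastic gradient of 0.5 * (a_i^T x - b_i)^2
            x[:] -= step * g               # asynchronous, possibly stale, in-place update

    threads = [threading.Thread(target=worker, args=(5000, s)) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print(np.linalg.norm(x - x_true))      # distance to the true parameter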


Sparse PCA via Bipartite Matchings

arXiv.org Machine Learning

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. These components can be computed one by one, repeatedly solving the single-component problem and deflating the input data matrix, but as we show this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to 1 from the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum weight matching problem. Its complexity grows as a low order polynomial in the ambient dimension of the input data matrix, but exponentially in its rank. However, it can be effectively applied on a low-dimensional sketch of the data; this allows us to obtain polynomial-time approximation guarantees via spectral bounds. We evaluate our algorithm on real data-sets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.
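
For concreteness, a minimal sketch of the deflation-based baseline that the abstract argues is suboptimal: extract one k-sparse component at a time by truncated power iteration, then deflate. The thresholding and deflation choices below are generic illustrations, not the authors' matching-based algorithm.

    import numpy as np

    def truncated_power_component(S, k, iters=200):
        # Approximate leading k-sparse eigenvector of S: power iteration with
        # hard thresholding to the k largest-magnitude entries at each step.
        v = np.random.default_rng(0).normal(size=S.shape[0])
        v /= np.linalg.norm(v)
        for _ in range(iters):
            v = S @ v
            v[np.argsort(np.abs(v))[:-k]] = 0.0
            v /= np.linalg.norm(v)
        return v

    def greedy_sparse_pca(X, n_components, k):
        S = X.T @ X
        comps = []
        for _ in range(n_components):
            v = truncated_power_component(S, k)
            comps.append(v)
            P = np.eye(len(v)) - np.outer(v, v)   # projection deflation
            S = P @ S @ P
        return np.array(comps)

    # Toy usage: three 5-sparse components from a 40-dimensional dataset.
    W = greedy_sparse_pca(np.random.default_rng(1).normal(size=(500, 40)), 3, 5)
    print((W != 0).sum(axis=1))   # each component has at most k nonzero loadings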


Appropriate Causal Models and the Stability of Causation

arXiv.org Artificial Intelligence

Causal models defined in terms of structural equations have proved to be quite a powerful way of representing knowledge regarding causality. However, a number of authors have given examples that seem to show that the Halpern-Pearl (HP) definition of causality gives intuitively unreasonable answers. Here it is shown that, for each of these examples, we can give two stories consistent with the description in the example such that intuitions regarding causality are quite different for each story. By adding additional variables, we can disambiguate the stories. Moreover, in the resulting causal models, the HP definition of causality gives the intuitively correct answer. It is also shown that, by adding extra variables, a modification to the original HP definition made to deal with an example of Hopkins and Pearl may not be necessary. Given how much can be done by adding extra variables, there might be a concern that the notion of causality is somewhat unstable. Can adding extra variables in a "conservative" way (i.e., maintaining all the relations between the variables in the original model) cause the answer to the question "Is X=x a cause of Y=y?" to alternate between "yes" and "no"? It is shown that we can have such alternation infinitely often, but if we take normality into consideration, we cannot. Indeed, under appropriate normality assumptions, adding an extra variable can change the answer from "yes" to "no", but after that it cannot change back to "yes".