Goto

Collaborating Authors

 Statistical Learning


Online Learning for Ground Trajectory Prediction

arXiv.org Artificial Intelligence

This paper presents a model based on an hybrid system to numerically simulate the climbing phase of an aircraft. This model is then used within a trajectory prediction tool. Finally, the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) optimization algorithm is used to tune five selected parameters, and thus improve the accuracy of the model. Incorporated within a trajectory prediction tool, this model can be used to derive the order of magnitude of the prediction error over time, and thus the domain of validity of the trajectory prediction. A first validation experiment of the proposed model is based on the errors along time for a one-time trajectory prediction at the take off of the flight with respect to the default values of the theoretical BADA model. This experiment, assuming complete information, also shows the limit of the model. A second experiment part presents an on-line trajectory prediction, in which the prediction is continuously updated based on the current aircraft position. This approach raises several issues, for which improvements of the basic model are proposed, and the resulting trajectory prediction tool shows statistically significantly more accurate results than those of the default model.


Deviation optimal learning using greedy Q-aggregation

arXiv.org Machine Learning

Given a finite family of functions, the goal of model selection aggregation is to construct a procedure that mimics the function from this family that is the closest to an unknown regression function. More precisely, we consider a general regression model with fixed design and measure the distance between functions by the mean squared error at the design points. While procedures based on exponential weights are known to solve the problem of model selection aggregation in expectation, they are, surprisingly, sub-optimal in deviation. We propose a new formulation called Q-aggregation that addresses this limitation; namely, its solution leads to sharp oracle inequalities that are optimal in a minimax sense. Moreover, based on the new formulation, we design greedy Q-aggregation procedures that produce sparse aggregation models achieving the optimal rate. The convergence and performance of these greedy procedures are illustrated and compared with other standard methods on simulated examples.


Expectation-Propogation for the Generative Aspect Model

arXiv.org Machine Learning

The generative aspect model is an extension of the multinomial model for text that allows word probabilities to vary stochastically across documents. Previous results with aspect models have been promising, but hindered by the computational difficulty of carrying out inference and learning. This paper demonstrates that the simple variational methods of Blei et al (2001) can lead to inaccurate inferences and biased learning for the generative aspect model. We develop an alternative approach that leads to higher accuracy at comparable cost. An extension of Expectation-Propagation is used for inference and then embedded in an EM algorithm for learning. Experimental results are presented for both synthetic and real data sets.


Bayesian one-mode projection for dynamic bipartite graphs

arXiv.org Machine Learning

We propose a Bayesian methodology for one-mode projecting a bipartite network that is being observed across a series of discrete time steps. The resulting one mode network captures the uncertainty over the presence/absence of each link and provides a probability distribution over its possible weight values. Additionally, the incorporation of prior knowledge over previous states makes the resulting network less sensitive to noise and missing observations that usually take place during the data collection process. The methodology consists of computationally inexpensive update rules and is scalable to large problems, via an appropriate distributed implementation.


Advances in Boosting (Invited Talk)

arXiv.org Machine Learning

Boosting is a general method of generating many simple classification rules and combining them into a single, highly accurate rule. In this talk, I will review the AdaBoost boosting algorithm and some of its underlying theory, and then look at how this theory has helped us to face some of the challenges of applying AdaBoost in two domains: In the first of these, we used boosting for predicting and modeling the uncertainty of prices in complicated, interacting auctions. The second application was to the classification of caller utterances in a telephone spoken-dialogue system where we faced two challenges: the need to incorporate prior knowledge to compensate for initially insufficient data; and a later need to filter the large stream of unlabeled examples being collected to select the ones whose labels are likely to be most informative.


Optimal Time Bounds for Approximate Clustering

arXiv.org Machine Learning

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect the emph{k-median} objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call emph{successive sampling} that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O(klog{n/k})) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of Omega(nk) on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say 1/100) probability. Thus we establish a tight time bound of Theta(nk) for the k-median problem for a wide range of values of k. The best previous upper bound for the problem was O(nk), where the O-notation hides polylogarithmic factors in n and k. The best previous lower bound of O(nk) applied only to deterministic k-median algorithms. While we focus our presentation on the k-median objective, all our upper bounds are valid for the k-means objective as well. In this context our algorithm compares favorably to the widely used k-means heuristic, which requires O(nk) time for just one iteration and provides no useful approximation guarantees.


Staged Mixture Modelling and Boosting

arXiv.org Machine Learning

In this paper, we introduce and evaluate a data-driven staged mixture modeling technique for building density, regression, and classification models. Our basic approach is to sequentially add components to a finite mixture model using the structural expectation maximization (SEM) algorithm. We show that our technique is qualitatively similar to boosting. This correspondence is a natural byproduct of the fact that we use the SEM algorithm to sequentially fit the mixture model. Finally, in our experimental evaluation, we demonstrate the effectiveness of our approach on a variety of prediction and density estimation tasks using real-world data.


An Information-Theoretic External Cluster-Validity Measure

arXiv.org Machine Learning

In this paper we propose a measure of clustering quality or accuracy that is appropriate in situations where it is desirable to evaluate a clustering algorithm by somehow comparing the clusters it produces with ``ground truth' consisting of classes assigned to the patterns by manual means or some other means in whose veracity there is confidence. Such measures are refered to as ``external'. Our measure also has the characteristic of allowing clusterings with different numbers of clusters to be compared in a quantitative and principled way. Our evaluation scheme quantitatively measures how useful the cluster labels of the patterns are as predictors of their class labels. In cases where all clusterings to be compared have the same number of clusters, the measure is equivalent to the mutual information between the cluster labels and the class labels. In cases where the numbers of clusters are different, however, it computes the reduction in the number of bits that would be required to encode (compress) the class labels if both the encoder and decoder have free acccess to the cluster labels. To achieve this encoding the estimated conditional probabilities of the class labels given the cluster labels must also be encoded. These estimated probabilities can be seen as a model for the class labels and their associated code length as a model cost.


Tree-dependent Component Analysis

arXiv.org Machine Learning

We present a generalization of independent component analysis (ICA), where instead of looking for a linear transform that makes the data components independent, we look for a transform that makes the data components well fit by a tree-structured graphical model. Treating the problem as a semiparametric statistical problem, we show that the optimal transform is found by minimizing a contrast function based on mutual information, a function that directly extends the contrast function used for classical ICA. We provide two approximations of this contrast function, one using kernel density estimation, and another using kernel generalized variance. This tree-dependent component analysis framework leads naturally to an efficient general multivariate density estimation technique where only bivariate density estimation needs to be performed.


Learning Hierarchical Object Maps Of Non-Stationary Environments with mobile robots

arXiv.org Machine Learning

Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.