AITopics

1212.178

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.62)

Xie, Yao, Huang, Jiaji, Willett, Rebecca

Changepoint detection for high-dimensional time series with missing data

arXiv.org Machine Learning, Dec-7-2012

This paper describes a novel approach to change-point detection when the observed high-dimensional data may have missing elements. The performance of classical methods for change-point detection typically scales poorly with the dimensionality of the data, so that a large number of observations are collected after the true change-point before it can be reliably detected. Furthermore, missing components in the observed data handicap conventional approaches. The proposed method addresses these challenges by modeling the dynamic distribution underlying the data as lying close to a time-varying low-dimensional submanifold embedded within the ambient observation space. Specifically, streaming data is used to track a submanifold approximation, measure deviations from this approximation, and calculate a series of statistics of the deviations for detecting when the underlying manifold has changed in a sharp or unexpected manner. The approach described in this paper leverages several recent results in the field of high-dimensional data analysis, including subspace tracking with missing data, multiscale analysis techniques for point clouds, online optimization, and change-point detection performance analysis. Simulations and experiments highlight the robustness and efficacy of the proposed approach in detecting an abrupt change in an otherwise slowly varying low-dimensional manifold.
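The detection loop described above (fit a low-dimensional model to the observed entries, measure the deviation, accumulate a statistic) can be illustrated with a minimal sketch. It is not the paper's method. It assumes a fixed, known one-dimensional subspace spanned by a hypothetical vector `u` in place of the tracked, time-varying submanifold. It also uses a standard one-sided CUSUM recursion, with hypothetical `drift` and `threshold` parameters, as the change statistic. Missing entries are handled by fitting the projection on the observed coordinates only.

```python
def residual_energy(x, u, observed):
    """Mean squared residual of x after least-squares projection onto span(u),
    computed from the observed coordinates only (missing entries are ignored)."""
    uu = sum(u[i] * u[i] for i in observed)
    ux = sum(u[i] * x[i] for i in observed)
    w = ux / uu if uu > 0 else 0.0  # coefficient fitted on observed rows
    res = sum((x[i] - w * u[i]) ** 2 for i in observed)
    return res / len(observed)


def cusum_alarm(stream, u, drift, threshold):
    """Return the first time index at which the CUSUM of residual energies
    exceeds `threshold`, or None if it never does.

    `stream` is a sequence of (x, observed) pairs; `u` is a fixed, hypothetical
    basis vector standing in for the tracked low-dimensional model."""
    s = 0.0
    for t, (x, observed) in enumerate(stream):
        e = residual_energy(x, u, observed)
        s = max(0.0, s + e - drift)  # one-sided CUSUM recursion
        if s > threshold:
            return t
    return None


# Toy stream: points lie on span(u) until t = 50, then jump to span(v).
# One coordinate (index t % 4) is missing at each time step.
d = 4
u = [1.0, 1.0, 0.0, 0.0]
v = [0.0, 0.0, 1.0, 1.0]
stream = []
for t in range(80):
    x = [(1.0 + 0.1 * (t % 3)) * ui for ui in u] if t < 50 else list(v)
    observed = [i for i in range(d) if i != t % d]
    stream.append((x, observed))

print(cusum_alarm(stream, u, drift=0.1, threshold=1.0))  # 52
```

Before the change the residual is exactly zero, so the statistic stays at zero. After it, each sample adds one third or two thirds of residual energy minus the drift, and the alarm fires a few samples after the true change point.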

approximation, data mining, machine learning

doi: 10.1109/JSTSP.2012.2234082

1208.5062

Country: North America > United States (1.00)

Genre: Research Report > Promising Solution (0.34)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Garcia-Cardona, Cristina, Flenner, Arjuna, Percus, Allon G.

Multiclass Diffuse Interface Models for Semi-Supervised Learning on Graphs

We present a graph-based variational algorithm for multiclass classification of high-dimensional data, motivated by total variation techniques. The energy functional is based on a diffuse interface model with a periodic potential. We augment the model by introducing an alternative measure of smoothness that preserves symmetry among the class labels. Through this modification of the standard Laplacian, we construct an efficient multiclass method that allows for sharp transitions between classes. The experimental results demonstrate that our approach is competitive with the state of the art among other graph-based algorithms.
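The smoothness term in graph-based diffuse interface models is built on the graph Laplacian L = D - W. The following is an illustration only. It builds the standard, unmodified Laplacian from a hypothetical weighted edge list and evaluates the Dirichlet energy, the sum over classes of u_k^T L u_k for one-hot class indicators u_k. For such indicators this energy equals twice the total weight of the edges cut between classes. The sketch does not implement the authors' modified, label-symmetric smoothness measure or their periodic potential.

```python
def graph_laplacian(n, edges):
    """Unnormalized Laplacian L = D - W from an undirected weighted edge list."""
    W = [[0.0] * n for _ in range(n)]
    for i, j, w in edges:
        W[i][j] += w
        W[j][i] += w
    L = [[-W[i][j] for j in range(n)] for i in range(n)]
    for i in range(n):
        L[i][i] += sum(W[i])  # add the node degree on the diagonal
    return L


def dirichlet_energy(L, labels, n_classes):
    """Sum over classes of u_k^T L u_k for one-hot indicator vectors u_k."""
    n = len(labels)
    total = 0.0
    for k in range(n_classes):
        u = [1.0 if lab == k else 0.0 for lab in labels]
        total += sum(u[i] * L[i][j] * u[j] for i in range(n) for j in range(n))
    return total


edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 3, 2.0), (0, 3, 0.25)]
L = graph_laplacian(4, edges)
labels = [0, 0, 1, 1]
print(dirichlet_energy(L, labels, 2))  # 1.5 = 2 * (0.5 + 0.25), twice the cut weight
```

A labeling that respects the strong edges (weights 1.0 and 2.0) only pays for the weak cut edges, which is the behavior a smoothness term should reward.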

artificial intelligence, segmentation, upstream oil & gas

1212.0945

Country:

North America > United States (0.68)
North America > Canada (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.70)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Bouvrie, Jake, Maggioni, Mauro

Multiscale Markov Decision Problems: Compression, Solution, and Transfer Learning

arXiv.org Artificial Intelligence, Dec-5-2012

Many problems in sequential decision making and stochastic control have natural multiscale structure: sub-tasks are assembled together to accomplish complex goals. Systematically inferring and leveraging hierarchical structure, particularly beyond a single level of abstraction, has remained a longstanding challenge. We describe a fast multiscale procedure for repeatedly compressing, or homogenizing, Markov decision processes (MDPs), wherein a hierarchy of sub-problems at different scales is automatically determined. Coarsened MDPs are themselves independent, deterministic MDPs, and may be solved using existing algorithms. The multiscale representation delivered by this procedure decouples sub-tasks from each other and can lead to substantial improvements in convergence rates both locally within sub-problems and globally across sub-problems, yielding significant computational savings. A second fundamental aspect of this work is that these multiscale decompositions yield new transfer opportunities across different problems, where solutions of sub-tasks at different levels of the hierarchy may be amenable to transfer to new problems. Localized transfer of policies and potential operators at arbitrary scales is emphasized. Finally, we demonstrate compression and transfer in a collection of illustrative domains, including examples involving discrete and continuous state spaces. Keywords: Markov decision processes, hierarchical reinforcement learning, transfer, multiscale analysis.

algorithm, artificial intelligence, machine learning


1212.1143

Country:

Asia (0.67)
North America > United States > Massachusetts (0.45)

Genre:

Workflow (0.93)
Overview > Growing Problem (0.34)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.92)

Kon, Mark A., Plaskota, Leszek

On Some Integrated Approaches to Inference

It is claimed that an explicit partition of information into a priori information (prior knowledge) and a posteriori information (data) is an important way of standardizing inference approaches, so that they can be compared on a normative scale and so that notions of optimal algorithms become more far-reaching. The inference methods considered include neural network approaches, information-based complexity, and Monte Carlo, spline, and regularization methods. The model is an extension of currently used continuous complexity models, with a class of algorithms in the form of optimization methods, in which an optimization functional (involving the data) is minimized. This extends the family of current approaches in continuous complexity theory, which include the use of interpolatory algorithms in worst and average case settings.
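As a concrete instance of an algorithm defined by minimizing a functional that combines a priori information with data, consider regularized least squares for a single slope. This example was chosen for illustration and is not the authors' model. The hypothetical parameter `prior_slope` plays the role of prior knowledge, `lam` sets how strongly it is trusted, and the data enter only through the squared-error term. Setting the derivative of sum_i (y_i - w x_i)^2 + lam (w - w0)^2 to zero gives w = (sum_i x_i y_i + lam w0) / (sum_i x_i^2 + lam).

```python
def regularized_slope(xs, ys, lam, prior_slope):
    """Minimizer over w of sum_i (y_i - w*x_i)^2 + lam * (w - prior_slope)^2.

    `prior_slope` encodes a priori information; (xs, ys) are the data."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy + lam * prior_slope) / (sxx + lam)


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(regularized_slope(xs, ys, lam=0.0, prior_slope=0.5))   # 2.0: data only
print(regularized_slope(xs, ys, lam=14.0, prior_slope=0.5))  # 1.25: prior and data weighted equally
```

With lam = 0 the answer is the ordinary least-squares fit. As lam grows the answer moves toward the prior value, which makes the split between prior knowledge and data explicit in a single tuning parameter.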

artificial intelligence, information, machine learning

1212.118

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)

Jung, Hyun Joon, Lease, Matthew

Evaluating Classifiers Without Expert Labels

This paper considers the challenge of evaluating a set of classifiers, as done in shared task evaluations like the KDD Cup or NIST TREC, without expert labels. While expert labels provide the traditional cornerstone for evaluating statistical learners, limited or expensive access to experts represents a practical bottleneck. Instead, we seek a methodology for estimating the performance of the classifiers (relative and absolute) that is more scalable than expert labeling yet preserves high correlation with evaluation based on expert labels. We consider two sources of labels: 1) labels automatically generated by the classifiers themselves (blind evaluation); and 2) labels obtained via crowdsourcing. While crowdsourcing methods are lauded for scalability, using such data for evaluation raises serious concerns given the prevalence of label noise. For blind evaluation, two broad strategies are investigated: combine & score and score & combine. Combine & score methods infer a single "pseudo-gold" label set by aggregating classifier labels; classifiers are then evaluated against this single pseudo-gold label set. Score & combine methods, on the other hand: i) sample multiple label sets from classifier outputs, ii) evaluate classifiers on each label set, and iii) average classifier performance across label sets. When additional crowd labels are also collected, we investigate two alternative avenues for exploiting them: 1) direct evaluation of classifiers; or 2) supervision of combine & score methods. To assess the generality of our techniques, classifier performance is measured using four common classification metrics, with statistical significance tests establishing the relative performance of the classifiers for each metric. Finally, we measure both score and rank correlations between estimated classifier performance and actual performance according to expert judgments. Rigorous evaluation of classifiers from the TREC 2011 Crowdsourcing Track shows that reliable evaluation can be achieved without reliance on expert labels.
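The two blind-evaluation strategies can be contrasted with a minimal sketch that uses accuracy as the metric. Several choices here are assumptions and do not come from the paper. Combine & score aggregates by simple majority vote (ties go to the label seen first). Score & combine builds each sampled label set by copying every item's label from a uniformly chosen classifier. The authors' aggregation and sampling schemes may be more elaborate. The `outputs` dictionary and classifier names are hypothetical.

```python
import random
from collections import Counter


def combine_and_score(outputs):
    """outputs: dict mapping classifier name -> list of predicted labels (same items, same order)."""
    names = list(outputs)
    n = len(outputs[names[0]])
    # Combine: majority vote over classifiers gives one pseudo-gold label per item.
    gold = [Counter(outputs[c][i] for c in names).most_common(1)[0][0] for i in range(n)]
    # Score: accuracy of each classifier against the single pseudo-gold set.
    return {c: sum(p == g for p, g in zip(outputs[c], gold)) / n for c in names}


def score_and_combine(outputs, n_samples=1000, seed=0):
    rng = random.Random(seed)
    names = list(outputs)
    n = len(outputs[names[0]])
    totals = {c: 0.0 for c in names}
    for _ in range(n_samples):
        # Sample one label set: each item's label is copied from a random classifier.
        labels = [outputs[rng.choice(names)][i] for i in range(n)]
        # Score every classifier on this sampled label set.
        for c in names:
            totals[c] += sum(p == g for p, g in zip(outputs[c], labels)) / n
    # Combine: average accuracy over all sampled label sets.
    return {c: t / n_samples for c, t in totals.items()}


outputs = {"A": [1, 0, 1, 1], "B": [1, 1, 1, 0], "C": [0, 0, 1, 1]}
print(combine_and_score(outputs))  # {'A': 1.0, 'B': 0.5, 'C': 0.75}
print(score_and_combine(outputs, n_samples=2000, seed=1))
```

Both strategies rank A above C above B here. Combine & score commits to one pseudo-gold set, which favors the classifier that agrees with the majority. Score & combine spreads the credit across sampled label sets and so yields less extreme estimates.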

artificial intelligence, classifier, machine learning

1212.096

Country: North America > United States (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Sutherland, Dougal J., Xiong, Liang, Póczos, Barnabás, Schneider, Jeff

Kernels on Sample Sets via Nonparametric Divergence Estimates

Most machine learning algorithms, such as those for classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. Projecting the estimated Gram matrix onto the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments on both real and simulated datasets to demonstrate the advantages of our new approach.
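The kernels above are built from nonparametric divergence estimates between sample sets. As a hedged illustration, the sketch below uses a different, well-known estimator: the 1-nearest-neighbour estimator of the Kullback-Leibler divergence for one-dimensional samples. It symmetrizes that estimate and turns it into a kernel exp(-D / sigma) with a hypothetical bandwidth `sigma`. Neither the divergence nor the kernel form is claimed to be the one used in the paper.

```python
import math


def knn_kl_divergence(xs, ys):
    """1-nearest-neighbour estimate of KL(p || q) from 1-D samples xs ~ p, ys ~ q.

    Assumes the points in xs are distinct and no x coincides with a y."""
    n, m = len(xs), len(ys)
    total = 0.0
    for i, x in enumerate(xs):
        rho = min(abs(x - xs[j]) for j in range(n) if j != i)  # nearest other point of xs
        nu = min(abs(x - y) for y in ys)                        # nearest point of ys
        total += math.log(nu / rho)
    # The dimension factor is 1 for scalar samples.
    return total / n + math.log(m / (n - 1))


def set_kernel(xs, ys, sigma=1.0):
    """Kernel between two sample sets from a symmetrized divergence estimate.

    Small-sample estimates can be negative, so they are clipped at zero."""
    div = 0.5 * (knn_kl_divergence(xs, ys) + knn_kl_divergence(ys, xs))
    return math.exp(-max(div, 0.0) / sigma)


X = [0.0, 1.0, 2.0, 3.0, 4.0]
near = [0.5, 1.5, 2.5, 3.5, 4.5]
far = [10.5, 11.5, 12.5, 13.5, 14.5]
print(set_kernel(X, near), set_kernel(X, far))
```

The nearby sample set gets kernel value 1.0 and the shifted one a much smaller value. Because the estimated divergences are noisy, a Gram matrix assembled from such values need not be positive semi-definite. This is why the abstract projects it onto the PSD cone (for a symmetric matrix, clip its negative eigenvalues to zero). That projection step is omitted here.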

artificial intelligence, kernel, machine learning

1202.0302

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

arXiv.org Artificial Intelligence, Dec-5-2012

Making Early Predictions of the Accuracy of Machine Learning Applications

Smith, J. E., Caleb-Solly, P., Tahir, M. A., Sannen, D., van-Brussel, H.

The accuracy of machine learning systems is a widely studied research topic. Established techniques such as cross-validation predict the accuracy on unseen data of the classifier produced by applying a given learning method to a given training data set. However, they do not predict whether incurring the cost of obtaining more data and undergoing further training will lead to higher accuracy. In this paper we investigate techniques for making such early predictions. We note that when a machine learning algorithm is presented with a training set, the classifier produced, and hence its error, will depend on the characteristics of the algorithm, on the training set's size, and also on its specific composition. In particular, we hypothesise that if a number of classifiers are produced and their observed error is decomposed into bias and variance terms, then although these components may behave differently, their behaviour may be predictable. We test our hypothesis by building models that, given a measurement taken from the classifier created from a limited number of samples, predict the values that would be measured from the classifier produced when the full data set is presented. We create separate models for bias, variance and total error. Our models are built from the results of applying ten different machine learning algorithms to a range of data sets, and tested with "unseen" algorithms and datasets. We analyse the results for various numbers of initial training samples and total dataset sizes. Results show that our predictions are very highly correlated with the values observed after undertaking the extra training. Finally, we consider the more complex case where an ensemble of heterogeneous classifiers is trained, and show how we can accurately estimate an upper bound on the accuracy achievable after further training.
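The bias and variance terms referred to above are estimated from several classifiers trained on different samples and evaluated on the same test points. Several decompositions exist for 0-1 loss. The sketch assumes a "main prediction" style: bias is whether the most frequent prediction is wrong, and variance is how often individual runs disagree with that main prediction. This may not match the decomposition the authors used. `runs` is a hypothetical list of prediction vectors, one per training sample.

```python
from collections import Counter


def bias_variance_01(true_labels, runs):
    """Average bias, variance and error under 0-1 loss.

    runs[r][i] is the prediction of the classifier trained in run r on test point i."""
    n, R = len(true_labels), len(runs)
    bias = var = err = 0.0
    for i, y in enumerate(true_labels):
        preds = [run[i] for run in runs]
        main = Counter(preds).most_common(1)[0][0]  # main prediction = most frequent label
        bias += float(main != y)
        var += sum(p != main for p in preds) / R    # disagreement with the main prediction
        err += sum(p != y for p in preds) / R       # average 0-1 error across runs
    return bias / n, var / n, err / n


true_labels = [0, 1, 1]
runs = [[0, 1, 0], [0, 1, 1], [1, 1, 0]]
print(bias_variance_01(true_labels, runs))
```

In this toy case the bias is 1/3 (the main prediction is wrong on the last point), the variance is 2/9, and the total error is 1/3. Tracking these three numbers as the training-set size grows is the kind of signal the abstract proposes to extrapolate.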

artificial intelligence, classifier, machine learning


1212.11

Country:

Europe (0.93)
North America > United States > California (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

arXiv.org Machine Learning, Dec-4-2012

On best subset regression

Xiong, Shifeng

In this paper we discuss the variable selection method from $\ell_0$-norm constrained regression, which is equivalent to the problem of finding the best subset of a fixed size. Our study focuses on two aspects: consistency and computation. We prove that, under regularity conditions, the sparse estimator from such a method can asymptotically retain all of the important variables, even when the dimensionality grows exponentially. This indicates that the best subset regression method can efficiently shrink the full model down to a submodel smaller than the sample size, which can then be analyzed in a follow-up study by well-developed regression techniques for such cases. We provide an iterative algorithm, called orthogonalizing subset selection (OSS), to address computational issues in best subset regression. OSS is an EM algorithm, and thus possesses the monotonicity property. Any sparse estimator can be used as the initial point of OSS, which then improves its fit of the model while preserving its sparsity. Another appealing feature of OSS is that, similarly to an effective algorithm for a continuous optimization problem, OSS can converge to the global solution of the $\ell_0$-norm constrained regression problem if the initial point lies in a neighborhood of the global solution. An accelerated version of OSS and its combination with forward stepwise selection are also investigated. Simulations and a real example are presented to evaluate the performance of the proposed methods.
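$\ell_0$-norm constrained least squares can be attacked with a majorize-minimize iteration that keeps only the k largest coefficients at each step. The sketch below is iterative hard thresholding. It is named plainly as a related technique and is not the author's OSS algorithm. It uses a constant d at least as large as the largest eigenvalue of X^T X (here the squared Frobenius norm, a cheap upper bound). This choice makes each step non-increasing in the residual sum of squares, in the spirit of the monotonicity property mentioned above. The hypothetical `beta0` argument accepts any k-sparse starting point.

```python
def iht_best_subset(X, y, k, iters=200, beta0=None):
    """Iterative hard thresholding for min ||y - X beta||^2 subject to at most k nonzeros.

    X is a list of rows; beta0 (optional) is a k-sparse starting point."""
    n, p = len(X), len(X[0])
    d = sum(v * v for row in X for v in row)  # ||X||_F^2 >= largest eigenvalue of X^T X
    beta = list(beta0) if beta0 is not None else [0.0] * p
    for _ in range(iters):
        resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
        corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]  # X^T r
        u = [beta[j] + corr[j] / d for j in range(p)]  # minimizer of the quadratic majorizer
        keep = set(sorted(range(p), key=lambda j: abs(u[j]), reverse=True)[:k])
        beta = [u[j] if j in keep else 0.0 for j in range(p)]  # hard threshold to k terms
    return beta


# Orthogonal 4x4 design; y uses only columns 0 and 2 (y = 3*x0 - 2*x2).
X = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
y = [1.0, 1.0, 5.0, 5.0]
beta = iht_best_subset(X, y, k=2)
print([round(b, 6) for b in beta])  # [3.0, 0.0, -2.0, 0.0]
```

With an orthogonal design the best subset of size k is obtained by keeping the k largest least-squares coefficients, so the iteration recovers the true support. For correlated designs, hard thresholding can stall in a local solution, which is why a good starting point matters.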

best subset regression

1112.0918

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.53)

arXiv.org Machine Learning, Dec-4-2012

A simple non-parametric Topic Mixture for Authors and Documents

Bleier, Arnim

This article reviews the Author-Topic Model and presents a new non-parametric extension based on the Hierarchical Dirichlet Process. The extension is especially suitable when no prior information about the number of components needed is available. A blocked Gibbs sampler is described; the focus is on staying as close as possible to the original model while adding only the minimum necessary theoretical and implementation overhead.
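Blocked Gibbs samplers for Dirichlet-process-type models commonly work with a truncated stick-breaking representation of the mixture weights. It is an assumption here that the extension described above uses this truncation. The sketch shows only a single-level draw of K weights, with the last stick absorbing the remaining mass. It omits the second, per-document level of the Hierarchical Dirichlet Process. The concentration `alpha` and truncation level `K` are hypothetical values.

```python
import random


def truncated_stick_breaking(alpha, K, rng):
    """Draw K mixture weights from a stick-breaking prior truncated at K.

    v_k ~ Beta(1, alpha) for the first K - 1 sticks; the K-th stick takes all
    remaining mass, so the weights sum to one."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)  # break off a fraction v of what is left
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


rng = random.Random(0)
pi = truncated_stick_breaking(alpha=2.0, K=10, rng=rng)
print(len(pi), round(sum(pi), 12))  # 10 1.0
```

Because the truncated weights form a finite probability vector, all component assignments can be resampled jointly in one block, rather than one at a time as in a collapsed sampler.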

machine learning, natural language, topic model

1211.6248

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.78)
Information Technology > Artificial Intelligence > Natural Language (0.76)