AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

One-Class Classification: Taxonomy of Study and Review of Techniques

arXiv.org Artificial IntelligenceNov-29-2013

One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers by defining class boundary just with the knowledge of positive class. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, algorithms used and the application domains applied. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research.

classifier, evolutionary algorithm, machine learning, (23 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S026988891300043X

1312.0049

Country:

Europe (1.00)
North America > Canada (0.67)
North America > United States > California (0.28)

Genre:

Overview (1.00)
Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(9 more...)

Add feedback

Viterbi training in PRISM

Sato, Taisuke, Kubota, Keiichi

arXiv.org Artificial IntelligenceNov-28-2013

VT (Viterbi training), or hard EM, is an efficient way of parameter learning for probabilistic models with hidden variables. Given an observation $y$, it searches for a state of hidden variables $x$ that maximizes $p(x,y \mid \theta)$ by coordinate ascent on parameters $\theta$ and $x$. In this paper we introduce VT to PRISM, a logic-based probabilistic modeling system for generative models. VT improves PRISM in three ways. First VT in PRISM converges faster than EM in PRISM due to the VT's termination condition. Second, parameters learned by VT often show good prediction performance compared to those learned by EM. We conducted two parsing experiments with probabilistic grammars while learning parameters by a variety of inference methods, i.e.\ VT, EM, MAP and VB. The result is that VT achieved the best parsing accuracy among them in both experiments. Also we conducted a similar experiment for classification tasks where a hidden variable is not a prediction target unlike probabilistic grammars. We found that in such a case VT does not necessarily yield superior performance. Third since VT always deals with a single probability of a single explanation, Viterbi explanation, the exclusiveness condition that is imposed on PRISM programs is no more required if we learn parameters by VT. Last but not least we can say that as VT in PRISM is general and applicable to any PRISM program, it largely reduces the need for the user to develop a specific VT algorithm for a specific model. Furthermore since VT in PRISM can be used just by setting a PRISM flag appropriately, it makes VT easily accessible to (probabilistic) logic programmers. To appear in Theory and Practice of Logic Programming (TPLP).

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1017/S1471068413000677

1303.5659

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(2 more...)

Add feedback

On Approximate Inference for Generalized Gaussian Process Models

Shang, Lifeng, Chan, Antoni B.

arXiv.org Machine LearningNov-27-2013

A generalized Gaussian process model (GGPM) is a unifying framework that encompasses many existing Gaussian process (GP) models, such as GP regression, classification, and counting. In the GGPM framework, the observation likelihood of the GP model is itself parameterized using the exponential family distribution (EFD). In this paper, we consider efficient algorithms for approximate inference on GGPMs using the general form of the EFD. A particular GP model and its associated inference algorithms can then be formed by changing the parameters of the EFD, thus greatly simplifying its creation for task-specific output domains. We demonstrate the efficacy of this framework by creating several new GP models for regressing to non-negative reals and to real intervals. We also consider a closed-form Taylor approximation for efficient inference on GGPMs, and elaborate on its connections with other model-specific heuristic closed-form approximations. Finally, we present a comprehensive set of experiments to compare approximate inference algorithms on a wide variety of GGPMs.

approximation, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1311.6371

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

The asymptotics of ranking algorithms

Duchi, John C., Mackey, Lester, Jordan, Michael I.

arXiv.org Machine LearningNov-26-2013

We consider the predictive problem of supervised ranking, where the task is to rank sets of candidate items returned in response to queries. Although there exist statistical procedures that come with guarantees of consistency in this setting, these procedures require that individuals provide a complete ranking of all items, which is rarely feasible in practice. Instead, individuals routinely provide partial preference information, such as pairwise comparisons of items, and more practical approaches to ranking have aimed at modeling this partial preference data directly. As we show, however, such an approach raises serious theoretical challenges. Indeed, we demonstrate that many commonly used surrogate losses for pairwise comparison data do not yield consistency; surprisingly, we show inconsistency even in low-noise settings. With these negative results as motivation, we present a new approach to supervised ranking based on aggregation of partial preferences, and we develop $U$-statistic-based empirical risk minimization procedures. We present an asymptotic analysis of these new procedures, showing that they yield consistency results that parallel those available for classification. We complement our theoretical results with an experiment studying the new procedures in a large-scale web-ranking task.

artificial intelligence, consistency, machine learning, (20 more...)

arXiv.org Machine Learning

doi: 10.1214/13-AOS1142

1204.1688

Country: North America > United States > California (0.68)

Genre: Research Report > New Finding (0.92)

Industry: Government (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Local and global asymptotic inference in smoothing spline models

Shang, Zuofeng, Cheng, Guang

arXiv.org Machine LearningNov-26-2013

This article studies local and global inference for smoothing spline estimation in a unified asymptotic framework. We first introduce a new technical tool called functional Bahadur representation, which significantly generalizes the traditional Bahadur representation in parametric models, that is, Bahadur [Ann. Inst. Statist. Math. 37 (1966) 577-580]. Equipped with this tool, we develop four interconnected procedures for inference: (i) pointwise confidence interval; (ii) local likelihood ratio testing; (iii) simultaneous confidence band; (iv) global likelihood ratio testing. In particular, our confidence intervals are proved to be asymptotically valid at any point in the support, and they are shorter on average than the Bayesian confidence intervals proposed by Wahba [J. R. Stat. Soc. Ser. B Stat. Methodol. 45 (1983) 133-150] and Nychka [J. Amer. Statist. Assoc. 83 (1988) 1134-1143]. We also discuss a version of the Wilks phenomenon arising from local/global likelihood ratio testing. It is also worth noting that our simultaneous confidence bands are the first ones applicable to general quasi-likelihood models. Furthermore, issues relating to optimality and efficiency are carefully addressed. As a by-product, we discover a surprising relationship between periodic and nonperiodic smoothing splines in terms of inference.

artificial intelligence, assumption, machine learning, (15 more...)

arXiv.org Machine Learning

doi: 10.1214/13-AOS1164

1212.6788

Country: North America > United States > Indiana (0.46)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

Add feedback

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

Simon, Noah, Friedman, Jerome, Hastie, Trevor

arXiv.org Machine LearningNov-25-2013

In this paper we purpose a blockwise descent algorithm for group-penalized multiresponse regression. Using a quasi-newton framework we extend this to group-penalized multinomial regression. We give a publicly available implementation for these in R, and compare the speed of this algorithm to a competing algorithm --- we show that our implementation is an order of magnitude faster than its competitor, and can solve gene-expression-sized problems in real time.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1311.6529

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

Summary Statistics for Partitionings and Feature Allocations

Fidaner, Işık Barış, Cemgil, Ali Taylan

arXiv.org Machine LearningNov-25-2013

Infinite mixture models are commonly used for clustering. One can sample from the posterior of mixture assignments by Monte Carlo methods or find its maximum a posteriori solution by optimization. However, in some problems the posterior is diffuse and it is hard to interpret the sampled partitionings. In this paper, we introduce novel statistics based on block sizes for representing sample sets of partitionings and feature allocations. We develop an element-based definition of entropy to quantify segmentation among their elements. Then we propose a simple algorithm called entropy agglomeration (EA) to summarize and visualize this information. Experiments on various infinite mixture posteriors as well as a feature allocation dataset demonstrate that the proposed statistics are useful in practice.

bioinformatics, information, machine learning, (19 more...)

arXiv.org Machine Learning

1310.0509

Country:

South America (1.00)
North America (1.00)
Europe (1.00)
(2 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

Are all training examples equally valuable?

Lapedriza, Agata, Pirsiavash, Hamed, Bylinskii, Zoya, Torralba, Antonio

arXiv.org Machine LearningNov-25-2013

When learning a new concept, not all training examples may prove equally useful for training: some may have higher or lower training value than others. The goal of this paper is to bring to the attention of the vision community the following considerations: (1) some examples are better than others for training detectors or classifiers, and (2) in the presence of better examples, some examples may negatively impact performance and removing them may be beneficial. In this paper, we propose an approach for measuring the training value of an example, and use it for ranking and greedily sorting examples. We test our methods on different vision tasks, models, datasets and classifiers. Our experiments show that the performance of current state-of-the-art detectors and classifiers can be improved when training on a subset, rather than the whole training set.

artificial intelligence, machine learning, training example, (20 more...)

arXiv.org Machine Learning

1311.651

Country: North America > United States > Massachusetts (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Add feedback

Parallel Coordinate Descent Methods for Big Data Optimization

Richtárik, Peter, Takáč, Martin

arXiv.org Artificial IntelligenceNov-25-2013

In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function. The theoretical speedup, as compared to the serial method, and referring to the number of iterations needed to approximately solve the problem with high probability, is a simple expression depending on the number of parallel processors and a natural and easily computable measure of separability of the smooth component of the objective function. In the worst case, when no degree of separability is present, there may be no speedup; in the best case, when the problem is separable, the speedup is equal to the number of processors. Our analysis also works in the mode when the number of blocks being updated at each iteration is random, which allows for modeling situations with busy or unreliable processors. We show that our algorithm is able to solve a LASSO problem involving a matrix with 20 billion nonzeros in 2 hours on a large memory node with 24 cores.

artificial intelligence, iteration, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1212.0873

Country: Europe (0.28)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Hardware (0.87)
Information Technology > Data Science (0.64)

Add feedback

Robust Low-rank Tensor Recovery: Models and Algorithms

Goldfarb, Donald, Qin, Zhiwei

arXiv.org Machine LearningNov-24-2013

Robust tensor recovery plays an instrumental role in robustifying tensor decompositions for multilinear data analysis against outliers, gross corruptions and missing values and has a diverse array of applications. In this paper, we study the problem of robust low-rank tensor recovery in a convex optimization framework, drawing upon recent advances in robust Principal Component Analysis and tensor completion. We propose tailored optimization algorithms with global convergence guarantees for solving both the constrained and the Lagrangian formulations of the problem. These algorithms are based on the highly efficient alternating direction augmented Lagrangian and accelerated proximal gradient methods. We also propose a nonconvex model that can often improve the recovery results from the convex models. We investigate the empirical recoverability properties of the convex and nonconvex formulations and compare the computational performance of the algorithms on simulated data. We demonstrate through a number of real applications the practical effectiveness of this convex optimization framework for robust low-rank tensor recovery.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

doi: 10.1137/130905010

1311.6182

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback