AITopics | Statistical Learning

Statistical Learning

A Theoretical and Experimental Comparison of the EM and SEM Algorithm

Blömer, Johannes, Bujna, Kathrin, Kuntze, Daniel

arXiv.org Machine Learning, Jul-2-2014

In this paper we provide a new analysis of the SEM algorithm. Unlike previous work, we focus on the analysis of a single run of the algorithm. First, we discuss the algorithm for general mixture distributions. Second, we consider Gaussian mixture models and show that with high probability the update equations of the EM algorithm and its stochastic variant are almost the same, given that the input set is sufficiently large. Our experiments confirm that this still holds for a large number of successive update steps. In particular, for Gaussian mixture models, we show that the stochastic variant runs nearly twice as fast.
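The claim that the EM and SEM update equations nearly coincide on large inputs can be illustrated with a minimal sketch. It assumes a one-dimensional, two-component Gaussian mixture and compares only the mean updates; the function names, the synthetic data, and the starting parameters are all hypothetical and are not taken from the paper.

```python
import math
import random

def responsibilities(x, weights, means, sigmas):
    # Posterior probability of each component given the point x.
    dens = [w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
            for w, m, s in zip(weights, means, sigmas)]
    total = sum(dens)
    return [d / total for d in dens]

def em_mean_update(data, weights, means, sigmas):
    # EM: every point contributes to every component, weighted by responsibility.
    k = len(means)
    num, den = [0.0] * k, [0.0] * k
    for x in data:
        r = responsibilities(x, weights, means, sigmas)
        for j in range(k):
            num[j] += r[j] * x
            den[j] += r[j]
    return [num[j] / den[j] for j in range(k)]

def sem_mean_update(data, weights, means, sigmas, rng):
    # SEM: each point is assigned to one component sampled from its
    # responsibilities, then each mean is the average of its assigned points.
    k = len(means)
    num, cnt = [0.0] * k, [0] * k
    for x in data:
        r = responsibilities(x, weights, means, sigmas)
        j = rng.choices(range(k), weights=r)[0]
        num[j] += x
        cnt[j] += 1
    # Keep the old mean if a component happens to receive no points.
    return [num[j] / cnt[j] if cnt[j] else means[j] for j in range(k)]

rng = random.Random(0)
# Synthetic data (an assumption for illustration): equal mix of N(-2, 1) and N(2, 1).
data = [rng.gauss(-2, 1) if rng.random() < 0.5 else rng.gauss(2, 1)
        for _ in range(20000)]
start = dict(weights=[0.5, 0.5], means=[-1.0, 1.0], sigmas=[1.0, 1.0])

em_means = em_mean_update(data, **start)
sem_means = sem_mean_update(data, rng=rng, **start)
print(em_means, sem_means)
```

With this many points the two updates should differ only by sampling noise, which shrinks as the input grows.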

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1310.5034

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)

Relational Logistic Regression

Kazemi, Seyed Mehran (University of British Columbia) | Buchman, David (University of British Columbia) | Kersting, Kristian (Technical University of Dortmund) | Natarajan, Sriraam (Indiana University) | Poole, David (University of British Columbia)

AAAI Conferences, Jul-1-2014

Logistic regression is a commonly used representation for aggregators in Bayesian belief networks when a child has multiple parents. In this paper we consider extending logistic regression to relational models, where we want to model varying populations and interactions among parents. In this paper, we first examine the representational problems caused by population variation. We show how these problems arise even in simple cases with a single parametrized parent, and propose a linear relational logistic regression which we show can represent arbitrary linear (in population size) decision thresholds, whereas the traditional logistic regression cannot. Then we examine representing interactions among the parents of a child node, and representing non-linear dependency on population size. We propose a multi-parent relational logistic regression which can represent interactions among parents and arbitrary polynomial decision thresholds. Finally, we show how other well-known aggregators can be represented using this relational logistic regression.
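The population-size problem can be made concrete with a small sketch. It assumes a single parametrized parent whose individuals are each true or false, and it contrasts a logistic model that sees only the count of true individuals with one that also weights the count of false individuals. The weights and function names are hypothetical, and the paper's exact parametrization may differ.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fixed_threshold_lr(n_true, w0=-5.5, w1=1.0):
    # Only the number of true parents enters the model, so the decision
    # threshold (here between 5 and 6) does not move with population size.
    return sigmoid(w0 + w1 * n_true)

def majority_rlr(n_true, n_false, w0=0.0, w_true=1.0, w_false=-1.0):
    # Weighting true and false individuals separately gives a threshold
    # that is linear in population size: here "more true than false".
    return sigmoid(w0 + w_true * n_true + w_false * n_false)

# Six true individuals in a population of 10 versus a population of 20.
print(fixed_threshold_lr(6))   # same value for both populations
print(majority_rlr(6, 4))      # majority true
print(majority_rlr(6, 14))     # majority false
```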

relational logistic regression

AAAI Conferences

Fourteenth International Conference on the Principles of Knowledge Representation and Reasoning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)

Mind the Nuisance: Gaussian Process Classification using Privileged Noise

Hernández-Lobato, Daniel, Sharmanska, Viktoriia, Kersting, Kristian, Lampert, Christoph H., Quadrianto, Novi

arXiv.org Machine Learning, Jul-1-2014

The learning with privileged information setting has recently attracted a lot of attention within the machine learning community, as it allows the integration of additional knowledge into the training process of a classifier, even when this comes in the form of a data modality that is not available at test time. Here, we show that privileged information can naturally be treated as noise in the latent function of a Gaussian Process classifier (GPC). That is, in contrast to the standard GPC setting, the latent function is not just a nuisance but a feature: it becomes a natural measure of confidence about the training data by modulating the slope of the GPC sigmoid likelihood function. Extensive experiments on public datasets show that the proposed GPC method using privileged noise, called GPC+, improves over a standard GPC without privileged knowledge, and also over the current state-of-the-art SVM-based method, SVM+. Moreover, we show that advanced neural networks and deep learning methods can be compressed as privileged information.

artificial intelligence, machine learning, privileged information, (15 more...)

arXiv.org Machine Learning

1407.0179

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Rates of Convergence for Nearest Neighbor Classification

Chaudhuri, Kamalika, Dasgupta, Sanjoy

arXiv.org Machine Learning, Jul-1-2014

Nearest neighbor methods are a popular class of nonparametric estimators with several desirable properties, such as adaptivity to different distance scales in different regions of space. Prior work on convergence rates for nearest neighbor classification has not fully reflected these subtle properties. We analyze the behavior of these estimators in metric spaces and provide finite-sample, distribution-dependent rates of convergence under minimal assumptions. As a by-product, we are able to establish the universal consistency of nearest neighbor in a broader range of data spaces than was previously known. We illustrate our upper and lower bounds by introducing smoothness classes that are customized for nearest neighbor classification.
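For readers unfamiliar with the estimator being analyzed, a minimal k-nearest-neighbor classifier over an arbitrary metric might look like the sketch below. The function name, the default Euclidean metric, and the toy data are assumptions for illustration; the paper's results concern convergence rates, not this particular implementation.

```python
import math
from collections import Counter

def knn_predict(train, query, k, dist=math.dist):
    # train is a list of (point, label) pairs; dist can be any metric
    # on the data space, which is what makes the method metric-space generic.
    neighbors = sorted(train, key=lambda pl: dist(pl[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5), k=3))
print(knn_predict(train, (5.5, 5.5), k=3))
```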

artificial intelligence, machine learning, probability, (18 more...)

arXiv.org Machine Learning

1407.0067

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.47)

Direct Density-Derivative Estimation and Its Application in KL-Divergence Approximation

Sasaki, Hiroaki, Noh, Yung-Kyun, Sugiyama, Masashi

arXiv.org Machine Learning, Jun-30-2014

Estimation of density derivatives is a versatile tool in statistical data analysis. A naive approach is to first estimate the density and then compute its derivative. However, such a two-step approach does not work well because a good density estimator is not necessarily a good density-derivative estimator. In this paper, we give a direct method to approximate the density derivative without estimating the density itself. Our proposed estimator allows analytic and computationally efficient approximation of multi-dimensional high-order density derivatives, and all of its hyper-parameters can be chosen objectively by cross-validation. We further show that the proposed density-derivative estimator is useful in improving the accuracy of non-parametric KL-divergence estimation via metric learning. The practical superiority of the proposed method is experimentally demonstrated in change detection and feature selection.
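The naive two-step baseline mentioned in the abstract can be sketched as follows: fit a Gaussian kernel density estimate, then differentiate it analytically. This is the approach the paper argues against, not its proposed direct estimator; the bandwidth, sample, and function name are assumptions chosen for illustration.

```python
import math
import random

def kde_derivative(x, data, h):
    # Derivative of the Gaussian KDE p(x) = (1 / (n h)) * sum phi((x - x_i) / h).
    # Since phi'(u) = -u * phi(u), p'(x) = (1 / (n h^2)) * sum -u_i * phi(u_i).
    n = len(data)
    total = 0.0
    for xi in data:
        u = (x - xi) / h
        total += -u * math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return total / (n * h * h)

rng = random.Random(1)
sample = [rng.gauss(0, 1) for _ in range(2000)]  # synthetic standard normal data

# For a standard normal the true derivative is positive left of 0
# and negative right of 0.
print(kde_derivative(-1.0, sample, h=0.4))
print(kde_derivative(1.0, sample, h=0.4))
```

Note that the bandwidth that gives a good density estimate is generally not the one that gives a good derivative estimate, which is one way to see the weakness of the two-step approach.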

artificial intelligence, estimation, machine learning, (16 more...)

arXiv.org Machine Learning

1406.7638

Country: Asia (0.15)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.36)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.31)

An Efficient Hybrid CS and K-Means Algorithm for the Capacitated PMedian Problem

Mazinan, Hassan Gholami, Ahmadi, Gholam Reza, Khaji, Erfan

arXiv.org Artificial Intelligence, Jun-29-2014

The capacitated P-median problem (CPMP) is an NP-complete problem that asks for a partition of a set of N nodes into M disjoint clusters, each represented by a node designated as its concentrator. The N-M nodes that are not chosen as concentrators are referred to as terminals. The partitioning of the initial N nodes must be performed in such a way that a measure of total distance between the terminals and their corresponding concentrators is minimized. In addition, a capacity constraint imposed on the concentrators must be met in order to obtain feasible solutions to the problem [1-4]. A direct application of the CPMP is in the deployment of communication networks, where a set of terminals in the network must be assigned to the corresponding concentrator in order to compose access networks that satisfy the rate requirements of such terminals [5]. In this context, most efforts so far have focused on the topological design of communication networks (e.g. Wireless Sensor Networks (WSN), backbone networks or mobile networks [6-8]), since many of the processes involved in such networks can be approached as a CPMP, e.g.
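The objective and the capacity constraint described above can be written down as a small evaluation routine. This is only a sketch of how a candidate solution might be scored, not the hybrid CS and K-means algorithm itself. It assumes Euclidean distance, assumes a concentrator's own demand counts toward its load, and uses hypothetical names throughout.

```python
import math

def cpmp_cost(coords, demand, capacity, assignment):
    # assignment[i] is the index of the concentrator serving node i;
    # a concentrator is assigned to itself.
    # Returns the total node-to-concentrator distance, or None when
    # some concentrator's capacity is exceeded (infeasible solution).
    load = {}
    total = 0.0
    for node, med in enumerate(assignment):
        load[med] = load.get(med, 0) + demand[node]
        total += math.dist(coords[node], coords[med])
    if any(load[m] > capacity[m] for m in load):
        return None
    return total

coords = [(0, 0), (0, 3), (4, 0), (4, 4)]
demand = [1, 1, 1, 1]
capacity = [2, 2, 2, 2]

print(cpmp_cost(coords, demand, capacity, [0, 0, 2, 2]))  # feasible
print(cpmp_cost(coords, demand, capacity, [0, 0, 0, 2]))  # node 0 overloaded
```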

algorithm, artificial intelligence, machine learning, (14 more...)

arXiv.org Artificial Intelligence

1406.7473

Country: Asia > Middle East > Iran (0.14)

Genre: Research Report (0.40)

Industry: Telecommunications (0.66)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Learning Nonlinear Functions Using Regularized Greedy Forest

Johnson, Rie, Zhang, Tong

arXiv.org Machine Learning, Jun-28-2014

We consider the problem of learning a forest of nonlinear decision rules with general loss functions. The standard methods employ boosted decision trees such as Adaboost for exponential loss and Friedman's gradient boosting for general loss. In contrast to these traditional boosting algorithms that treat a tree learner as a black box, the method we propose directly learns decision forests via fully-corrective regularized greedy search using the underlying forest structure. Our method achieves higher accuracy and smaller models than gradient boosting (and Adaboost with exponential loss) on many datasets.

artificial intelligence, machine learning, regularization, (18 more...)

arXiv.org Machine Learning

1109.0887

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Generalized Canonical Correlation Analysis for Classification

Shen, Cencheng, Sun, Ming, Tang, Minh, Priebe, Carey E.

arXiv.org Machine Learning, Jun-26-2014

It is common to find collections/measurements of related objects, such as the same article in different languages, similar talks given by different presenters, similar weather patterns in different years, etc. It remains to determine how much the available big data helps us in statistical analysis; simply throwing every collected dataset into the mix may not yield an optimal output. Thus it is natural and important to understand theoretically when and how additional datasets improve the performance of various statistical analysis tasks such as regression, clustering, classification, etc. This is our motivation to explore the following classification problem.

auxiliary feature, gcca projection, projection, (15 more...)

arXiv.org Machine Learning

doi: 10.1016/j.jmva.2014.05.011

1304.7981

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Government > Military (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.67)

Combining predictions from linear models when training and test inputs differ

van Ommen, Thijs

arXiv.org Machine Learning, Jun-24-2014

Methods for combining predictions from different models in a supervised learning setting must somehow estimate/predict the quality of a model's predictions at unknown future inputs. Many of these methods (often implicitly) make the assumption that the test inputs are identical to the training inputs, which is seldom reasonable. By failing to take into account that prediction will generally be harder for test inputs that did not occur in the training set, this leads to the selection of too complex models. Based on a novel, unbiased expression for KL divergence, we propose XAIC and its special case FAIC as versions of AIC intended for prediction that use different degrees of knowledge of the test inputs. Both methods substantially differ from and may outperform all the known versions of AIC even when the training and test inputs are iid, and are especially useful for deterministic inputs and under covariate shift. Our experiments on linear models suggest that if the test and training inputs differ substantially, then XAIC and FAIC predictively outperform AIC, BIC and several other methods including Bayesian model averaging.
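As background for the comparison, the standard AIC baseline for Gaussian linear models can be sketched as below. This shows ordinary AIC only, not the XAIC or FAIC criteria the paper proposes; the data, helper names, and the choice of comparing a constant model against a straight line are assumptions for illustration.

```python
import math
import random

def gaussian_aic(residuals, n_coef):
    # AIC = 2k - 2 ln L at the maximum-likelihood fit, where the noise
    # variance is estimated as RSS / n and counts as one extra parameter.
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    k = n_coef + 1
    log_lik = -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)
    return 2 * k - 2 * log_lik

def fit_line(xs, ys):
    # Ordinary least squares for y = a + b * x.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

rng = random.Random(2)
xs = [i / 10 for i in range(50)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]  # synthetic linear data

mean_y = sum(ys) / len(ys)
a, b = fit_line(xs, ys)
aic_const = gaussian_aic([y - mean_y for y in ys], n_coef=1)
aic_line = gaussian_aic([y - (a + b * x) for x, y in zip(xs, ys)], n_coef=2)
print(aic_const, aic_line)  # the lower value is preferred
```

Both scores are computed from residuals at the training inputs, which is exactly the implicit assumption the abstract criticizes.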

artificial intelligence, experiment, machine learning, (17 more...)

arXiv.org Machine Learning

1406.62

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

On Soft Power Diagrams

Borgwardt, Steffen

arXiv.org Machine Learning, Jun-24-2014

Many applications in data analysis begin with a set of points in a Euclidean space that is partitioned into clusters. Common tasks then are to devise a classifier deciding which of the clusters a new point is associated with, finding outliers with respect to the clusters, or identifying the type of clustering used for the partition. One common kind of clustering is the (balanced) least-squares assignment with respect to a given set of sites. For these, there is a 'separating power diagram' for which each cluster lies in its own cell. In the present paper, we aim for efficient algorithms for outlier detection and the computation of thresholds that measure how similar a clustering is to a least-squares assignment for fixed sites. For this purpose, we devise a new model for the computation of a 'soft power diagram', which allows a soft separation of the clusters with 'point counting properties'; e.g. As our results hold for a more general non-convex model of free sites, we describe it and our proofs in this more general way. Its locally optimal solutions satisfy the aforementioned point counting properties. For our target applications that use fixed sites, our algorithms are efficiently solvable to global optimality by linear programming.
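The classifier role of a power diagram can be sketched with the standard definition: a point belongs to the cell of the site that minimizes the squared distance to the site minus that site's weight. The sketch below covers only this ordinary (hard) power diagram, not the soft variant the paper introduces, and the function name and sample sites are hypothetical.

```python
def power_cell(point, sites, weights):
    # Index of the site i minimizing ||point - site_i||^2 - weight_i.
    # With all weights equal this reduces to nearest-site (Voronoi) assignment.
    def power(i):
        return sum((p - s) ** 2 for p, s in zip(point, sites[i])) - weights[i]
    return min(range(len(sites)), key=power)

sites = [(0.0, 0.0), (4.0, 0.0)]
print(power_cell((1.0, 0.0), sites, weights=[0.0, 0.0]))   # nearest site wins
print(power_cell((1.0, 0.0), sites, weights=[0.0, 12.0]))  # larger weight enlarges cell 1
```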

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1307.3949

Country: Europe > Germany (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)
