AITopics | Statistical Learning

Statistical Learning

Bayesian Masking: Sparse Bayesian Estimation with Weaker Shrinkage Bias

Kondo, Yohei, Hayashi, Kohei, Maeda, Shin-ichi

arXiv.org Machine Learning, Oct-6-2015

A common strategy for sparse linear regression is to introduce regularization, which eliminates irrelevant features by setting the corresponding weights to zero. However, regularization often shrinks the estimates of relevant features, which leads to incorrect feature selection. Motivated by this issue, we propose Bayesian masking (BM), a sparse estimation method that imposes no regularization on the weights. The key concept of BM is to introduce binary latent variables that randomly mask features; estimating the masking rates determines the relevance of the features automatically. We derive a variational Bayesian inference algorithm that maximizes a lower bound of the factorized information criterion (FIC), a recently developed asymptotic criterion for evaluating the marginal log-likelihood. In addition, we propose a reparametrization to accelerate the convergence of the derived algorithm. Finally, we show that BM outperforms Lasso and automatic relevance determination (ARD) in terms of the sparsity-shrinkage trade-off.
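The shrinkage bias that motivates BM can be seen in the simplest Lasso setting. The sketch below assumes an orthonormal design, where the Lasso solution for each coefficient is the soft-thresholded least-squares estimate; it illustrates the problem the abstract describes, not BM itself. Hard thresholding is shown only as a contrast: it zeroes the same irrelevant coefficient but leaves the relevant one unshrunk. The function names are hypothetical.

```python
def soft_threshold(z: float, lam: float) -> float:
    """Lasso solution for one coefficient under an orthonormal design:
    the least-squares estimate z is pulled toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def hard_threshold(z: float, lam: float) -> float:
    """Keep z unchanged if |z| > lam, otherwise zero: selection without shrinkage."""
    return z if abs(z) > lam else 0.0


lam = 1.0
for z in (3.0, 0.5):  # a relevant and an irrelevant least-squares estimate
    print(z, soft_threshold(z, lam), hard_threshold(z, lam))
```

With `lam = 1.0`, both rules remove the irrelevant estimate 0.5, but the Lasso also reduces the relevant estimate from 3.0 to 2.0; that bias is what BM aims to weaken.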

artificial intelligence, bayesian inference, machine learning

1509.01004

Country: Asia > Japan > Honshū (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Jointly Learning Multiple Measures of Similarities from Triplet Comparisons

Zhang, Liwen, Maji, Subhransu, Tomioka, Ryota

arXiv.org Artificial Intelligence, Oct-6-2015

Similarity between objects is multi-faceted, and it can be easier for human annotators to measure it when the focus is on a specific aspect. We consider the problem of mapping objects into view-specific embeddings in which the distances between objects are consistent with similarity comparisons of the form "from the t-th view, object A is more similar to B than to C". Our framework jointly learns view-specific embeddings, exploiting correlations between views. Experiments on a number of datasets, including a multi-view crowdsourced comparison dataset of bird images, show that the proposed method achieves lower triplet generalization error than either learning embeddings independently for each view or pooling all views into one view. Our method can also be used to learn multiple measures of similarity over input features while taking class labels into account, and it compares favorably with existing approaches for multi-task metric learning on the ISOLET dataset.
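Triplet generalization error, the evaluation measure named above, is the fraction of held-out comparisons that an embedding gets wrong. A minimal sketch, assuming each view's embedding is a list of coordinates indexed by object and a triplet is stored as `(t, a, b, c)`; the names `triplet_error` and `Triplet` are hypothetical, and this only evaluates embeddings rather than learning them jointly.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
# (t, a, b, c): "from the t-th view, object a is more similar to b than to c"
Triplet = Tuple[int, int, int, int]


def triplet_error(embeddings: Dict[int, List[Point]], triplets: List[Triplet]) -> float:
    """Fraction of triplets violated by view-specific embeddings.

    embeddings[t][i] is the coordinate of object i in view t. A triplet is
    satisfied only if a is strictly closer to b than to c in view t.
    """
    violated = 0
    for t, a, b, c in triplets:
        emb = embeddings[t]
        if not math.dist(emb[a], emb[b]) < math.dist(emb[a], emb[c]):
            violated += 1
    return violated / len(triplets)


embeddings = {
    0: [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)],  # view 0: object 0 is near object 1
    1: [(0.0, 0.0), (5.0, 0.0), (1.0, 0.0)],  # view 1: object 0 is near object 2
}
triplets = [(0, 0, 1, 2), (1, 0, 2, 1), (1, 0, 1, 2)]
print(triplet_error(embeddings, triplets))
```

Here the third triplet is violated, giving an error of 1/3. The first two triplets contradict each other, so no single pooled embedding could satisfy both; keeping one embedding per view avoids that conflict.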

artificial intelligence, machine learning, triplet

1503.01521

Country: North America > United States (0.46)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Communications > Social Media > Crowdsourcing (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Bayesian Inference via Approximation of Log-likelihood for Priors in Exponential Family

Ardeshiri, Tohid, Orguner, Umut, Gustafsson, Fredrik

arXiv.org Machine Learning, Oct-5-2015

In this paper, a Bayesian inference technique based on a Taylor series approximation of the logarithm of the likelihood function is presented. The proposed approximation is devised for the case where the prior distribution belongs to the exponential family of distributions. The log-likelihood is linearized with respect to the sufficient statistic of the exponential-family prior so that the posterior has the same exponential-family form as the prior. Similarities between the proposed method and the extended Kalman filter for nonlinear filtering are illustrated. Furthermore, an extended target measurement update is derived for target models in which the target extent is represented by a random matrix with an inverse Wishart distribution. The approximate update covers the important case where the spread of the measurements is due to the target extent as well as to the measurement noise in the sensor.
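Why linearizing in the sufficient statistic keeps the posterior in the prior's family can be seen in the one case where no approximation is needed. For a scalar Gaussian prior, the sufficient statistic is T(x) = (x, x^2); if the log-likelihood is linear in T(x), the posterior's natural parameters are the prior's plus the likelihood's coefficients. The sketch assumes a linear measurement y = x + v with v ~ N(0, r), where this holds exactly and the result matches the Kalman filter update. The paper's contribution is the Taylor approximation that produces this form when the likelihood is not already linear in T(x); that step is not shown. All function names are hypothetical.

```python
def gaussian_to_natural(mean: float, var: float):
    # For x ~ N(mean, var) with T(x) = (x, x^2): eta = (mean / var, -1 / (2 var)).
    return mean / var, -0.5 / var


def natural_to_gaussian(eta1: float, eta2: float):
    var = -0.5 / eta2
    return eta1 * var, var


def update_with_linear_loglik(prior_mean, prior_var, a1, a2):
    """Posterior when log p(y | x) = a1 * x + a2 * x**2 + const (linear in T(x))."""
    eta1, eta2 = gaussian_to_natural(prior_mean, prior_var)
    return natural_to_gaussian(eta1 + a1, eta2 + a2)


def measurement_update(prior_mean, prior_var, y, r):
    # y = x + v, v ~ N(0, r): log p(y | x) = (y / r) x - x^2 / (2 r) + const.
    return update_with_linear_loglik(prior_mean, prior_var, y / r, -0.5 / r)


print(measurement_update(0.0, 1.0, 2.0, 1.0))  # prior N(0, 1), observation y = 2
```

For prior N(0, 1), y = 2 and r = 1, the posterior is N(1, 0.5), the same result as a Kalman update with gain 1/2.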

approximation, artificial intelligence, machine learning

1510.01225

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Improved Estimation of Class Prior Probabilities through Unlabeled Data

Matloff, Norman

arXiv.org Machine Learning, Oct-5-2015

Work in the classification literature has shown that in computing a classification function, one need not know the class membership of all observations in the training set; the unlabeled observations still provide information on the marginal distribution of the feature set, and can thus contribute to increased classification accuracy for future observations. The present paper will show that this scheme can also be used for the estimation of class prior probabilities, which would be very useful in applications in which it is difficult or expensive to determine class membership. Both parametric and nonparametric estimators are developed. Asymptotic distributions of the estimators are derived, and it is proven that the use of the unlabeled observations does reduce asymptotic variance. This methodology is also extended to the estimation of subclass probabilities.
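One way unlabeled observations can enter a class-prior estimate is through the identity P(Y = 1) = E[P(Y = 1 | X)]: the conditional probability is fitted on labeled data, then averaged over every observed feature value, labeled or not. The sketch below assumes this form and uses a one-dimensional k-nearest-neighbour estimate as a stand-in nonparametric fit; it is an illustration under those assumptions, not necessarily the estimator derived in the paper. `knn_prob` and `class_prior` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def knn_prob(x: float, labeled: List[Tuple[float, int]], k: int) -> float:
    """Estimate P(Y = 1 | X = x) as the mean label of the k nearest labeled points."""
    nearest = sorted(labeled, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def class_prior(labeled: List[Tuple[float, int]], unlabeled: Sequence[float], k: int = 3) -> float:
    """Estimate P(Y = 1) by averaging the fitted conditional probability
    over all feature values, both labeled and unlabeled."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    return sum(knn_prob(x, labeled, k) for x in xs) / len(xs)


labeled = [(0.0, 0), (0.1, 0), (1.0, 1), (1.1, 1)]
unlabeled = [0.05, 1.05, 1.02, 0.98]
print(class_prior(labeled, unlabeled, k=2))
```

The labeled points alone suggest a prior of 0.5, but three of the four unlabeled points lie in the class-1 region, so the combined estimate is 5/8 = 0.625.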

artificial intelligence, machine learning, unlabeled data

1510.01422

Country: North America > United States > California (0.68)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.48)

Boosting in the presence of outliers: adaptive classification with non-convex loss functions

Li, Alexander Hanbo, Bradic, Jelena

arXiv.org Artificial Intelligence, Oct-5-2015

This paper examines the role and efficiency of non-convex loss functions for binary classification problems. In particular, we investigate how to design a simple and effective boosting algorithm that is robust to outliers in the data. The effect of a particular non-convex loss on prediction accuracy depends on the diminishing tail properties of the gradient of the loss (the ability of the loss to adapt efficiently to outlying data), the local convexity properties of the loss, and the proportion of contaminated data. To exploit these properties efficiently, we propose a new family of non-convex losses named $\gamma$-robust losses. Moreover, we present a new boosting framework, Arch Boost, designed to augment existing work so that the corresponding classification algorithm is significantly more adaptable to unknown data contamination. Combined with the Arch Boost framework, the non-convex losses lead to a new class of boosting algorithms, named adaptive robust boosting (ARB). Furthermore, we present theoretical examples that demonstrate the robustness properties of the proposed algorithms. In particular, we develop a new breakdown point analysis and a new influence function analysis that demonstrate gains in robustness. Moreover, we present new theoretical results, based only on local curvatures, which may be used to establish statistical and optimization properties of the proposed Arch Boost algorithms with highly non-convex loss functions. Extensive numerical calculations illustrate these theoretical properties and reveal advantages over existing boosting methods when the data contain outliers.
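The "diminishing tail" of a loss gradient is what limits the influence of an outlier in boosting, since gradient boosting reweights each example by the magnitude of the loss derivative at its margin. The sketch contrasts the convex exponential loss used by AdaBoost with a bounded sigmoid-type loss, 1 / (1 + exp(m)). That non-convex loss is a hypothetical stand-in chosen for illustration; it is not the paper's $\gamma$-robust family.

```python
import math


def exp_loss_grad(margin: float) -> float:
    """Derivative of the convex exponential loss exp(-margin) (AdaBoost's loss)."""
    return -math.exp(-margin)


def sigmoid_loss_grad(margin: float) -> float:
    """Derivative of the bounded non-convex loss 1 / (1 + exp(margin)).

    The derivative -e^m / (1 + e^m)^2 is symmetric in m, so it is evaluated
    with e^(-|m|) to avoid overflow for large |m|.
    """
    e = math.exp(-abs(margin))
    return -e / (1.0 + e) ** 2


for m in (2.0, 0.0, -5.0):  # -5.0 mimics a badly misclassified outlier
    print(m, exp_loss_grad(m), sigmoid_loss_grad(m))
```

At margin -5 the exponential loss gives the outlier a gradient of about -148, so it dominates the next weak learner, while the bounded loss gives it less than 0.01 in magnitude and effectively ignores it.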

artificial intelligence, loss function, machine learning

doi: 10.1080/01621459.2016.1273116

1510.01064

Country: North America > United States > California (0.46)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Distributed Parameter Map-Reduce

Li, Qi

arXiv.org Machine Learning, Oct-3-2015

This paper describes how to convert a machine learning problem into a series of map-reduce tasks, using the logistic regression algorithm as a case study. Logistic regression assumes that samples are independent and assigns each sample a probability; the parameters are obtained by maximizing the product of all sample probabilities. The rapid growth in the number of training samples poses challenges for machine learning methods: there are so many samples that they can only be stored in a distributed file system and processed by map-reduce programs. The main step of logistic regression is inference. In the map-reduce paradigm, each sample performs inference in a separate map procedure, but this requires that the map procedure hold the parameters for all features in the sample. In this paper, we propose Distributed Parameter Map-Reduce, in which not only the samples but also the parameters are distributed across the nodes of the distributed file system. Through a series of map-reduce tasks, we assign each sample the parameters for its features, perform inference for the sample, and update the parameters of the model. These steps are executed iteratively until convergence. We test the proposed algorithm in a production Hadoop environment. Experiments show that the speed-up of the algorithm grows linearly with the number of cluster nodes.
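The sequence of jobs described above (join samples with parameters by feature, run per-sample inference, then update parameters by feature) can be simulated in memory. In this sketch Python dictionaries stand in for the shuffle between stages, the update is assumed to be one plain batch gradient-ascent step on the log-likelihood (the abstract does not name the optimizer), and `map_reduce_iteration` and `Sample` are hypothetical names rather than the paper's Hadoop code.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[int, Dict[str, float]]  # (label in {0, 1}, sparse feature -> value)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def map_reduce_iteration(samples: List[Sample], params: Dict[str, float],
                         lr: float = 0.1) -> Dict[str, float]:
    # Map 1: emit (feature, sample_id) so samples can be joined with the
    # parameter store, which is keyed (and partitioned) by feature.
    by_feature = defaultdict(list)
    for sid, (_, feats) in enumerate(samples):
        for f in feats:
            by_feature[f].append(sid)

    # Reduce 1: for each feature key, look up its weight and send it to every
    # sample that uses the feature, regrouping the weights by sample id.
    sample_params = defaultdict(dict)
    for f, sids in by_feature.items():
        w = params.get(f, 0.0)  # unseen features start at zero
        for sid in sids:
            sample_params[sid][f] = w

    # Map 2: each sample now holds the weights of its own features, so it runs
    # inference locally; its (feature, gradient) pairs are accumulated in grads.
    grads = defaultdict(float)
    for sid, (y, feats) in enumerate(samples):
        w = sample_params[sid]
        p = sigmoid(sum(w[f] * v for f, v in feats.items()))
        for f, v in feats.items():
            grads[f] += (y - p) * v  # d log-likelihood / d w_f for this sample

    # Reduce 2: one gradient-ascent step per feature on the log-likelihood.
    new_params = dict(params)
    for f, g in grads.items():
        new_params[f] = params.get(f, 0.0) + lr * g
    return new_params


params: Dict[str, float] = {}
data = [(1, {"x": 1.0, "bias": 1.0}), (0, {"x": -1.0, "bias": 1.0})]
for _ in range(20):  # iterate until (approximate) convergence
    params = map_reduce_iteration(data, params, lr=0.5)
print(params)
```

No stage needs the full parameter vector: each feature's weight travels only to the samples that contain that feature, which is what lets both samples and parameters stay distributed.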

artificial intelligence, data mining, machine learning

1510.00817

Genre: Research Report > New Finding (0.76)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Convex Modeling of Interactions with Strong Heredity

Haris, Asad, Witten, Daniela, Simon, Noah

arXiv.org Machine Learning, Oct-3-2015

We consider the task of fitting a regression model involving interactions among a potentially large set of covariates, in which we wish to enforce strong heredity. We propose FAMILY, a very general framework for this task. Our proposal is a generalization of several existing methods, such as VANISH [Radchenko and James, 2010], hierNet [Bien et al., 2013], the all-pairs lasso, and the lasso using only main effects. It can be formulated as the solution to a convex optimization problem, which we solve using an efficient alternating directions method of multipliers (ADMM) algorithm. This algorithm has guaranteed convergence to the global optimum, can be easily specialized to any convex penalty function of interest, and allows for a straightforward extension to the setting of generalized linear models. We derive an unbiased estimator of the degrees of freedom of FAMILY, and explore its performance in a simulation study and on an HIV sequence data set.
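Strong heredity is a constraint on the fitted coefficients: an interaction between covariates j and k may be nonzero only if both main effects j and k are nonzero (weak heredity would require only one). The check below makes that definition concrete; it is a hypothetical helper that verifies a fitted model, not FAMILY's convex penalty or its ADMM solver.

```python
from typing import Dict, Sequence, Tuple


def satisfies_strong_heredity(main: Sequence[float],
                              inter: Dict[Tuple[int, int], float],
                              tol: float = 0.0) -> bool:
    """True if every nonzero interaction (j, k) has both main effects j and k nonzero.

    Coefficients with absolute value <= tol are treated as zero.
    """
    for (j, k), theta in inter.items():
        if abs(theta) > tol and (abs(main[j]) <= tol or abs(main[k]) <= tol):
            return False
    return True


main_effects = [1.0, 0.0, 2.0]  # covariate 1 has no main effect
print(satisfies_strong_heredity(main_effects, {(0, 2): 0.5}))  # both parents present
print(satisfies_strong_heredity(main_effects, {(0, 1): 0.3}))  # parent 1 is missing
```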

artificial intelligence, interaction, machine learning

doi: 10.1080/10618600.2015.1067217

1410.3517

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Online Tensor Methods for Learning Latent Variable Models

Huang, Furong, Niranjan, U. N., Hakeem, Mohammad Umar, Anandkumar, Animashree

arXiv.org Machine Learning, Oct-3-2015

We introduce an online tensor-decomposition-based approach for two latent variable modeling problems: (1) community detection, in which we learn the latent communities to which the actors in a social network belong, and (2) topic modeling, in which we infer the hidden topics of text articles. We decompose moment tensors using stochastic gradient descent (SGD). We carry out the multilinear operations in SGD without forming the tensors explicitly, which saves computation and storage. We present optimized algorithms for two platforms. Our GPU-based implementation exploits the parallelism of SIMD architectures to maximize speed-up through careful optimization of storage and data transfer, whereas our CPU-based implementation uses efficient sparse matrix computations and is suitable for large sparse datasets. For the community detection problem, we demonstrate accuracy and computational efficiency on the Facebook, Yelp and DBLP datasets, and for the topic modeling problem, we also demonstrate good performance on the New York Times dataset. We compare our results to state-of-the-art algorithms such as the variational method, and report gains in accuracy and a speed-up of several orders of magnitude in execution time.
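The trick of avoiding explicit tensors rests on an identity for moment tensors: for the empirical third moment T = (1/n) Σ_i x_i ⊗ x_i ⊗ x_i, the multilinear product T(I, v, v) equals (1/n) Σ_i x_i (x_i · v)^2, which costs O(nd) instead of the O(d^3) needed to store T. The sketch checks this identity against an explicitly built tensor. It assumes a raw, unwhitened third moment for simplicity; the whitening and SGD updates of the paper are not shown, and the function names are hypothetical.

```python
from typing import List

Vec = List[float]


def dot(u: Vec, v: Vec) -> float:
    return sum(a * b for a, b in zip(u, v))


def implicit_tensor_apply(samples: List[Vec], v: Vec) -> Vec:
    """T(I, v, v) for T = mean_i x_i (x) x_i (x) x_i, computed as
    mean_i x_i * (x_i . v)**2 without ever forming T."""
    n, d = len(samples), len(v)
    out = [0.0] * d
    for x in samples:
        w = dot(x, v) ** 2
        for j in range(d):
            out[j] += x[j] * w / n
    return out


def explicit_tensor_apply(samples: List[Vec], v: Vec) -> Vec:
    """Reference version: build the d x d x d tensor, then contract two modes with v."""
    n, d = len(samples), len(v)
    T = [[[sum(x[a] * x[b] * x[c] for x in samples) / n for c in range(d)]
          for b in range(d)] for a in range(d)]
    return [sum(T[a][b][c] * v[b] * v[c] for b in range(d) for c in range(d))
            for a in range(d)]


samples = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
v = [0.3, 0.7]
print(implicit_tensor_apply(samples, v))
print(explicit_tensor_apply(samples, v))
```

Both calls return the same vector up to rounding; only the implicit version scales to the large, sparse vocabularies and networks mentioned above.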

artificial intelligence, data mining, machine learning

1309.0787

Country: North America > United States > Arizona > Maricopa County (1.00)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Distributed Multitask Learning

Wang, Jialei, Kolar, Mladen, Srebro, Nathan

arXiv.org Machine Learning, Oct-2-2015

We consider the problem of distributed multi-task learning, where each machine learns a separate but related task. Specifically, each machine learns a linear predictor in a high-dimensional space, where all tasks share the same small support. We present a communication-efficient estimator based on the debiased lasso and show that it is comparable with the optimal centralized method.
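Because all tasks share one small support, a central node can pool evidence across machines feature by feature. A minimal sketch of one such aggregation rule, assuming each machine sends its debiased lasso estimate once (the debiasing itself is not shown) and the center keeps feature j when the l2 norm of its coefficients across tasks exceeds a threshold (group hard thresholding). This rule and the name `shared_support` are assumptions for illustration, not a statement of the paper's exact estimator.

```python
import math
from typing import List


def shared_support(debiased: List[List[float]], threshold: float) -> List[int]:
    """Select the shared support from per-machine debiased estimates.

    debiased[m][j] is machine m's debiased coefficient for feature j; feature j
    is kept if its l2 norm across all tasks exceeds threshold.
    """
    p = len(debiased[0])
    support = []
    for j in range(p):
        norm = math.sqrt(sum(row[j] ** 2 for row in debiased))
        if norm > threshold:
            support.append(j)
    return support


estimates = [
    [1.0, 0.05, 0.0, 0.8],   # machine 1
    [0.9, -0.02, 0.1, 1.1],  # machine 2
]
print(shared_support(estimates, threshold=0.5))
```

Each machine transmits a single p-dimensional vector, so communication does not grow with the local sample size; features 0 and 3, which are strong in both tasks, are selected.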

artificial intelligence, group lasso, machine learning

1510.00633

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Distinguishing short and long $Fermi$ gamma-ray bursts

Tarnopolski, Mariusz

arXiv.org Machine Learning, Oct-2-2015

Two classes of gamma-ray bursts (GRBs), short and long, have been firmly established and are usually ascribed to different progenitors, yet these classes overlap in a variety of descriptive parameters. A subsample of 46 long and 22 short $Fermi$ GRBs with estimated Hurst exponents (HEs), complemented by minimum variability time-scales (MVTS) and durations ($T_{90}$), is used to perform supervised Machine Learning (ML) and Monte Carlo (MC) simulations with a Support Vector Machine (SVM) algorithm. It is found that while $T_{90}$ alone performs very well in distinguishing short and long GRBs, the overall success ratio is higher when the training set is complemented by MVTS and HE. These results may allow the introduction of a new (non-linear) parameter that might provide a less ambiguous classification of GRBs.

artificial intelligence, grb, machine learning

doi: 10.1093/mnras/stv2061

1507.04886

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.86)
