AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Oracle inequalities for ranking and U-processes with Lasso penalty

arXiv.org Machine LearningDec-17-2015

Model selection is an important challenge, if one works with data sets containing many predictors. Finding relevant variables helps to understand better the problem and improves statistical inference. In t he literature there are many methods solving such problems. One of them is empiri cal risk minimization with the penalty, for instance Lasso (Tibshir ani, 1996). The main characteristic of this procedure is an ability to selec t relevant predictors and estimate unknown parameters simultaneously. In th e paper we apply these ideas to the pairwise ranking problem (ordinal r egression) that 1 relates to predicting or guessing the ordering between obje cts on the basis of their observed predictors. The problem of ranking has num erous applications in practice, for instance in information retrieval, banking or quality control.

artificial intelligence, inequality, machine learning, (19 more...)

arXiv.org Machine Learning

1512.05698

Country: Europe > Poland (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Macau: Scalable Bayesian Multi-relational Factorization with Side Information using MCMC

Simm, Jaak, Arany, Adam, Zakeri, Pooya, Haber, Tom, Wegner, Jörg K., Chupakhin, Vladimir, Ceulemans, Hugo, Moreau, Yves

arXiv.org Machine LearningDec-17-2015

We propose Macau, a powerful and flexible Bayesian factorization method for heterogeneous data. Our model can factorize any set of entities and relations that can be represented by a relational model, including tensors and also multiple relations for each entity. Macau can also incorporate side information, specifically entity and relation features, which are crucial for predicting sparsely observed relations. Macau scales to millions of entity instances, hundred millions of observations, and sparse entity features with millions of dimensions. To achieve the scale up, we specially designed sampling procedure for entity and relation features that relies primarily on noise injection in linear regressions. We show performance and advanced features of Macau in a set of experiments, including challenging drug-protein activity prediction task.

artificial intelligence, machine learning, relation, (17 more...)

arXiv.org Machine Learning

1509.0461

Country: Asia > Macao (1.00)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Add feedback

Clustering and Inference From Pairwise Comparisons

Wu, Rui, Xu, Jiaming, Srikant, R., Massoulié, Laurent, Lelarge, Marc, Hajek, Bruce

arXiv.org Machine LearningDec-17-2015

Given a set of pairwise comparisons, the classical ranking problem computes a single ranking that best represents the preferences of all users. In this paper, we study the problem of inferring individual preferences, arising in the context of making personalized recommendations. In particular, we assume that there are $n$ users of $r$ types; users of the same type provide similar pairwise comparisons for $m$ items according to the Bradley-Terry model. We propose an efficient algorithm that accurately estimates the individual preferences for almost all users, if there are $r \max \{m, n\}\log m \log^2 n$ pairwise comparisons per type, which is near optimal in sample complexity when $r$ only grows logarithmically with $m$ or $n$. Our algorithm has three steps: first, for each user, compute the \emph{net-win} vector which is a projection of its $\binom{m}{2}$-dimensional vector of pairwise comparisons onto an $m$-dimensional linear subspace; second, cluster the users based on the net-win vectors; third, estimate a single preference for each cluster separately. The net-win vectors are much less noisy than the high dimensional vectors of pairwise comparisons and clustering is more accurate after the projection as confirmed by numerical experiments. Moreover, we show that, when a cluster is only approximately correct, the maximum likelihood estimation for the Bradley-Terry model is still close to the true preference.

artificial intelligence, machine learning, vector, (20 more...)

arXiv.org Machine Learning

1502.04631

Country: North America > United States (0.46)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

A Novel Minimum Divergence Approach to Robust Speaker Identification

Basu, Ayanendranath, Bose, Smarajit, Pal, Amita, Mukherjee, Anish, Das, Debasmita

arXiv.org Machine LearningDec-16-2015

In this work, a novel solution to the speaker identification problem is proposed through minimization of statistical divergences between the probability distribution (g). of feature vectors from the test utterance and the probability distributions of the feature vector corresponding to the speaker classes. This approach is made more robust to the presence of outliers, through the use of suitably modified versions of the standard divergence measures. The relevant solutions to the minimum distance methods are referred to as the minimum rescaled modified distance estimators (MRMDEs). Three measures were considered - the likelihood disparity, the Hellinger distance and Pearson's chi-square distance. The proposed approach is motivated by the observation that, in the case of the likelihood disparity, when the empirical distribution function is used to estimate g, it becomes equivalent to maximum likelihood classification with Gaussian Mixture Models (GMMs) for speaker classes, a highly effective approach used, for example, by Reynolds [22] based on Mel Frequency Cepstral Coefficients (MFCCs) as features. Significant improvement in classification accuracy is observed under this approach on the benchmark speech corpus NTIMIT and a new bilingual speech corpus NISIS, with MFCC features, both in isolation and in combination with delta MFCC features. Moreover, the ubiquitous principal component transformation, by itself and in conjunction with the principle of classifier combination, is found to further enhance the performance.

acoustic processing, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1512.05073

Country: North America > United States (0.93)

Genre: Research Report > Promising Solution (0.34)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.66)

Add feedback

Streaming Kernel Principal Component Analysis

Ghashami, Mina, Perry, Daniel, Phillips, Jeff M.

arXiv.org Machine LearningDec-16-2015

Kernel principal component analysis (KPCA) provides a concise set of basis vectors which capture non-linear structures within large data sets, and is a central tool in data analysis and learning. To allow for non-linear relations, typically a full $n \times n$ kernel matrix is constructed over $n$ data points, but this requires too much space and time for large values of $n$. Techniques such as the Nystr\"om method and random feature maps can help towards this goal, but they do not explicitly maintain the basis vectors in a stream and take more space than desired. We propose a new approach for streaming KPCA which maintains a small set of basis elements in a stream, requiring space only logarithmic in $n$, and also improves the dependence on the error parameter. Our technique combines together random feature maps with recent advances in matrix sketching, it has guaranteed spectral norm error bounds with respect to the original kernel matrix, and it compares favorably in practice to state-of-the-art approaches.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1512.05059

Country: North America > United States (0.15)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Add feedback

An Average Classification Algorithm

van Rooyen, Brendan, Menon, Aditya Krishna, Williamson, Robert C.

arXiv.org Machine LearningDec-15-2015

Many classification algorithms produce a classifier that is a weighted average of kernel evaluations. When working with a high or infinite dimensional kernel, it is imperative for speed of evaluation and storage issues that as few training samples as possible are used in the kernel expansion. Popular existing approaches focus on altering standard learning algorithms, such as the Support Vector Machine, to induce sparsity, as well as post-hoc procedures for sparse approximations. Here we adopt the latter approach. We begin with a very simple classifier, given by the kernel mean $$ f(x) = \frac{1}{n} \sum\limits_{i=i}^{n} y_i K(x_i,x) $$ We then find a sparse approximation to this kernel mean via herding. The result is an accurate, easily parallelized algorithm for learning classifiers.

artificial intelligence, machine learning, preliminary version, (15 more...)

arXiv.org Machine Learning

1506.0152

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Add feedback

Dimensionality-reduced subspace clustering

Heckel, Reinhard, Tschannen, Michael, Bölcskei, Helmut

arXiv.org Machine LearningDec-13-2015

Subspace clustering refers to the problem of clustering unlabeled high-dimensional data points into a union of low-dimensional linear subspaces, whose number, orientations, and dimensions are all unknown. In practice one may have access to dimensionality-reduced observations of the data only, resulting, e.g., from undersampling due to complexity and speed constraints on the acquisition device or mechanism. More pertinently, even if the high-dimensional data set is available it is often desirable to first project the data points into a lower-dimensional space and to perform clustering there; this reduces storage requirements and computational cost. The purpose of this paper is to quantify the impact of dimensionality reduction through random projection on the performance of three subspace clustering algorithms, all of which are based on principles from sparse signal recovery. Specifically, we analyze the thresholding based subspace clustering (TSC) algorithm, the sparse subspace clustering (SSC) algorithm, and an orthogonal matching pursuit variant thereof (SSC-OMP). We find, for all three algorithms, that dimensionality reduction down to the order of the subspace dimensions is possible without incurring significant performance degradation. Moreover, these results are order-wise optimal in the sense that reducing the dimensionality further leads to a fundamentally ill-posed clustering problem. Our findings carry over to the noisy case as illustrated through analytical results for TSC and simulations for SSC and SSC-OMP. Extensive experiments on synthetic and real data complement our theoretical findings.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1507.07105

Country: Europe (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.54)

Add feedback

Cross-validation of matching correlation analysis by resampling matching weights

Shimodaira, Hidetoshi

arXiv.org Machine LearningDec-13-2015

The strength of association between a pair of data vectors is represented by a nonnegative real number, called matching weight. For dimensionality reduction, we consider a linear transformation of data vectors, and define a matching error as the weighted sum of squared distances between transformed vectors with respect to the matching weights. Given data vectors and matching weights, the optimal linear transformation minimizing the matching error is solved by the spectral graph embedding of Yan et al. (2007). This method is a generalization of the canonical correlation analysis, and will be called as matching correlation analysis (MCA). In this paper, we consider a novel sampling scheme where the observed matching weights are randomly sampled from underlying true matching weights with small probability, whereas the data vectors are treated as constants. We then investigate a cross-validation by resampling the matching weights. Our asymptotic theory shows that the cross-validation, if rescaled properly, computes an unbiased estimate of the matching error with respect to the true matching weights. Existing ideas of cross-validation for resampling data vectors, instead of resampling matching weights, are not applicable here. MCA can be used for data vectors from multiple domains with different dimensions via an embarrassingly simple idea of coding the data vectors. This method will be called as cross-domain matching correlation analysis (CDMCA), and an interesting connection to the classical associative memory model of neural networks is also discussed.

artificial intelligence, machine learning, shimodaira cross-validation, (18 more...)

arXiv.org Machine Learning

1503.08471

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

Add feedback

Cloud-based Electronic Health Records for Real-time, Region-specific Influenza Surveillance

Santillana, Mauricio, Nguyen, Andre, Louie, Tamara, Zink, Anna, Gray, Josh, Sung, Iyue, Brownstein, John S.

arXiv.org Machine LearningDec-12-2015

Introduction Influenza is a leading cause of death in the United States (US), where up to 50,000 are killed each year by influenza- ‐like illnesses (ILI) [1]. Therefore, monitoring, early detection, and prediction of influenza outbreaks are crucial to public health. Disease detection and surveillance systems provide epidemiologic intelligence that allows health officials to deploy preventive measures and help clinic and hospital administrators make optimal staffing and stocking decisions [2]. The US Centers for Disease Control and Prevention (CDC) monitors ILI in the US by gathering information from physicians' reports about patients with ILI seeking medical attention [3]. CDC's ILI data provides useful estimates of influenza activity; however, its availability has a known time lag of one to two weeks. This time lag is far from optimal since public health decisions need to be made based on information that is two weeks old. Systems capable of providing real- ‐time estimates of influenza activity are, thus, critical. Many attempts have been made to design methods capable of providing real- ‐time estimates of ILI activity in the US by leveraging Internet- ‐based data sources that could potentially measure ILI in an indirect manner [4, 5, 6, 7, 8, 9, 10, 11].

machine learning, prediction, real time system, (18 more...)

arXiv.org Machine Learning

1512.0399

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Architecture > Real Time Systems (0.94)
(2 more...)

Add feedback

Active Sampler: Light-weight Accelerator for Complex Data Analytics at Scale

Gao, Jinyang, Jagadish, H. V., Ooi, Beng Chin

arXiv.org Machine LearningDec-12-2015

Recent years have witnessed amazing outcomes from "Big Models" trained by "Big Data". Most popular algorithms for model training are iterative. Due to the surging volumes of data, we can usually afford to process only a fraction of the training data in each iteration. Typically, the data are either uniformly sampled or sequentially accessed. In this paper, we study how the data access pattern can affect model training. We propose an Active Sampler algorithm, where training data with more "learning value" to the model are sampled more frequently. The goal is to focus training effort on valuable instances near the classification boundaries, rather than evident cases, noisy data or outliers. We show the correctness and optimality of Active Sampler in theory, and then develop a light-weight vectorized implementation. Active Sampler is orthogonal to most approaches optimizing the efficiency of large-scale data analytics, and can be applied to most analytics models trained by stochastic gradient descent (SGD) algorithm. Extensive experimental evaluations demonstrate that Active Sampler can speed up the training procedure of SVM, feature selection and deep learning, for comparable training quality by 1.6-2.2x.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1512.0388

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
(2 more...)

Add feedback