AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Dissimilarity-based Ensembles for Multiple Instance Learning

Cheplygina, Veronika, Tax, David M. J., Loog, Marco

arXiv.org Machine LearningFeb-6-2014

In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation determined by the number of training bags, while the second approach results in a relatively high-dimensional representation, determined by the total number of instances in the training set. In this paper a third, intermediate approach is proposed, which links the two approaches and combines their strengths. Our classifier is inspired by a random subspace ensemble, and considers subspaces of the dissimilarity space, defined by subsets of instances, as prototypes. We provide guidelines for using such an ensemble, and show state-of-the-art performances on a range of multiple instance learning problems.

artificial intelligence, classifier, machine learning, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TNNLS.2015.2424254

1402.1349

Country: Europe > Netherlands (0.47)

Genre: Research Report (1.00)

Industry:

Health & Medicine (1.00)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.54)

Add feedback

An SIR Graph Growth Model for the Epidemics of Communicable Diseases

Dhanjal, Charanpal, Clémençon, Stéphan

arXiv.org Machine LearningFeb-5-2014

It is the main purpose of this paper to introduce a graph-valued stochastic process in order to model the spread of a communicable infectious disease. The major novelty of the SIR model we promote lies in the fact that the social network on which the epidemics is taking place is not specified in advance but evolves through time, accounting for the temporal evolution of the interactions involving infective individuals. Without assuming the existence of a fixed underlying network model, the stochastic process introduced describes, in a flexible and realistic manner, epidemic spread in non-uniformly mixing and possibly heterogeneous populations. It is shown how to fit such a (parametrised) model by means of Approximate Bayesian Computation methods based on graph-valued statistics. The concepts and statistical methods described in this paper are finally applied to a real epidemic dataset, related to the spread of HIV in Cuba in presence of a contact tracing system, which permits one to reconstruct partly the evolution of the graph of sexual partners diagnosed HIV positive between 1986 and 2006.

artificial intelligence, epidemic, machine learning, (17 more...)

arXiv.org Machine Learning

1312.2565

Country: North America > Cuba (0.48)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology > HIV (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)

Add feedback

Demystifying Information-Theoretic Clustering

Steeg, Greg Ver, Galstyan, Aram, Sha, Fei, DeDeo, Simon

arXiv.org Machine LearningFeb-5-2014

We propose a novel method for clustering data which is grounded in information-theoretic principles and requires no parametric assumptions. Previous attempts to use information theory to define clusters in an assumption-free way are based on maximizing mutual information between data and cluster labels. We demonstrate that this intuition suffers from a fundamental conceptual flaw that causes clustering performance to deteriorate as the amount of data increases. Instead, we return to the axiomatic foundations of information theory to define a meaningful clustering measure based on the notion of consistency under coarse-graining for finite data.

artificial intelligence, estimator, machine learning, (15 more...)

arXiv.org Machine Learning

1310.421

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Data Science (0.68)

Add feedback

Linear and Parallel Learning of Markov Random Fields

Mizrahi, Yariv Dror, Denil, Misha, de Freitas, Nando

arXiv.org Machine LearningFeb-5-2014

We introduce a new embarrassingly parallel parameter learning algorithm for Markov random fields with untied parameters which is efficient for a large class of practical models. Our algorithm parallelizes naturally over cliques and, for graphs of bounded degree, its complexity is linear in the number of cliques. Unlike its competitors, our algorithm is fully parallel and for log-linear models it is also data efficient, requiring only the local sufficient statistics of the data to estimate parameters.

artificial intelligence, machine learning, maximum likelihood, (14 more...)

arXiv.org Machine Learning

1308.6342

Country:

Asia (0.28)
North America (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Sequential Model-Based Ensemble Optimization

Lacoste, Alexandre, Larochelle, Hugo, Laviolette, François, Marchand, Mario

arXiv.org Machine LearningFeb-4-2014

One of the most tedious tasks in the application of machine learning is model selection, i.e. hyperparameter selection. Fortunately, recent progress has been made in the automation of this process, through the use of sequential model-based optimization (SMBO) methods. This can be used to optimize a cross-validation performance of a learning algorithm over the value of its hyperparameters. However, it is well known that ensembles of learned models almost consistently outperform a single model, even if properly selected. In this paper, we thus propose an extension of SMBO methods that automatically constructs such ensembles. This method builds on a recently proposed ensemble construction paradigm known as agnostic Bayesian learning. In experiments on 22 regression and 39 classification data sets, we confirm the success of this proposed approach, which is able to outperform model selection with SMBO.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1402.0796

Country: North America > Canada > Quebec (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.30)

Add feedback

Learning Ordered Representations with Nested Dropout

Rippel, Oren, Gelbart, Michael A., Adams, Ryan P.

arXiv.org Machine LearningFeb-4-2014

In this paper, we study ordered representations of data in which different dimensions have different degrees of importance. To learn these representations we introduce nested dropout, a procedure for stochastically removing coherent nested sets of hidden units in a neural network. We first present a sequence of theoretical results in the simple case of a semi-linear autoencoder. We rigorously show that the application of nested dropout enforces identifiability of the units, which leads to an exact equivalence with PCA. We then extend the algorithm to deep models and demonstrate the relevance of ordered representations to a number of applications. Specifically, we use the ordered property of the learned codes to construct hash-based data structures that permit very fast retrieval, achieving retrieval in time logarithmic in the database size and independent of the dimensionality of the representation. This allows codes that are hundreds of times longer than currently feasible for retrieval. We therefore avoid the diminished quality associated with short codes, while still performing retrieval that is competitive in speed with existing methods. We also show that ordered representations are a promising way to learn adaptive compression for efficient online data reconstruction.

artificial intelligence, machine learning, representation, (13 more...)

arXiv.org Machine Learning

1402.0915

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Jointly Clustering Rows and Columns of Binary Matrices: Algorithms and Trade-offs

Xu, Jiaming, Wu, Rui, Zhu, Kai, Hajek, Bruce, Srikant, R., Ying, Lei

arXiv.org Machine LearningFeb-4-2014

In standard clustering problems, data points are represented by vectors, and by stacking them together, one forms a data matrix with row or column cluster structure. In this paper, we consider a class of binary matrices, arising in many applications, which exhibit both row and column cluster structure, and our goal is to exactly recover the underlying row and column clusters by observing only a small fraction of noisy entries. We first derive a lower bound on the minimum number of observations needed for exact cluster recovery. Then, we propose three algorithms with different running time and compare the number of observations needed by them for successful cluster recovery. Our analytical results show smooth time-data trade-offs: one can gradually reduce the computational complexity when increasingly more observations are available.

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1310.0512

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)

Add feedback

Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease

Duan, Hubert Haoyang

arXiv.org Machine LearningFeb-3-2014

From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on genetic variations at the DNA base pair level, called Single-Nucleotide Polymorphisms (SNPs), collected from the Ontario Heart Genomics Study (OHGS). First, the thesis explains two commonly used supervised learning algorithms, the k-Nearest Neighbour (k-NN) and Random Forest classifiers, and includes a complete proof that the k-NN classifier is universally consistent in any finite dimensional normed vector space. Second, the thesis introduces two dimensionality reduction steps, Random Projections, a known feature extraction technique based on the Johnson-Lindenstrauss lemma, and a new method termed Mass Transportation Distance (MTD) Feature Selection for discrete domains. Then, this thesis compares the performance of Random Projections with the k-NN classifier against MTD Feature Selection and Random Forest, for predicting artery disease based on accuracy, the F-Measure, and area under the Receiver Operating Characteristic (ROC) curve. The comparative results demonstrate that MTD Feature Selection with Random Forest is vastly superior to Random Projections and k-NN. The Random Forest classifier is able to obtain an accuracy of 0.6660 and an area under the ROC curve of 0.8562 on the OHGS genetic dataset, when 3335 SNPs are selected by MTD Feature Selection for classification. This area is considerably better than the previous high score of 0.608 obtained by Davies et al. in 2010 on the same dataset.

artificial intelligence, classifier, machine learning, (16 more...)

arXiv.org Machine Learning

1402.0459

Country:

North America > United States (0.67)
North America > Canada > Ontario (0.48)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

Transductive Learning with Multi-class Volume Approximation

Niu, Gang, Dai, Bo, Plessis, Marthinus Christoffel du, Sugiyama, Masashi

arXiv.org Machine LearningFeb-3-2014

Given a hypothesis space, the large volume principle by Vladimir Vapnik prioritizes equivalence classes according to their volume in the hypothesis space. The volume approximation has hitherto been successfully applied to binary learning problems. In this paper, we extend it naturally to a more general definition which can be applied to several transductive problem settings, such as multi-class, multi-label and serendipitous learning. Even though the resultant learning method involves a non-convex optimization problem, the globally optimal solution is almost surely unique and can be obtained in O(n^3) time. We theoretically provide stability and error analyses for the proposed method, and then experimentally show that it is promising.

artificial intelligence, machine learning, mavr, (15 more...)

arXiv.org Machine Learning

1402.0288

Country:

North America > United States (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.64)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Add feedback

Collaborative Receptive Field Learning

Kong, Shu, Jiang, Zhuolin, Yang, Qiang

arXiv.org Machine LearningFeb-2-2014

The challenge of object categorization in images is largely due to arbitrary translations and scales of the foreground objects. To attack this difficulty, we propose a new approach called collaborative receptive field learning to extract specific receptive fields (RF's) or regions from multiple images, and the selected RF's are supposed to focus on the foreground objects of a common category. To this end, we solve the problem by maximizing a submodular function over a similarity graph constructed by a pool of RF candidates. However, measuring pairwise distance of RF's for building the similarity graph is a nontrivial problem. Hence, we introduce a similarity metric called pyramid-error distance (PED) to measure their pairwise distances through summing up pyramid-like matching errors over a set of low-level features. Besides, in consistent with the proposed PED, we construct a simple nonparametric classifier for classification. Experimental results show that our method effectively discovers the foreground objects in images, and improves classification performance.

artificial intelligence, machine learning, submodular function, (15 more...)

arXiv.org Machine Learning

1402.017

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback