High-Dimensional Classification for Brain Decoding
Croteau, Nicole, Nathoo, Farouk S., Cao, Jiguo, Budney, Ryan
Brain decoding involves the determination of a subject's cognitive state or an associated stimulus from functional neuroimaging data measuring brain activity. In this setting the cognitive state is typically characterized by an element of a finite set, and the neuroimaging data comprise voluminous amounts of spatiotemporal data measuring some aspect of the neural signal. The associated statistical problem is one of classification from high-dimensional data. We explore the use of functional principal component analysis, mutual information networks, and persistent homology for examining the data through exploratory analysis and for constructing features characterizing the neural signal for brain decoding. We review each approach from this perspective, and we incorporate the features into a classifier based on symmetric multinomial logistic regression with elastic net regularization. The approaches are illustrated in an application where the task is to infer, from brain activity measured with magnetoencephalography (MEG), the type of video stimulus shown to a subject.
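As a rough illustration of the final modeling step described above, the sketch below fits a multinomial logistic regression with elastic net regularization using scikit-learn. The feature matrix (standing in for FPCA scores, mutual information network summaries, and persistent homology features) and the stimulus labels are synthetic placeholders, not the authors' data or code.

```python
# Hedged sketch: multinomial logistic regression with elastic net regularization,
# the type of classifier described in the abstract above.  X and y are random
# placeholders for the extracted features and video-stimulus labels.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((120, 50))   # 120 trials, 50 extracted features (hypothetical)
y = rng.integers(0, 3, size=120)     # 3 stimulus classes (hypothetical)

# saga is the solver that supports the elastic net penalty; it fits a
# multinomial model for multiclass targets.
clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
print(cross_val_score(clf, X, y, cv=5).mean())
```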
Deep Narrow Boltzmann Machines are Universal Approximators
We show that deep narrow Boltzmann machines are universal approximators of probability distributions on the activities of their visible units, provided they have sufficiently many hidden layers, each containing the same number of units as the visible layer. We show that, within certain parameter domains, deep Boltzmann machines can be studied as feedforward networks. We provide upper and lower bounds on the sufficient depth and width of universal approximators. These results settle various intuitions regarding undirected networks and, in particular, they show that deep narrow Boltzmann machines are at least as compact universal approximators as narrow sigmoid belief networks and restricted Boltzmann machines, with respect to the currently available bounds for those models.
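For context, a deep Boltzmann machine with visible units $h^0 = v$ and hidden layers $h^1, \ldots, h^L$ assigns the joint distribution (notation ours, not taken from the paper)
\[
p(h^0, h^1, \ldots, h^L) \;\propto\; \exp\!\Big( \sum_{l=0}^{L-1} (h^l)^\top W^l h^{l+1} \;+\; \sum_{l=0}^{L} (b^l)^\top h^l \Big),
\]
with the distribution on the visible units obtained by marginalizing over all hidden layers; "narrow" here means that each hidden layer contains as many units as the visible layer.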
Sample Complexity of Dictionary Learning and other Matrix Factorizations
Gribonval, Rémi, Jenatton, Rodolphe, Bach, Francis, Kleinsteuber, Martin, Seibert, Matthias
Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), $K$-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, in practice this is achieved by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates that uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity \ldots), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds behave proportionally to $\sqrt{\log(n)/n}$ w.r.t.\ the number of samples $n$ for the considered matrix factorization techniques.
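In symbols, the kind of uniform deviation bound described above can be written as
\[
\sup_{D \in \mathfrak{D}} \left| \frac{1}{n}\sum_{i=1}^{n} \ell(x_i, D) \;-\; \mathbb{E}_{x}\, \ell(x, D) \right| \;\lesssim\; \eta \sqrt{\frac{\log n}{n}},
\]
where the notation is ours: $\mathfrak{D}$ is the constraint set on the factors, $\ell(x, D)$ is the cost of factorizing a sample $x$ with factors $D$, and $\eta$ collects the problem-dependent constants established in the paper.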
`local' vs. `global' parameters -- breaking the gaussian complexity barrier
We show that if $F$ is a convex class of functions that is $L$-subgaussian, the error rate of learning problems generated by independent noise is equivalent to a fixed point determined by `local' covering estimates of the class, rather than by the gaussian averages. To that end, we establish new sharp upper and lower estimates on the error rate for such problems.
Deciding when to stop: Efficient stopping of active learning guided drug-target prediction
Temerinac-Ott, Maja, Naik, Armaghan W., Murphy, Robert F.
Active learning has been shown to reduce the number of experiments needed to obtain high-confidence drug-target predictions. However, in order to actually save experiments using active learning, it is crucial to have a method to evaluate the quality of the current prediction and decide when to stop the experimentation process. Only by applying reliable stopping criteria to active learning can time and costs in the experimental process actually be saved. We compute active learning traces on simulated drug-target matrices in order to learn a regression model for the accuracy of the active learner. By analyzing the performance of the regression model on simulated data, we design stopping criteria for previously unseen experimental matrices. We demonstrate on four previously characterized drug effect data sets that applying the stopping criteria can save up to 40% of the total experiments while retaining highly accurate predictions.
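A minimal sketch of such a trace-based stopping rule is given below, assuming a regression model trained beforehand on simulated active-learning traces that maps trace features to a predicted accuracy. The function names, feature extraction, and the 0.95 threshold are illustrative assumptions, not the authors' exact criteria.

```python
# Hedged sketch of a trace-based stopping rule for active learning.
# `trace_features` summarizes the current active-learning trace (e.g. fraction
# of the drug-target matrix measured, change in predictions between rounds);
# `acc_model` is a regression model fit on simulated traces.

def should_stop(trace_features, acc_model, target_accuracy=0.95):
    """Stop experimentation once the predicted accuracy of the active
    learner reaches the desired level (threshold is illustrative)."""
    predicted_accuracy = acc_model.predict([trace_features])[0]
    return predicted_accuracy >= target_accuracy

# Typical loop structure (names hypothetical):
# while not should_stop(extract_features(trace), acc_model):
#     batch = active_learner.select_experiments()
#     trace.append(run_experiments(batch))
```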
Techniques for Learning Binary Stochastic Feedforward Neural Networks
Raiko, Tapani, Berglund, Mathias, Alain, Guillaume, Dinh, Laurent
Stochastic binary hidden units in a multi-layer perceptron (MLP) network give at least three potential benefits when compared to deterministic MLP networks. (1) They allow learning one-to-many mappings. (2) They can be used in structured prediction problems, where modeling the internal structure of the output is important. (3) Stochasticity has been shown to be an excellent regularizer, which can improve generalization performance. However, training stochastic networks is considerably more difficult. We study training using M samples of hidden activations per input. We show that the case M=1 leads to a fundamentally different behavior where the network tries to avoid stochasticity. We propose two new estimators for the training gradient and propose benchmark tests for comparing training algorithms. Our experiments confirm that training stochastic networks is difficult and show that the two proposed estimators perform favorably among the five estimators compared.
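The sketch below illustrates the M-sample setting described above: for each input, M binary hidden configurations are sampled and the resulting outputs averaged. It shows the stochastic forward pass only; the gradient estimators compared in the paper are not reproduced, and all shapes are arbitrary.

```python
# Hedged sketch of a one-hidden-layer stochastic binary network forward pass
# with M samples of hidden activations per input (illustrative, not the
# authors' training procedure).
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stochastic_forward(x, W1, b1, W2, b2, M=1):
    p = sigmoid(W1 @ x + b1)                          # hidden activation probabilities
    h = (rng.random((M, p.size)) < p).astype(float)   # M binary hidden samples
    return (h @ W2.T + b2).mean(axis=0)               # average output over the M samples

x = rng.standard_normal(10)
W1, b1 = rng.standard_normal((20, 10)), np.zeros(20)
W2, b2 = rng.standard_normal((5, 20)), np.zeros(5)
print(stochastic_forward(x, W1, b1, W2, b2, M=4))
```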
Structured Matrix Completion with Applications to Genomic Data Integration
Cai, Tianxi, Cai, T. Tony, Zhang, Anru
Matrix completion has attracted significant recent attention in many fields including statistics, applied mathematics and electrical engineering. The current literature on matrix completion focuses primarily on independent sampling models under which the individual observed entries are sampled independently. Motivated by applications in genomic data integration, we propose a new framework of structured matrix completion (SMC) to treat structured missingness by design. Specifically, our proposed method aims at efficient matrix recovery when a subset of the rows and columns of an approximately low-rank matrix are observed. We provide theoretical justification for the proposed SMC method and derive lower bounds for the estimation errors, which together establish the optimal rate of recovery over certain classes of approximately low-rank matrices. Simulation studies show that the method performs well in finite samples under a variety of configurations. The method is applied to integrate several ovarian cancer genomic studies with differing extents of genomic measurement, which enables us to construct more accurate prediction rules for ovarian cancer survival.
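To make the block-missingness setting concrete, the toy example below partitions a synthetic, exactly low-rank matrix into observed blocks A11, A12, A21 and a missing block A22. In the noiseless case where the rank is carried by A11, the textbook identity A22 = A21 A11^+ A12 recovers the missing block exactly; the SMC estimator of the paper is a more careful construction for approximately low-rank, noisy matrices, and is not reproduced here.

```python
# Hedged illustration of recovery from observed rows and columns of an
# exactly low-rank matrix (noiseless toy case, not the SMC estimator itself).
import numpy as np

rng = np.random.default_rng(0)
U, V = rng.standard_normal((60, 3)), rng.standard_normal((40, 3))
A = U @ V.T                       # exactly rank-3 matrix (synthetic)
r, c = 30, 20                     # first r rows and first c columns are observed
A11, A12, A21, A22 = A[:r, :c], A[:r, c:], A[r:, :c], A[r:, c:]

A22_hat = A21 @ np.linalg.pinv(A11) @ A12
print(np.linalg.norm(A22_hat - A22))   # ~0 when rank(A) == rank(A11)
```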
Zero-bias autoencoders and the benefits of co-adapting features
Konda, Kishore, Memisevic, Roland, Krueger, David
Regularized training of an autoencoder typically results in hidden unit biases that take on large negative values. We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input data and act as a selection mechanism that ensures sparsity of the representation. We then show that negative biases impede the learning of data distributions whose intrinsic dimensionality is high. We also propose a new activation function that decouples the two roles of the hidden layer and that allows us to learn representations on data with very high intrinsic dimensionality, where standard autoencoders typically fail. Since the decoupled activation function acts like an implicit regularizer, the model can be trained by minimizing the reconstruction error of training data, without requiring any additional regularization.
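The sketch below shows one thresholded-linear activation of the kind described: the pre-activation selects which units are active via a fixed threshold, while the transmitted value is the pre-activation itself rather than a value shifted by a learned negative bias. The exact functional form used in the paper may differ; the threshold value and shapes are illustrative.

```python
# Hedged sketch of a "zero-bias" thresholded-linear activation for a
# tied-weight autoencoder (illustrative reading of the idea, not the
# authors' exact formulation).
import numpy as np

def zero_bias_activation(z, threshold=1.0):
    # The threshold performs selection; the active units pass z through unchanged.
    return z * (np.abs(z) > threshold)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) * 0.1
x = rng.standard_normal(32)

h = zero_bias_activation(W @ x)   # hidden code with no additive bias term
x_hat = W.T @ h                   # tied-weight linear decoder
print(np.linalg.norm(x - x_hat))  # reconstruction error on this toy input
```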
Protein Contact Prediction by Integrating Joint Evolutionary Coupling Analysis and Supervised Learning
Ma, Jianzhu, Wang, Sheng, Wang, Zhiyong, Xu, Jinbo
Protein contacts contain important information for protein structure and functional studies, but contact prediction from sequence remains very challenging. Both evolutionary coupling (EC) analysis and supervised machine learning methods have been developed to predict contacts, each making use of different types of information. This paper presents a group graphical lasso (GGL) method for contact prediction that integrates joint multi-family EC analysis and supervised learning. Different from existing single-family EC analysis, which uses residue co-evolution information in only the target protein family, our joint EC analysis uses residue co-evolution in both the target family and its related families, which may have divergent sequences but similar folds. To implement joint EC analysis, we model a set of related protein families using Gaussian graphical models (GGM) and then co-estimate their precision matrices by maximum likelihood, subject to the constraint that the precision matrices should share similar residue co-evolution patterns. To further improve the accuracy of the estimated precision matrices, we employ a supervised learning method to predict contact probability from a variety of evolutionary and non-evolutionary information and then incorporate the predicted probability as a prior into our GGL framework. Experiments show that our method can predict contacts much more accurately than existing methods, and that it performs better on both conserved and family-specific contacts.
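One common way to write such a jointly penalized maximum-likelihood problem for $K$ related families, with empirical covariance matrices $S^{(k)}$ built from their alignments, is the group graphical lasso objective
\[
\max_{\Omega^{(1)}, \ldots, \Omega^{(K)} \succ 0} \; \sum_{k=1}^{K} \Big( \log\det \Omega^{(k)} - \mathrm{tr}\big(S^{(k)} \Omega^{(k)}\big) \Big) \;-\; \lambda \sum_{i < j} \Big( \sum_{k=1}^{K} \big(\Omega^{(k)}_{ij}\big)^{2} \Big)^{1/2},
\]
where the group penalty couples the $(i,j)$ entries of the precision matrices across families. The notation is ours, the supervised contact-probability prior described above is not shown, and the paper's exact formulation may differ.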
Tensor machines for learning target-specific polynomial features
Recent years have demonstrated that using random feature maps can significantly decrease the training and testing times of kernel-based algorithms without significantly lowering their accuracy. Regrettably, because random features are target-agnostic, typically thousands of such features are necessary to achieve acceptable accuracies. In this work, we consider the problem of learning a small number of explicit polynomial features. Our approach, named Tensor Machines, finds a parsimonious set of features by optimizing over the hypothesis class introduced by Kar and Karnick for random feature maps in a target-specific manner. Exploiting a natural connection between polynomials and tensors, we provide bounds on the generalization error of Tensor Machines. Empirically, Tensor Machines behave favorably on several real-world datasets compared to other state-of-the-art techniques for learning polynomial features, and deliver significantly more parsimonious models.
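The sketch below evaluates a hypothesis of the general form discussed above: each degree-q term is a product of q inner products with learned weight vectors, and the prediction sums a small number R of such rank-one terms per degree. The weight shapes and the absence of a training loop are our illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of evaluating a low-rank polynomial ("tensor machine"-style)
# hypothesis: sums of products of inner products (illustrative reading only).
import numpy as np

def polynomial_predict(x, weights, bias=0.0):
    """Each element of `weights` has shape (R, q, d): R rank-one terms of degree q."""
    out = bias
    for W_q in weights:                        # one array per polynomial degree
        out += np.prod(W_q @ x, axis=1).sum()  # product of q inner products per term
    return out

rng = np.random.default_rng(0)
d = 8
x = rng.standard_normal(d)
weights = [rng.standard_normal((3, q, d)) * 0.1 for q in (1, 2, 3)]  # degrees 1-3
print(polynomial_predict(x, weights))
```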