AITopics

1503.00135

Country:

North America > United States (0.46)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.16)

Genre: Research Report > Experimental Study (0.69)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.85)

Paul, Arnab, Venkatasubramanian, Suresh

Why does Deep Learning work? - A perspective from Group Theory

arXiv.org Machine LearningFeb-28-2015

Why does Deep Learning work? What representations does it capture? How do higher-order representations emerge? We study these questions from the perspective of group theory, thereby opening a new approach towards a theory of Deep learning. One factor behind the recent resurgence of the subject is a key algorithmic step called pre-training: first search for a good generative model for the input samples, and repeat the process one layer at a time. We show deeper implications of this simple principle, by establishing a connection with the interplay of orbits and stabilizers of group actions. Although the neural networks themselves may not form groups, we show the existence of {\em shadow} groups whose elements serve as close approximations. Over the shadow groups, the pre-training step, originally introduced as a mechanism to better initialize a network, becomes equivalent to a search for features with minimal orbits. Intuitively, these features are in a way the {\em simplest}. Which explains why a deep learning network learns simple features first. Next, we show how the same principle, when repeated in the deeper layers, can capture higher order representations, and why representation complexity increases as the layers get deeper.

deep learning, neural network, stabilizer, (18 more...)

1412.6621

Country: North America > United States > Utah (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Koolen, Wouter M., van Erven, Tim

Second-order Quantile Methods for Experts and Combinatorial Games

We aim to design strategies for sequential decision making that adjust to the difficulty of the learning problem. We study this question both in the setting of prediction with expert advice, and for more general combinatorial decision tasks. We are not satisfied with just guaranteeing minimax regret rates, but we want our algorithms to perform significantly better on easy data. Two popular ways to formalize such adaptivity are second-order regret bounds and quantile bounds. The underlying notions of 'easy data', which may be paraphrased as "the learning problem has small variance" and "multiple decisions are useful", are synergetic. But even though there are sophisticated algorithms that exploit one of the two, no existing algorithm is able to adapt to both. In this paper we outline a new method for obtaining such adaptive algorithms, based on a potential function that aggregates a range of learning rates (which are essential tuning parameters). By choosing the right prior we construct efficient algorithms and show that they reap both benefits by proving the first bounds that are both second-order and incorporate quantiles.

algorithm, artificial intelligence, machine learning, (16 more...)

1502.08009

Country: Europe (0.14)

Genre: Research Report (0.40)

Industry: Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Jamieson, Kevin, Talwalkar, Ameet

Non-stochastic Best Arm Identification and Hyperparameter Optimization

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic best-arm identification problem. Within the multi-armed bandit literature, the cumulative regret objective enjoys algorithms and analyses for both the non-stochastic and stochastic settings while to the best of our knowledge, the best-arm identification framework has only been considered in the stochastic setting. We introduce the non-stochastic setting under this framework, identify a known algorithm that is well-suited for this setting, and analyze its behavior. Next, by leveraging the iterative nature of standard machine learning algorithms, we cast hyperparameter optimization as an instance of non-stochastic best-arm identification, and empirically evaluate our proposed algorithm on this task. Our empirical results show that, by allocating more resources to promising hyperparameter settings, we typically achieve comparable test accuracies an order of magnitude faster than baseline methods.

algorithm, artificial intelligence, big data, (14 more...)

1502.07943

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.68)

Nandy, Shaurabh, Golden, Richard M.

Generative Modeling of Hidden Functional Brain Networks

Functional connectivity refers to the temporal statistical relationship between spatially distinct brain regions and is usually inferred from the time series coherence/correlation in brain activity between regions of interest. In human functional brain networks, the network structure is often inferred from functional magnetic resonance imaging (fMRI) blood oxygen level dependent (BOLD) signal. Since the BOLD signal is a proxy for neuronal activity, it is of interest to learn the latent functional network structure. Additionally, despite a core set of observations about functional networks such as small-worldness, modularity, exponentially truncated degree distributions, and presence of various types of hubs, very little is known about the computational principles which can give rise to these observations. This paper introduces a Hidden Markov Random Field framework for the purpose of representing, estimating, and evaluating latent neuronal functional relationships between different brain regions using fMRI data.

artificial intelligence, health & medicine, neurology, (20 more...)

1412.6602

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Zhang, Zemin, Aeron, Shuchin

Exact tensor completion using t-SVD

In this paper we focus on the problem of completion of multidimensional arrays (also referred to as tensors) from limited sampling. Our approach is based on a recently proposed tensor-Singular Value Decomposition (t-SVD) [1]. Using this factorization one can derive notion of tensor rank, referred to as the tensor tubal rank, which has optimality properties similar to that of matrix rank derived from SVD. As shown in [2] some multidimensional data, such as panning video sequences exhibit low tensor tubal rank and we look at the problem of completing such data under random sampling of the data cube. We show that by solving a convex optimization problem, which minimizes the tensor nuclear norm obtained as the convex relaxation of tensor tubal rank, one can guarantee recovery with overwhelming probability as long as samples in proportion to the degrees of freedom in t-SVD are observed. In this sense our results are order-wise optimal. The conditions under which this result holds are very similar to the incoherency conditions for the matrix completion, albeit we define incoherency under the algebraic set-up of t-SVD. We show the performance of the algorithm on some real data sets and compare it with other existing approaches based on tensor flattening and Tucker decomposition.

artificial intelligence, optimization problem, tensor, (15 more...)

1502.04689

Country: North America > United States > Nevada (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)

Sussillo, David, Abbott, L. F.

Random Walk Initialization for Training Very Deep Feedforward Networks

Training very deep networks is an important open problem in machine learning. One of many difficulties is that the norm of the back-propagated error gradient can grow or decay exponentially. Here we show that training very deep feed-forward networks (FFNs) is not as difficult as previously thought. Unlike when back-propagation is applied to a recurrent network, application to an FFN amounts to multiplying the error gradient by a different random matrix at each layer. We show that the successive application of correctly scaled random matrices to an initial vector results in a random walk of the log of the norm of the resulting vectors, and we compute the scaling that makes this walk unbiased. The variance of the random walk grows only linearly with network depth and is inversely proportional to the size of each layer. Practically, this implies a gradient whose log-norm scales with the square root of the network depth and shows that the vanishing gradient problem can be mitigated by increasing the width of the layers. Mathematical analyses and experimental results using stochastic gradient descent to optimize tasks related to the MNIST and TIMIT datasets are provided to support these claims. Equations for the optimal matrix scaling are provided for the linear and ReLU cases.

deep learning, equation, neural network, (20 more...)

1412.6558

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Siddiqui, Md Amran, Fern, Alan, Dietterich, Thomas G., Wong, Weng-Keen

Sequential Feature Explanations for Anomaly Detection

In many applications, an anomaly detection system presents the most anomalous data instance to a human analyst, who then must determine whether the instance is truly of interest (e.g. a threat in a security setting). Unfortunately, most anomaly detectors provide no explanation about why an instance was considered anomalous, leaving the analyst with no guidance about where to begin the investigation. To address this issue, we study the problems of computing and evaluating sequential feature explanations (SFEs) for anomaly detectors. An SFE of an anomaly is a sequence of features, which are presented to the analyst one at a time (in order) until the information contained in the highlighted features is enough for the analyst to make a confident judgement about the anomaly. Since analyst effort is related to the amount of information that they consider in an investigation, an explanation's quality is related to the number of features that must be revealed to attain confidence. One of our main contributions is to present a novel framework for large scale quantitative evaluations of SFEs, where the quality measure is based on analyst effort. To do this we construct anomaly detection benchmarks from real data sets along with artificial experts that can be simulated for evaluation. Our second contribution is to evaluate several novel explanation approaches within the framework and on traditional anomaly detection benchmarks, offering several insights into the approaches.

anomaly, data mining, explanation, (16 more...)

1503.00038

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial IntelligenceFeb-27-2015

Plagiarism Detection in Polyphonic Music using Monaural Signal Separation

De, Soham, Roy, Indradyumna, Prabhakar, Tarunima, Suneja, Kriti, Chaudhuri, Sourish, Singh, Rita, Raj, Bhiksha

Given the large number of new musical tracks released each year, automated approaches to plagiarism detection are essential to help us track potential violations of copyright. Most current approaches to plagiarism detection are based on musical similarity measures, which typically ignore the issue of polyphony in music. We present a novel feature space for audio derived from compositional modelling techniques, commonly used in signal separation, that provides a mechanism to account for polyphony without incurring an inordinate amount of computational overhead. We employ this feature representation in conjunction with traditional audio feature representations in a classification framework which uses an ensemble of distance features to characterize pairs of songs as being plagiarized or not. Our experiments on a database of about 3000 musical track pairs show that the new feature space characterization produces significant improvements over standard baselines.

educational technology, representation, text processing, (20 more...)

arXiv.org Artificial Intelligence

1503.00022

Country: Asia > India (0.47)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.83)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Kasarapu, Parthan, Allison, Lloyd

Minimum message length estimation of mixtures of multivariate Gaussian and von Mises-Fisher distributions

arXiv.org Machine LearningFeb-26-2015

Mixture modelling involves explaining some observed evidence using a combination of probability distributions. The crux of the problem is the inference of an optimal number of mixture components and their corresponding parameters. This paper discusses unsupervised learning of mixture models using the Bayesian Minimum Message Length (MML) criterion. To demonstrate the effectiveness of search and inference of mixture parameters using the proposed approach, we select two key probability distributions, each handling fundamentally different types of data: the multivariate Gaussian distribution to address mixture modelling of data distributed in Euclidean space, and the multivariate von Mises-Fisher (vMF) distribution to address mixture modelling of directional data distributed on a unit hypersphere. The key contributions of this paper, in addition to the general search and inference methodology, include the derivation of MML expressions for encoding the data using multivariate Gaussian and von Mises-Fisher distributions, and the analytical derivation of the MML estimates of the parameters of the two distributions. Our approach is tested on simulated and real world data sets. For instance, we infer vMF mixtures that concisely explain experimentally determined three-dimensional protein conformations, providing an effective null model description of protein structures that is central to many inference problems in structural bioinformatics. The experimental results demonstrate that the performance of our proposed search and inference method along with the encoding schemes improve on the state of the art mixture modelling techniques.

bayesian inference, health & medicine, message length, (17 more...)

1502.07813

Country:

Oceania (0.28)
Europe (0.27)
North America > United States > New Jersey > Hudson County (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (1.00)