Moreau-Yosida Regularization for Grouped Tree Structure Learning
We consider the tree-structured group Lasso, where the structure over the features can be represented as a tree with leaf nodes as features and internal nodes as clusters of the features. The structured regularization with a pre-defined tree structure is based on a group-Lasso penalty, where one group is defined for each node in the tree. Such a regularization can help uncover structured sparsity, which is desirable for applications where the features carry a meaningful tree structure. However, the tree-structured group Lasso is challenging to solve due to the complex regularization. In this paper, we develop an efficient algorithm for the tree-structured group Lasso. One of the key steps in the proposed algorithm is to solve the Moreau-Yosida regularization associated with the grouped tree structure. The main technical contributions of this paper include (1) we show that the associated Moreau-Yosida regularization admits an analytical solution, and (2) we develop an efficient algorithm for determining the effective interval for the regularization parameter. Our experimental results on the AR and JAFFE face data sets demonstrate the efficiency and effectiveness of the proposed algorithm.
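For readers who want a concrete picture of contribution (1), the analytical solution amounts to a single bottom-up pass of group soft-thresholding over the tree. The minimal Python sketch below illustrates that pass under the assumption that groups are supplied in leaves-to-root order; the function name, weights, and toy tree are illustrative, not the authors' code.

```python
import numpy as np

def tree_group_lasso_prox(v, groups, lam):
    """Proximal operator (Moreau-Yosida regularization) of the penalty
    lam * sum_g w_g * ||x_{G_g}||_2 when the groups G_g form a tree.

    `groups` is a list of (index_list, weight) pairs ordered so that every
    child node appears before its parent (leaves-to-root order); this
    ordering is what allows a single bottom-up pass to return the exact
    minimizer, as stated in the abstract.
    """
    u = np.asarray(v, dtype=float).copy()
    for idx, w in groups:                       # bottom-up traversal of the tree
        norm = np.linalg.norm(u[idx])
        thresh = lam * w
        if norm <= thresh:
            u[idx] = 0.0                        # the whole group is zeroed out
        else:
            u[idx] *= 1.0 - thresh / norm       # group soft-thresholding
    return u

# toy tree over 4 features: leaves, two internal nodes, and the root
groups = [([0], 1.0), ([1], 1.0), ([2], 1.0), ([3], 1.0),
          ([0, 1], 1.0), ([2, 3], 1.0), ([0, 1, 2, 3], 1.0)]
print(tree_group_lasso_prox([3.0, -0.5, 0.2, 0.1], groups, lam=0.4))
```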
Agnostic Active Learning Without Constraints
Beygelzimer, Alina, Hsu, Daniel J., Langford, John, Zhang, Tong
We present and analyze an agnostic active learning algorithm that works without keeping a version space. This is unlike all previous approaches where a restricted set of candidate hypotheses is maintained throughout learning, and only hypotheses from this set are ever returned. By avoiding this version space approach, our algorithm sheds the computational burden and brittleness associated with maintaining version spaces, yet still allows for substantial improvements over supervised learning for classification.
Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
Katahira, Kentaro, Okanoya, Kazuo, Okada, Masato
When animals repeatedly choose actions from multiple alternatives, they can allocate their choices stochastically depending on past actions and outcomes. It is commonly assumed that this ability is achieved by modifications in synaptic weights related to decision making. Choice behavior has been empirically found to follow Herrnstein's matching law. Loewenstein & Seung (2006) demonstrated that matching behavior is a steady state of learning in neural networks if the synaptic weights change proportionally to the covariance between reward and neural activities. However, their proof did not take into account the change in the entire distribution of synaptic weights. In this study, we show that matching behavior is not necessarily a steady state of the covariance-based learning rule when the synaptic strength is sufficiently strong that fluctuations in the input from individual sensory neurons influence the net input to output neurons. This is caused by the increasing variance in the input potential due to the diffusion of synaptic weights. This effect causes an undermatching phenomenon, which has been observed in many behavioral experiments. We suggest that these synaptic diffusion effects provide a robust neural mechanism for stochastic choice behavior.
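For reference, the covariance-based rule alluded to here (Loewenstein & Seung, 2006) has the generic form below; the notation and the learning-rate symbol are illustrative.

```latex
\Delta w_i \;\propto\; \mathrm{Cov}(R,\, a_i)
          \;\approx\; \eta \,\big\langle (R - \langle R \rangle)\,(a_i - \langle a_i \rangle) \big\rangle
```

Here $R$ denotes the obtained reward and $a_i$ the neural activity associated with weight $w_i$; in the Loewenstein & Seung analysis, steady states are reached when this reward-activity covariance vanishes, which is identified with matching behavior.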
Inference with Multivariate Heavy-Tails in Linear Models
Bickson, Danny, Guestrin, Carlos
Heavy-tailed distributions naturally occur in many real-life problems. Unfortunately, it is typically not possible to compute inference in closed form in graphical models that involve such heavy-tailed distributions. In this work, we propose a novel, simple linear graphical model for independent latent random variables, called the linear characteristic model (LCM), defined in the characteristic function domain. Using stable distributions, a heavy-tailed family that generalizes the Cauchy, Lévy, and Gaussian distributions, we show for the first time how to compute both exact and approximate inference in such a linear multivariate graphical model. LCMs are not limited to stable distributions; in fact, LCMs are defined for any random variables (discrete, continuous, or a mixture of both). We provide a realistic problem from the field of computer networks to demonstrate the applicability of our construction. Another potential application is iterative decoding of linear channels with non-Gaussian noise.
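As a small illustration of why the characteristic-function domain is convenient here, the sketch below numerically checks the stability-under-linear-combination property for the Cauchy member of the stable family. It is a hypothetical check of a standard fact, not code from the paper.

```python
import numpy as np

# Stability under linear combination, checked for the Cauchy member (alpha = 1)
# of the stable family: if X1, X2 are independent centered Cauchy variables with
# scales g1, g2, then a1*X1 + a2*X2 is Cauchy with scale |a1|*g1 + |a2|*g2.
# The characteristic function of a centered Cauchy with scale g is exp(-g*|t|),
# which is what makes linear models tractable in the characteristic-function domain.
rng = np.random.default_rng(0)
g1, g2, a1, a2 = 1.0, 2.0, 0.5, -1.5
x1 = g1 * rng.standard_cauchy(1_000_000)
x2 = g2 * rng.standard_cauchy(1_000_000)
y = a1 * x1 + a2 * x2

g_pred = abs(a1) * g1 + abs(a2) * g2            # predicted output scale
t = 0.7
ecf = np.mean(np.exp(1j * t * y))               # empirical characteristic function at t
print(ecf.real, np.exp(-g_pred * abs(t)))       # the two values should roughly agree
```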
Humans Learn Using Manifolds, Reluctantly
Rogers, Tim, Kalish, Chuck, Harrison, Joseph, Zhu, Jerry, Gibson, Bryan R.
When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human's ability to use a manifold in a semi-supervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary.
Online Learning for Latent Dirichlet Allocation
Hoffman, Matthew, Bach, Francis R., Blei, David M.
We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time.
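The heart of the method is a per-document E-step followed by a natural-gradient (Robbins-Monro) update of the topic parameters. The sketch below is a condensed rendition of that step using numpy/scipy; the function name, default hyperparameters, and the minibatch-of-one simplification are illustrative assumptions, not the reference implementation.

```python
import numpy as np
from scipy.special import digamma

def dirichlet_expectation(x):
    """E[log theta] for theta ~ Dirichlet(x) (row-wise if x is a matrix)."""
    return digamma(x) - digamma(x.sum(axis=-1, keepdims=True))

def online_lda_step(lam, doc_counts, t, D, alpha=0.1, eta=0.01,
                    tau0=1.0, kappa=0.7, inner_iters=50):
    """One online-VB update from a single document (minibatch of size one).

    lam        : (K, W) variational Dirichlet parameters of the K topics
    doc_counts : (W,) word counts of the incoming document
    t          : update counter;  D : assumed total number of documents
    """
    K, W = lam.shape
    expElogbeta = np.exp(dirichlet_expectation(lam))           # (K, W)
    ids = np.nonzero(doc_counts)[0]
    cts = np.asarray(doc_counts)[ids]

    # local E-step: fit the per-document topic proportions gamma
    gamma = np.ones(K)
    expElogtheta = np.exp(dirichlet_expectation(gamma))
    for _ in range(inner_iters):
        phinorm = expElogtheta @ expElogbeta[:, ids] + 1e-100   # per-word normalizer
        gamma = alpha + expElogtheta * (expElogbeta[:, ids] @ (cts / phinorm))
        expElogtheta = np.exp(dirichlet_expectation(gamma))

    # sufficient statistics, rescaled as if the corpus were D copies of this document
    sstats = np.zeros_like(lam)
    sstats[:, ids] = np.outer(expElogtheta, cts / phinorm) * expElogbeta[:, ids]
    lam_hat = eta + D * sstats

    # natural-gradient step with a Robbins-Monro learning rate
    rho = (tau0 + t) ** (-kappa)
    return (1.0 - rho) * lam + rho * lam_hat, gamma

# usage sketch:
#   lam = np.random.gamma(100.0, 0.01, (K, W))
#   for t, doc in enumerate(stream_of_word_count_vectors):
#       lam, _ = online_lda_step(lam, doc, t, D)
```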
Large Margin Learning of Upstream Scene Understanding Models
Zhu, Jun, Li, Li-jia, Fei-fei, Li, Xing, Eric P.
Upstream supervised topic models have been widely used for complicated scene understanding. However, existing maximum likelihood estimation (MLE) schemes can make the prediction model learning independent of latent topic discovery and result in an imbalanced prediction rule for scene classification. This paper presents a joint max-margin and max-likelihood learning method for upstream scene understanding models, in which latent topic discovery and prediction model estimation are closely coupled and well-balanced. The optimization problem is efficiently solved with a variational EM procedure, which iteratively solves an online loss-augmented SVM. We demonstrate the advantages of the large-margin approach on both an 8-category sports dataset and the 67-class MIT indoor scene dataset for scene categorization.
Smoothness, Low Noise and Fast Rates
Srebro, Nathan, Sridharan, Karthik, Tewari, Ambuj
We establish an excess risk bound of $O(H R_n^2 + \sqrt{H L^*} R_n)$ for ERM with an $H$-smooth loss function and a hypothesis class with Rademacher complexity $R_n$, where $L^*$ is the best risk achievable by the hypothesis class. For typical hypothesis classes where $R_n = \sqrt{R/n}$, this translates to a learning rate of $\tilde{O}(RH/n)$ in the separable ($L^* = 0$) case and $O(RH/n + \sqrt{L^* RH/n})$ more generally. We also provide similar guarantees for online and stochastic convex optimization of a smooth non-negative objective.
Linear readout from a neural population with partial correlation data
Wohrer, Adrien, Romo, Ranulfo, Machens, Christian K.
How much information does a neural population convey about a stimulus? Answers to this question are known to strongly depend on the correlation of response variability in neural populations. These noise correlations, however, are essentially immeasurable as the number of parameters in a noise correlation matrix grows quadratically with population size. Here, we suggest bypassing this problem by imposing a parametric model on a noise correlation matrix. Our basic assumption is that noise correlations arise due to common inputs between neurons. On average, noise correlations will therefore reflect signal correlations, which can be measured in neural populations. We suggest an explicit parametric dependency between signal and noise correlations. We show how this dependency can be used to "fill the gaps" in noise correlation matrices using an iterative application of the Wishart distribution over positive definite matrices. We apply our method to data from the primary somatosensory cortex of monkeys performing a two-alternative forced-choice task. We compare the discrimination thresholds read out from the population of recorded neurons with the discrimination threshold of the monkey and show that our method predicts different results than simpler, average schemes of noise correlations.
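A hypothetical back-of-the-envelope version of this idea (not the authors' iterative Wishart procedure) is sketched below: noise correlations are tied to measurable signal correlations through a single coupling parameter, and the resulting covariance is fed into a linear readout to obtain a discrimination threshold. All names and parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
slopes = rng.normal(0.0, 1.0, N)                 # tuning slopes of the N neurons

# "signal correlations": neurons with same-sign tuning are positively correlated
v = np.sign(slopes)
sig_corr = 0.9 * np.outer(v, v) + 0.1 * np.eye(N)    # a valid (positive definite) correlation matrix

alpha = 0.2                                      # assumed coupling between signal and noise correlations
noise_corr = alpha * sig_corr + (1.0 - alpha) * np.eye(N)
noise_cov = 1.5 * noise_corr                     # unit variances scaled by a Fano-like factor

# linear readout: Fisher information of the population for a small stimulus change
fisher = slopes @ np.linalg.solve(noise_cov, slopes)
print("discrimination threshold ~", 1.0 / np.sqrt(fisher))
```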
Lower Bounds on Rate of Convergence of Cutting Plane Methods
Zhang, Xinhua, Saha, Ankan, Vishwanathan, S.v.n.
In a recent paper, Joachims (2006) presented SVM-Perf, a cutting plane method (CPM) for training linear Support Vector Machines (SVMs) which converges to an $\epsilon$-accurate solution in $O(1/\epsilon^{2})$ iterations. By tightening the analysis, Teo et al. (2010) showed that $O(1/\epsilon)$ iterations suffice. Given the impressive convergence speed of CPM on a number of practical problems, it was conjectured that these rates could be further improved. In this paper we disprove this conjecture. We present counterexamples that not only apply to training linear SVMs with hinge loss, but also hold for support vector methods which optimize a \emph{multivariate} performance score. Surprisingly, however, these problems are not inherently hard: by exploiting the structure of the objective function we can devise an algorithm that converges in $O(1/\sqrt{\epsilon})$ iterations.