AITopics

In this paper we consider the problem of learning an n x n Kernel matrix from m similarity matrices under general convex loss. Past research have extensively studied the m =1 case and have derived several algorithms which require sophisticated techniques like ACCP, SOCP, etc. The existing algorithms do not apply if one uses arbitrary losses and often can not handle m > 1 case. We present several provably convergent iterative algorithms, where each iteration requires either an SVM or a Multiple Kernel Learning (MKL) solver for m > 1 case. One of the major contributions of the paper is to extend the well known Mirror Descent(MD) framework to handle Cartesian product of psd matrices. This novel extension leads to an algorithm, called EMKL, which solves the problem in O(m^2 log n) iterations; in each iteration one solves an MKL involving m kernels and m eigen-decomposition of n x n matrices. By suitably defining a restriction on the objective function, a faster version of EMKL is proposed, called REKL, which avoids the eigen-decomposition. An alternative to both EMKL and REKL is also suggested which requires only an SVM solver. Experimental results on real world protein data set involving several similarity matrices illustrate the efficacy of the proposed algorithms.

artificial intelligence, machine learning, similarity matrix, (17 more...)

Country: Asia > Middle East (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

Kumar, Akshat, Zilberstein, Shlomo

MAP Estimation for Graphical Models by Likelihood Maximization

Computing a {\em maximum a posteriori} (MAP) assignment in graphical models is a crucial inference problem for many practical applications. Several provably convergent approaches have been successfully developed using linear programming (LP) relaxation of the MAP problem. We present an alternative approach, which transforms the MAP problem into that of inference in a finite mixture of simple Bayes nets. We then derive the Expectation Maximization (EM) algorithm for this mixture that also monotonically increases a lower bound on the MAP assignment until convergence. The update equations for the EM algorithm are remarkably simple, both conceptually and computationally, and can be implemented using a graph-based message passing paradigm similar to max-product computation. We experiment on the real-world protein design dataset and show that EM's convergence rate is significantly higher than the previous LP relaxation based approach MPLP. EM achieves a solution quality within $95$\% of optimal for most instances and is often an order-of-magnitude faster than MPLP.

Kulesza, Alex, Taskar, Ben

Structured Determinantal Point Processes

We present a novel probabilistic model for distributions over sets of structures -- for example, sets of sequences, trees, or graphs. The critical characteristic of our model is a preference for diversity: sets containing dissimilar structures are more likely. Our model is a marriage of structured probabilistic models, like Markov random fields and context free grammars, with determinantal point processes, which arise in quantum physics as models of particles with repulsive interactions. We extend the determinantal point process model to handle an exponentially-sized set of particles (structures) via a natural factorization of the model into parts. We show how this factorization leads to tractable algorithms for exact inference, including computing marginals, computing conditional probabilities, and sampling. Our algorithms exploit a novel polynomially-sized dual representation of determinantal point processes, and use message passing over a special semiring to compute relevant quantities. We illustrate the advantages of the model on tracking and articulated pose estimation problems.

algorithm, artificial intelligence, machine learning, (19 more...)

Country: North America > United States > Pennsylvania (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.54)

Kolter, J. Z., Batra, Siddharth, Ng, Andrew Y.

Energy Disaggregation via Discriminative Sparse Coding

Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.

artificial intelligence, machine learning, sparse, (16 more...)

Country: North America > United States > Massachusetts (0.28)

Industry: Energy > Power Industry (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Kleiner, Ariel, Rahimi, Ali, Jordan, Michael I.

Random Conic Pursuit for Semidefinite Programming

We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm.

artificial intelligence, machine learning, random conic pursuit, (18 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Kim, Taehwan, Shakhnarovich, Gregory, Urtasun, Raquel

Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend sparse coding to learn interpretable spatio-temporal primitives of human motion. We cast the problem of learning spatio-temporal primitives as a tensor factorization problem and introduce constraints to learn interpretable primitives. In particular, we use group norms over those tensors, diagonal constraints on the activations as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms.

artificial intelligence, machine learning, sparse, (18 more...)

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.34)

Learning Convolutional Feature Hierarchies for Visual Recognition

Kavukcuoglu, Koray, Sermanet, Pierre, Boureau, Y-lan, Gregor, Karol, Mathieu, Michael, Cun, Yann L.

We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features. While sparse coding has become an increasingly popular method for learning visual features, it is most often trained at the patch level. Applying the resulting filters convolutionally results in highly redundant codes because overlapping patches are encoded in isolation. By training convolutionally over large image windows, our method reduces the redudancy between feature vectors at neighboring locations and improves the efficiency of the overall representation. In addition to a linear decoder that reconstructs the image from sparse features, our method trains an efficient feed-forward encoder that predicts quasi-sparse features from the input. While patch-based training rarely produces anything but oriented edge detectors, we show that convolutional training produces highly diverse filters, including center-surround filters, corner detectors, cross detectors, and oriented grating detectors. We show that using these filters in multi-stage convolutional network architecture improves performance on a number of visual recognition and detection tasks.

artificial intelligence, machine learning, sparse, (17 more...)

Country: North America > United States > California (0.28)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Katahira, Kentaro, Okanoya, Kazuo, Okada, Masato

Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks

When animals repeatedly choose actions from multiple alternatives, they can allocate their choices stochastically depending on past actions and outcomes. It is commonly assumed that this ability is achieved by modifications in synaptic weights related to decision making. Choice behavior has been empirically found to follow Herrnstein’s matching law. Loewenstein & Seung (2006) demonstrated that matching behavior is a steady state of learning in neural networks if the synaptic weights change proportionally to the covariance between reward and neural activities. However, their proof did not take into account the change in entire synaptic distributions. In this study, we show that matching behavior is not necessarily a steady state of the covariance-based learning rule when the synaptic strength is sufficiently strong so that the fluctuations in input from individual sensory neurons influence the net input to output neurons. This is caused by the increasing variance in the input potential due to the diffusion of synaptic weights. This effect causes an undermatching phenomenon, which has been observed in many behavioral experiments. We suggest that the synaptic diffusion effects provide a robust neural mechanism for stochastic choice behavior.

artificial intelligence, choice probability, machine learning, (16 more...)

Country: Asia > Japan > Honshū > Kantō (0.14)

Genre: Research Report > New Finding (0.89)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)

Kale, Satyen, Reyzin, Lev, Schapire, Robert E.

Non-Stochastic Bandit Slate Problems

We consider bandit problems, motivated by applications in online advertising and news story selection, in which the learner must repeatedly select a slate, that is, a subset of size s from K possible actions, and then receives rewards for just the selected actions. The goal is to minimize the regret with respect to total reward of the best slate computed in hindsight. We consider unordered and ordered versions of the problem, and give efficient algorithms which have regret O(sqrt(T)), where the constant depends on the specific nature of the problem. We also consider versions of the problem where we have access to a number of policies which make recommendations for slates in every round, and give algorithms with O(sqrt(T)) regret for competing with the best such policy as well. We make use of the technique of relative entropy projections combined with the usual multiplicative weight update algorithm to obtain our algorithms.

artificial intelligence, data mining, machine learning, (18 more...)

Country: North America > United States (0.68)

Industry:

Marketing (0.48)
Media (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.49)

Joulin, Armand, Ponce, Jean, Bach, Francis R.

Efficient Optimization for Discriminative Latent Class Models

Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression; we show that this model provides a probabilistic interpretation of discriminative clustering methods with added benefits in terms of number of hyperparameters and optimization. While expectation-maximization (EM) algorithm is commonly used to learn these models, its optimization usually leads to local minimum because it relies on a non-convex cost function with many such local minima. To avoid this problem, we introduce a local approximation of this cost function, which leads to a quadratic non-convex optimization problem over a product of simplices. In order to minimize such functions, we propose an efficient algorithm based on convex relaxation and low-rank representation of our data, which allows to deal with large instances. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods.

algorithm, artificial intelligence, machine learning, (16 more...)

Country: North America > United States (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)