AITopics

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions. Hinton et al. recently introduced a greedy layer-wise unsupervised learning algorithm for Deep Belief Networks (DBN), a generative model with many layers of hidden causal variables. In the context of the above optimization problem, we study this algorithm empirically and explore variants to better understand its success and extend it to cases where the inputs are continuous or where the structure of the input distribution is not revealing enough about the variable to be predicted in a supervised task. Our experiments also confirm the hypothesis that the greedy layer-wise unsupervised training strategy mostly helps the optimization, by initializing weights in a region near a good local minimum, giving rise to internal distributed representations that are high-level abstractions of the input, bringing better generalization.

algorithm, dbn, representation, (17 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > Canada > Quebec > Montreal (0.04)
North America > United States > Colorado > Denver County > Denver (0.04)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Ben-sasson, Eli, Kalai, Ehud, Kalai, Adam

An Approach to Bounded Rationality

A central question in game theory and artificial intelligence is how a rational agent should behave in a complex environment, given that it cannot perform unbounded computations. We study strategic aspects of this question by formulating a simple model of a game with additional costs (computational or otherwise) for each strategy. First we connect this to zero-sum games, proving a counterintuitive generalization of the classic min-max theorem to zero-sum games with the addition of strategy costs. We then show that potential games with strategy costs remain potential games. Both zero-sum and potential games with strategy costs maintain a very appealing property: simple learning dynamics converge to equilibrium.

equilibria, payoff, strategy cost, (15 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Israel (0.04)

Industry: Leisure & Entertainment > Games > Chess (0.48)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence (1.00)

Ben-David, Shai, Blitzer, John, Crammer, Koby, Pereira, Fernando

Analysis of Representations for Domain Adaptation

Discriminative learning methods for classification perform well when training and test data are drawn from the same distribution. In many situations, though, we have labeled training data for a source domain, and we wish to learn a classifier which performs well on a target domain with a different distribution. Under what conditions can we adapt a classifier trained on the source domain for use in the target domain? Intuitively, a good feature representation is a crucial factor in the success of domain adaptation. We formalize this intuition theoretically with a generalization bound for domain adaption. Our theory illustrates the tradeoffs inherent in designing a representation for domain adaptation and gives a new justification for a recently proposed model. It also points toward a promising new model for domain adaptation: one which explicitly minimizes the difference between the source and target domains, while at the same time maximizing the margin of the training set.

a-distance, domain adaptation, representation, (14 more...)

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Belkin, Mikhail, Niyogi, Partha

Convergence of Laplacian Eigenmaps

Geometrically based methods for various tasks of machine learning have attracted considerable attention over the last few years. In this paper we show convergence of eigenvectors of the point cloud Laplacian to the eigenfunctions of the Laplace-Beltrami operator on the underlying manifold, thus establishing the first convergence results for a spectral dimensionality reduction algorithm in the manifold setting.

convergence, eigenfunction, manifold, (15 more...)

Country:

North America > United States > Illinois > Cook County > Chicago (0.05)
North America > United States > Ohio > Franklin County > Columbus (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bartolozzi, Chiara, Indiveri, Giacomo

A selective attention multi--chip system with dynamic synapses and spiking neurons

Selective attention is the strategy used by biological sensory systems to solve the problem of limited parallel processing capacity: salient subregions of the input stimuli are serially processed, while non-salient regions are suppressed. We present an mixed mode analog/digital Very Large Scale Integration implementation of a building block for a multi-chip neuromorphic hardware model of selective attention. We describe the chip's architecture and its behavior, when its is part of a multi-chip system with a spiking retina as input, and show how it can be used to implement in real-time flexible models of bottom-up attention.

neuron, pixel, retina, (15 more...)

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Architecture > Real Time Systems (0.35)

Bartlett, Peter L., Traskin, Mikhail

AdaBoost is Consistent

The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we consider the stopping strategy to be used in AdaBoost to achieve universal consistency.

adaboost, algorithm, classifier, (17 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Bartlett, Peter L., Tewari, Ambuj

Sample Complexity of Policy Search with Known Dynamics

We consider methods that try to find a good policy for a Markov decision process by choosing one from a given class. The policy is chosen based on its empirical performance in simulations. We are interested in conditions on the complexity of the policy class that ensure the success of such simulation based policy search methods. We show that under bounds on the amount of computation involved in computing policies, transition dynamics and rewards, uniform convergence of empirical estimates to true value functions occurs. Previously, such results were derived by assuming boundedness of pseudodimension and Lipschitz continuity. These assumptions and ours are both stronger than the usual combinatorial complexity measures. We show, via minimax inequalities, that this is essential: boundedness of pseudodimension or fat-shattering dimension alone is not sufficient.

combinatorial dimension, dimension, policy class, (15 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > Middle East > Jordan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Barber, David, Mesot, Bertrand

A Novel Gaussian Sum Smoother for Approximate Inference in Switching Linear Dynamical Systems

We introduce a method for approximate smoothed inference in a class of switching linear dynamical systems, based on a novel form of Gaussian Sum smoother. This class includes the switching Kalman Filter and the more general case of switch transitions dependent on the continuous latent state. The method improves on the standard Kim smoothing approach by dispensing with one of the key approximations, thus making fuller use of the available future information. Whilst the only central assumption required is projection to a mixture of Gaussians, we show that an additional conditional independence assumption results in a simpler but stable and accurate alternative. Unlike the alternative unstable Expectation Propagation procedure, our method consists only of a single forward and backward pass and is reminiscent of the standard smoothing'correction' recursions in the simpler linear dynamical system. The algorithm performs well on both toy experiments and in a large scale application to noise robust speech recognition.

approximation, gaussian, recursion, (11 more...)

Country:

North America > United States > Massachusetts > Norfolk County > Norwood (0.04)
Europe > Switzerland (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.93)

Barber, David, Chiappa, Silvia

Unified Inference for Variational Bayesian Linear Gaussian State-Space Models

Linear Gaussian State-Space Models are widely used and a Bayesian treatment of parameters is therefore of considerable interest. The approximate Variational Bayesian method applied to these models is an attractive approach, used successfully in applications ranging from acoustics to bioinformatics. The most challenging aspect of implementing the method is in performing inference on the hidden state sequence of the model. We show how to convert the inference problem so that standard Kalman Filtering/Smoothing recursions from the literature may be applied. This is in contrast to previously published approaches based on Belief Propagation. Our framework both simplifies and unifies the inference problem, so that future applications may be more easily developed. We demonstrate the elegance of the approach on Bayesian temporal ICA, with an application to finding independent dynamical processes underlying noisy EEG signals.

application, lgssm, linear gaussian state-space model, (14 more...)

Country:

Europe > Switzerland (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Hillel, Aharon B., Weinshall, Daphna

Subordinate class recognition using relational object models

We address the problem of subordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while subordinate categories are more often defined by part properties (like'jagged wheels'). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model's part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the subordinate class.

category, recognition, subordinate class, (16 more...)

Country: Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)