AITopics

To find out how the representations of structured visual objects depend on the co-occurrence statistics of their constituents, we exposed subjects to a set of composite images with tight control exerted over (1) the conditional probabilities of the constituent fragments, and (2) the value of Barlow's criterion of "suspicious coincidence" (the ratio of joint probability to the product of marginals). We then compared the part verification response times for various probe/target combinations before and after the exposure. For composite probes, the speedup was much larger for targets that contained pairs of fragments perfectly predictive of each other, compared to those that did not. This effect was modulated by the significance of their co-occurrence as estimated by Barlow's criterion. For lone-fragment probes, the speedup in all conditions was generally lower than for composites. These results shed light on the brain's strategies for unsupervised acquisition of structural information in vision.
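As a concrete reading of the second controlled quantity, the sketch below estimates the ratio of joint probability to the product of marginals from co-occurrence counts. The function name and the toy counts are illustrative assumptions, not the stimuli or statistics used in the study.

```python
def suspicious_coincidence(n_both, n_a, n_b, n_total):
    """Estimate P(A, B) / (P(A) * P(B)) from image counts.

    n_both:  images containing both fragments A and B
    n_a:     images containing A (including those that also contain B)
    n_b:     images containing B (including those that also contain A)
    n_total: total number of images
    """
    if min(n_a, n_b, n_total) <= 0:
        raise ValueError("n_a, n_b and n_total must be positive")
    p_joint = n_both / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_joint / (p_a * p_b)


# Hypothetical counts: A and B always appear together, in 10 of 40 images.
ratio_dependent = suspicious_coincidence(10, 10, 10, 40)

# Hypothetical counts: A and B co-occur exactly as often as chance predicts.
ratio_independent = suspicious_coincidence(10, 20, 20, 40)
```

A ratio of 1 corresponds to statistical independence; values well above 1 mark a co-occurrence as "suspicious" in Barlow's sense.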

experiment, probability, representation, (13 more...)

Country:

North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > United States > North Carolina > Wake County > Cary (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.53)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.36)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.36)

Courville, Aaron C., Touretzky, David S.

Modeling Temporal Structure in Classical Conditioning

The Temporal Coding Hypothesis of Miller and colleagues [7] suggests that animals integrate related temporal patterns of stimuli into single memory representations. We formalize this concept using quasi-Bayes estimation to update the parameters of a constrained hidden Markov model. This approach allows us to account for some surprising temporal effects in the second-order conditioning experiments of Miller et al. [1, 2, 3], which other models are unable to explain.

experiment, representation, stimuli, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Orange County > Irvine (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Littman, Michael L., Kearns, Michael J., Singh, Satinder P.

An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games

We describe a new algorithm for computing a Nash equilibrium in graphical games, a compact representation for multi-agent systems that we introduced in previous work. The algorithm is the first to compute equilibria both efficiently and exactly for a nontrivial class of graphical games.

1 Introduction

Seeking to replicate the representational and computational benefits that graphical models have provided to probabilistic inference, several recent works have introduced graph-theoretic frameworks for the study of multi-agent systems (La Mura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interaction between agents. As with many familiar network models, the macroscopic behavior of a large system is thus implicitly described by its local interactions, and the computational challenge is to extract the global states of interest. Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate.
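The equilibrium condition, that no agent has a unilateral incentive to deviate, can be illustrated with a small check for pure-strategy equilibria in a two-player matrix game. This is only a sketch of the definition, not the tree-structured algorithm the paper describes; the function name and the prisoner's dilemma payoffs are illustrative assumptions.

```python
def is_pure_nash(payoff_row, payoff_col, i, j):
    """Return True if the action pair (i, j) is a pure-strategy Nash equilibrium.

    payoff_row[a][b] is the row player's payoff and payoff_col[a][b] the
    column player's payoff when row plays a and column plays b.
    """
    # Best payoff the row player could reach by deviating while column keeps j.
    row_best = max(payoff_row[a][j] for a in range(len(payoff_row)))
    # Best payoff the column player could reach by deviating while row keeps i.
    col_best = max(payoff_col[i][b] for b in range(len(payoff_col[i])))
    return payoff_row[i][j] >= row_best and payoff_col[i][j] >= col_best


# Hypothetical prisoner's dilemma: action 0 = cooperate, action 1 = defect.
row_payoffs = [[-1, -3], [0, -2]]
col_payoffs = [[-1, 0], [-3, -2]]

mutual_defection = is_pure_nash(row_payoffs, col_payoffs, 1, 1)
mutual_cooperation = is_pure_nash(row_payoffs, col_payoffs, 0, 0)
```

In this toy game, mutual defection passes the check while mutual cooperation fails it, because either player gains by defecting alone.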

algorithm, breakpoint, breakpoint policy, (14 more...)

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Colorado (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Heisele, Bernd, Serre, Thomas, Pontil, Massimiliano, Vetter, Thomas, Poggio, Tomaso

Categorization by Learning and Combining Object Parts

We describe an algorithm for automatically learning discriminative components of objects with SVM classifiers. It is based on growing image parts by minimizing theoretical bounds on the error probability of an SVM. Component-based face classifiers are then combined in a second stage to yield a hierarchical SVM classifier. Experimental results in face classification show considerable robustness against rotations in depth and suggest performance at a significantly better level than other face detection systems. Novel aspects of our approach are: a) an algorithm to learn component-based classification experts and their combination, b) the use of 3-D morphable models for training, and c) a maximum operation on the output of each component classifier, which may be relevant for biological models of visual recognition.

classifier, component-based classifier, detection, (12 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)
Europe > Germany > Baden-Württemberg > Freiburg (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.94)

Wang, Xin, Dietterich, Thomas G.

Stabilizing Value Function Approximation with the BFBP Algorithm

However, online RL algorithms such as SARSA(λ) have been shown experimentally to have difficulty converging when applied with function approximators. Theoretical analysis has not been able to prove convergence, even in the case of linear function approximators.

algorithm, bfbp, function approximator, (15 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Efficient Resources Allocation for Markov Decision Processes

Munos, Rémi

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. If we select the most probable values of the parameters, we can build an MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.

contribution, derivative, ylx, (15 more...)

Country: Europe > France (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)

Mannor, Shie, Shimkin, Nahum

The Steering Approach for Multi-Criteria Reinforcement Learning

We consider the problem of learning to attain multiple goals in a dynamic environment, which is initially unknown. In addition, the environment may contain arbitrarily varying elements related to actions of other agents or to non-stationary moves of Nature. This problem is modelled as a stochastic (Markov) game between the learning agent and an arbitrary player, with a vector-valued reward function. The objective of the learning agent is to have its long-term average reward vector belong to a given target set. We devise an algorithm for achieving this task, which is based on the theory of approachability for stochastic games. This algorithm combines, in an appropriate way, a finite set of standard, scalar-reward learning algorithms. Sufficient conditions are given for the convergence of the learning algorithm to a general target set. The specialization of these results to the single-controller Markov decision problem is discussed as well.

algorithm, reward vector, vector, (14 more...)

Country: Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Littman, Michael L., Sutton, Richard S.

Predictive Representations of State

We show that states of a dynamical system can be usefully represented by multi-step, action-conditional predictions of future observations. State representations that are grounded in data in this way may be easier to learn, generalize better, and be less dependent on accurate prior models than, for example, POMDP state representations. Building on prior work by Jaeger and by Rivest and Schapire, in this paper we compare and contrast a linear specialization of the predictive approach with the state representations used in POMDPs and in k-order Markov models. Ours is the first specific formulation of the predictive idea that includes both stochasticity and actions (controls). We show that any system has a linear predictive state representation with number of predictions no greater than the number of states in its minimal POMDP model.

pomdp, representation, vector, (16 more...)

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Jersey (0.04)
North America > United States > Colorado (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Lagoudakis, Michail G., Parr, Ronald

Model-Free Least-Squares Policy Iteration

We propose a new approach to reinforcement learning which combines least-squares function approximation with policy iteration. Our method is model-free and completely off-policy. We are motivated by the least-squares temporal difference learning algorithm (LSTD), which is known for its efficient use of sample experiences compared to pure temporal difference algorithms. LSTD is ideal for prediction problems; however, it has not so far had a straightforward application to control problems. Moreover, approximations learned by LSTD are strongly influenced by the visitation distribution over states.
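For readers unfamiliar with LSTD as a prediction method, the following minimal sketch shows the usual least-squares solve for a linear value function, restricted to two features so that it needs no external libraries. It illustrates LSTD for a fixed policy only, not the policy iteration method proposed in the paper; the function name, the two-state cycle, and the discount factor are assumptions made for the example.

```python
def lstd_two_features(samples, gamma):
    """Solve A w = b for a two-feature linear value function.

    samples: iterable of (phi, reward, phi_next), each phi a length-2 list.
    A accumulates phi * (phi - gamma * phi_next)^T; b accumulates phi * reward.
    """
    a = [[0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0]
    for phi, reward, phi_next in samples:
        diff = [phi[k] - gamma * phi_next[k] for k in range(2)]
        for i in range(2):
            b[i] += phi[i] * reward
            for j in range(2):
                a[i][j] += phi[i] * diff[j]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if abs(det) < 1e-12:
        raise ValueError("A is singular; more samples are needed")
    # Cramer's rule for the 2x2 linear system.
    w0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    w1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [w0, w1]


# Hypothetical two-state cycle with one-hot features:
# state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
samples = [
    ([1.0, 0.0], 1.0, [0.0, 1.0]),
    ([0.0, 1.0], 0.0, [1.0, 0.0]),
]
weights = lstd_two_features(samples, gamma=0.5)
```

With one-hot features the weights are the state values themselves, so the result can be checked against the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0).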

algorithm, approximation, iteration, (13 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > North Carolina > Durham County > Durham (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Natural Policy Gradient

Kakade, Sham M.

We provide a natural gradient method that represents the steepest descent direction based on the underlying structure of the parameter space. Although gradient methods cannot make large changes in the values of the parameters, we show that the natural gradient is moving toward choosing a greedy optimal action rather than just a better action. These greedy optimal actions are those that would be chosen under one improvement step of policy iteration with approximate, compatible value functions, as defined by Sutton et al. [9]. We then show drastic performance improvements in simple MDPs and in the more challenging MDP of Tetris.

1 Introduction

There has been a growing interest in direct policy-gradient methods for approximate planning in large Markov decision problems (MDPs). Unfortunately, the standard gradient descent rule is noncovariant.
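For orientation, a natural gradient is commonly written as the ordinary gradient premultiplied by the inverse of a metric on the parameter space, typically the Fisher information matrix of the policy. The notation below ($\eta$ for the performance measure, $\theta$ for the policy parameters, $\pi$ for the policy, $F$ for the metric) is an assumption chosen for this sketch and is not taken from the abstract.

```latex
\tilde{\nabla}\eta(\theta) = F(\theta)^{-1}\,\nabla\eta(\theta),
\qquad
F(\theta) = \mathbb{E}\!\left[
  \nabla_\theta \log \pi(a \mid s; \theta)\,
  \nabla_\theta \log \pi(a \mid s; \theta)^{\top}
\right]
```

When $F$ is the identity matrix the update reduces to the standard gradient rule, whose direction depends on how the policy happens to be parameterized.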

gradient, gradient method, natural gradient, (14 more...)

Country:

North America > United States > Massachusetts (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)