Country
A Principle for Unsupervised Hierarchical Decomposition of Visual Scenes
Structure in a visual scene can be described at many levels of granularity. At a coarse level, the scene is composed of objects; at a finer level, each object is made up of parts, and the parts of subparts. In this work, I propose a simple principle by which such hierarchical structure can be extracted from visual scenes: Regularity in the relations among different parts of an object is weaker than in the internal structure of a part. This principle can be applied recursively to define part-whole relationships among elements in a scene. The principle does not make use of object models, categories, or other sorts of higher-level knowledge; rather, part-whole relationships can be established based on the statistics of a set of sample visual scenes. I illustrate with a model that performs unsupervised decomposition of simple scenes. The model can account for the results from a human learning experiment on the ontogeny of partwhole relationships.
Mechanisms of Generalization in Perceptual Learning
The learning of many visual perceptual tasks has been shown to be specific to practiced stimuli, while new stimuli require re-Iearning from scratch. Here we demonstrate generalization using a novel paradigm in motion discrimination where learning has been previously shown to be specific. We trained subjects to discriminate the directions of moving dots, and verified the previous results that learning does not transfer from the trained direction to a new one. However, by tracking the subjects' performance across time in the new direction, we found that their rate of learning doubled. Therefore, learning generalized in a task previously considered too difficult for generalization.
A Model for Associative Multiplication
Christianson, G. Bjorn, Becker, Suzanna
Despite the fact that mental arithmetic is based on only a few hundred basic facts and some simple algorithms, humans have a difficult time mastering the subject, and even experienced individuals make mistakes. Associative multiplication, the process of doing multiplication by memory without the use of rules or algorithms, is especially problematic.
Evidence for a Forward Dynamics Model in Human Adaptive Motor Control
Bhushan, Nikhil, Shadmehr, Reza
Based on computational principles, the concept of an internal model for adaptive control has been divided into a forward and an inverse model. However, there is as yet little evidence that learning control by the eNS is through adaptation of one or the other. Here we examine two adaptive control architectures, one based only on the inverse model and other based on a combination of forward and inverse models. We then show that for reaching movements of the hand in novel force fields, only the learning of the forward model results in key characteristics of performance that match the kinematics of human subjects. In contrast, the adaptive control system that relies only on the inverse model fails to produce the kinematic patterns observed in the subjects, despite the fact that it is more stable.
Barycentric Interpolators for Continuous Space and Time Reinforcement Learning
In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called the barycentric interpolators. We establish sufficient conditions under which a RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement functions.
Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes
Williams, John K., Singh, Satinder P.
Partially Observable Markov Decision Processes (pO "MOPs) constitute an important class of reinforcement learning problems which present unique theoretical and computational difficulties. In the absence of the Markov property, popular reinforcement learning algorithms such as Q-Iearning may no longer be effective, and memory-based methods which remove partial observability via state-estimation are notoriously expensive. An alternative approach is to seek a stochastic memoryless policy which for each observation of the environment prescribes a probability distribution over available actions that maximizes the average reward per timestep. A reinforcement learning algorithm which learns a locally optimal stochastic memoryless policy has been proposed by Jaakkola, Singh and Jordan, but not empirically verified. We present a variation of this algorithm, discuss its implementation, and demonstrate its viability using four test problems.
Improved Switching among Temporally Abstract Actions
Sutton, Richard S., Singh, Satinder P., Precup, Doina, Ravindran, Balaraman
In robotics and other control applications it is commonplace to have a preexisting set of controllers for solving subtasks, perhaps handcrafted or previously learned or planned, and still face a difficult problem of how to choose and switch among the controllers to solve an overall task as well as possible. In this paper we present a framework based on Markov decision processes and semi-Markov decision processes for phrasing this problem, a basic theorem regarding the improvement in performance that can be obtained by switching flexibly between given controllers, and example applications of the theorem. In particular, we show how an agent can plan with these high-level controllers and then use the results of such planning to find an even better plan, by modifying the existing controllers, with negligible additional cost and no re-planning. In one of our examples, the complexity of the problem is reduced from 24 billion state-action pairs to less than a million state-controller pairs. In many applications, solutions to parts of a task are known, either because they were handcrafted by people or because they were previously learned or planned. For example, in robotics applications, there may exist controllers for moving joints to positions, picking up objects, controlling eye movements, or navigating along hallways. More generally, an intelligent system may have available to it several temporally extended courses of action to choose from. In such cases, a key challenge is to take full advantage of the existing temporally extended actions, to choose or switch among them effectively, and to plan at their level rather than at the level of individual actions.
A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory
Suematsu, Nobuo, Hayashi, Akira
We have proved that the model learned by BLHT converges to the optimal model in given hypothesis space, 1{, which provides the most accurate predictions of percepts and rewards, given short-term memory. We believe this fact provides a solid basis for BLHT, and BLHT can be compared favorably with other methods using short-term memory.
Reinforcement Learning Based on On-Line EM Algorithm
On the other hand, applications to continuous state/action problems (Werbos, 1990; Doya, 1996; Sofge & White, 1992) are much more difficult than the finite state/action cases. Good function approximation methods and fast learning algorithms are crucial for successful applications. In this article, we propose a new RL method that has the above-mentioned two features. This method is based on an actor-critic architecture (Barto et al., 1983), although the detailed implementations of the actor and the critic are quite differ- Reinforcement Learning Based on On-Line EM Algorithm 1053 ent from those in the original actor-critic model. The actor and the critic in our method estimate a policy and a Q-function, respectively, and are approximated by Normalized Gaussian Networks (NGnet) (l'doody & Darken, 1989).