Strong Asymptotic Assertions for Discrete MDL in Regression and Classification

arXiv.org Artificial Intelligence

We study the properties of the MDL (or maximum penalized complexity) estimator for Regression and Classification, where the underlying model class is countable. In particular, we show a finite bound on the Hellinger losses under the sole assumption that a "true" model is contained in the class. This implies almost sure convergence of the predictive distribution to the true one at a fast rate. The result corresponds to Solomonoff's central theorem of universal induction, though with a bound that is exponentially larger.
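
As a rough illustration of the discrete MDL estimator discussed in this abstract, the sketch below selects, from a small finite stand-in for the countable model class, the model minimizing the two-part code length -log P(data | model) plus the model's own code length. All names (`mdl_select`, the Bernoulli grid, the code lengths) are illustrative assumptions, not taken from the paper.

```python
import math

def mdl_select(models, data):
    """Two-part MDL: minimise -log2 P(data | model) + codelength(model), in bits.

    `models` is a list of (name, codelength_bits, prob_fn) triples, where
    prob_fn(x) is the probability the model assigns to a single observation.
    A finite stand-in for the countable class considered in the paper.
    """
    best_name, best_cost = None, float("inf")
    for name, codelength_bits, prob_fn in models:
        nll_bits = -sum(math.log2(prob_fn(x)) for x in data)  # data code length
        cost = nll_bits + codelength_bits                      # total description length
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost

def bernoulli(p):
    return lambda x: p if x == 1 else 1.0 - p

# Bernoulli coins on a small parameter grid; later entries pay a larger code length.
models = [(f"Bernoulli({p:.2f})", 2 + k, bernoulli(p))
          for k, p in enumerate([0.5, 0.25, 0.75, 0.1, 0.9])]
data = [1, 1, 0, 1, 1, 1, 0, 1]
print(mdl_select(models, data))
```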


Reinforcement Learning for Agents with Many Sensors and Actuators Acting in Categorizable Environments

Journal of Artificial Intelligence Research

In this paper, we confront the problem of applying reinforcement learning to agents that perceive the environment through many sensors and that can perform parallel actions using many actuators, as is the case in complex autonomous robots. We argue that reinforcement learning can only be successfully applied to this case if strong assumptions are made on the characteristics of the environment in which the learning is performed, so that the relevant sensor readings and motor commands can be readily identified. The introduction of such assumptions leads to strongly biased learning systems that may lose the generality of traditional reinforcement-learning algorithms. Along these lines, we observe that, in realistic situations, the reward received by the robot depends only on a reduced subset of all the executed actions and that only a reduced subset of the sensor inputs (possibly different in each situation and for each action) is relevant to predict the reward. We formalize this property in the so-called 'categorizability assumption', and we present an algorithm that takes advantage of the categorizability of the environment, allowing a decrease in the learning time with respect to existing reinforcement-learning algorithms. Results of the application of the algorithm to two simulated, realistic robotic problems (landmark-based navigation and six-legged robot gait generation) are reported to validate our approach and to compare it to existing flat and generalization-based reinforcement-learning approaches.
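
A toy illustration of the categorizability assumption described above (not the authors' algorithm): the reward depends on only a small hidden subset of many sensor bits, so simple per-sensor statistics recover the relevant inputs far faster than a flat learner over the full joint state. All names and numbers below are hypothetical.

```python
import random

N_SENSORS = 20
RELEVANT = {3, 11}            # hidden: reward depends only on these sensor bits

def reward(obs):
    # reward is 1 exactly when both relevant sensor bits are on
    return 1.0 if all(obs[i] for i in RELEVANT) else 0.0

def mean(xs):
    return sum(xs) / len(xs) if xs else 0.0

def estimate_relevance(samples):
    """Score each sensor by how much conditioning on it shifts the mean reward."""
    scores = []
    for i in range(N_SENSORS):
        on = [r for obs, r in samples if obs[i]]
        off = [r for obs, r in samples if not obs[i]]
        scores.append(abs(mean(on) - mean(off)))
    return scores

samples = []
for _ in range(2000):
    obs = [random.random() < 0.5 for _ in range(N_SENSORS)]
    samples.append((obs, reward(obs)))

scores = estimate_relevance(samples)
top_two = sorted(range(N_SENSORS), key=lambda i: -scores[i])[:2]
print("estimated relevant sensors:", sorted(top_two), "true:", sorted(RELEVANT))
```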


Restricted Value Iteration: Theory and Algorithms

Journal of Artificial Intelligence Research

Value iteration is a popular algorithm for finding near-optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
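
To make the idea of value iteration restricted to a belief subset concrete, here is a minimal point-based stand-in: backups are performed only at beliefs in a fixed set rather than over the whole simplex. The tiny two-state POMDP and all names are illustrative assumptions; the paper additionally characterises which belief subsets preserve near-optimality, which this sketch does not address.

```python
import numpy as np

# A tiny illustrative POMDP (2 states, 2 actions, 2 observations); not from the paper.
S, A, O = 2, 2, 2
T = np.array([[[0.9, 0.1], [0.1, 0.9]],     # T[a, s, s'] transition probabilities
              [[0.5, 0.5], [0.5, 0.5]]])
Z = np.array([[[0.8, 0.2], [0.2, 0.8]],     # Z[a, s', o] observation probabilities
              [[0.5, 0.5], [0.5, 0.5]]])
R = np.array([[1.0, 0.0],                   # R[a, s] immediate reward
              [0.0, 1.0]])
gamma = 0.95

def restricted_backup(belief_set, alphas):
    """One value-iteration backup performed only at the beliefs in `belief_set`."""
    new_alphas = []
    for b in belief_set:
        best_val, best_alpha = -np.inf, None
        for a in range(A):
            alpha_a = R[a].copy()
            for o in range(O):
                # g[i, s] = sum_{s'} T[a, s, s'] * Z[a, s', o] * alphas[i][s']
                g = np.array([T[a] @ (Z[a, :, o] * alpha) for alpha in alphas])
                alpha_a += gamma * g[np.argmax(g @ b)]
            if alpha_a @ b > best_val:
                best_val, best_alpha = alpha_a @ b, alpha_a
        new_alphas.append(best_alpha)
    return new_alphas

# Back up only over a small belief subset instead of the whole belief simplex.
belief_set = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
alphas = [np.zeros(S)]
for _ in range(50):
    alphas = restricted_backup(belief_set, alphas)
print("value at [0.5, 0.5]:", max(float(a @ np.array([0.5, 0.5])) for a in alphas))
```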


Combining Spatial and Temporal Logics: Expressiveness vs. Complexity

Journal of Artificial Intelligence Research

In this paper, we construct and investigate a hierarchy of spatio-temporal formalisms that result from various combinations of propositional spatial and temporal logics such as the propositional temporal logic PTL, the spatial logics RCC-8, BRCC-8, S4u and their fragments. The obtained results give a clear picture of the trade-off between expressiveness and 'computational realisability' within the hierarchy. We demonstrate how different combining principles as well as spatial and temporal primitives can produce NP-, PSPACE-, EXPSPACE-, 2EXPSPACE-complete, and even undecidable spatio-temporal logics out of components that are at most NP- or PSPACE-complete.


Finding Approximate POMDP Solutions Through Belief Compression

Journal of Artificial Intelligence Research

Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
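
A heavily simplified sketch of the belief-compression idea: collect beliefs, learn a low-dimensional representation, and plan over the learned features. Ordinary PCA from scikit-learn is used here as a stand-in for the Exponential family PCA (E-PCA) the paper actually uses; the simulated beliefs and all names are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulate beliefs that live near a 1-D family inside a 50-state belief simplex:
# Gaussian-shaped bumps whose centre moves along the state space.
n_states, n_beliefs = 50, 200
centres = rng.uniform(5, 45, size=n_beliefs)
grid = np.arange(n_states)
beliefs = np.exp(-0.5 * ((grid[None, :] - centres[:, None]) / 3.0) ** 2)
beliefs /= beliefs.sum(axis=1, keepdims=True)

# Compress the 50-dimensional beliefs to a handful of learned features
# (ordinary PCA here; E-PCA instead reconstructs through an exponential-family
# link, which better respects the sparsity and non-negativity of beliefs).
pca = PCA(n_components=3).fit(beliefs)
features = pca.transform(beliefs)            # low-dimensional belief features
reconstructed = pca.inverse_transform(features)

err = np.abs(reconstructed - beliefs).sum(axis=1).mean()
print("mean L1 reconstruction error per belief:", round(float(err), 4))
# A planner would then operate on `features` instead of the full belief simplex.
```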


Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

Neural Information Processing Systems

In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site.
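
One way a grid-embedded state set can remove the quadratic bottleneck mentioned above is when transition probabilities depend only on the displacement between grid positions: the O(K^2) sum in the forward recursion then collapses to a short convolution. The sketch below contrasts the two computations on synthetic numbers; it illustrates that structural idea, not the paper's specific algorithms.

```python
import numpy as np

K = 1000                               # number of HMM states embedded in a 1-D grid
rng = np.random.default_rng(1)

# Random-walk-style transition kernel over grid displacements -5..5.
offsets = np.arange(-5, 6)
kernel = np.exp(-0.5 * (offsets / 2.0) ** 2)
kernel /= kernel.sum()

alpha = rng.random(K)                  # previous (unnormalised) forward message
emission = rng.random(K)               # P(observation | state) for the current step

# Naive O(K^2) forward step: build the full transition matrix and multiply.
T = np.zeros((K, K))
for d, w in zip(offsets, kernel):
    T += w * np.eye(K, k=d)            # T[i, j] = kernel weight for displacement j - i
naive = emission * (T.T @ alpha)

# Structured forward step: the same sum as a length-11 convolution, O(K * |kernel|)
# (and near-linear with FFT-based convolution for long kernels).
fast = emission * np.convolve(alpha, kernel, mode="same")

print("max |naive - fast| =", float(np.abs(naive - fast).max()))
```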


Kernels for Structured Natural Language Data

Neural Information Processing Systems

In this paper, we focus on tasks in the application areas of NLP, such as Machine Translation, Text Summarization, Text Categorization and Question Answering.


An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science

Neural Information Processing Systems

Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed finds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception.
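
A rough sketch of the pattern-counting idea (not the authors' algorithm or their speech-perception models): sample the parameters of a tiny feedforward network with random-walk Metropolis under a broad prior, and record every distinct binary response pattern the network produces over a fixed stimulus set. The architecture, prior, and names are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# A tiny feedforward "connectionist model": 2 inputs -> 3 hidden -> 1 output.
def response_pattern(weights, stimuli):
    W1 = weights[:6].reshape(3, 2)
    b1 = weights[6:9]
    W2 = weights[9:12]
    b2 = weights[12]
    hidden = np.tanh(stimuli @ W1.T + b1)
    out = hidden @ W2 + b2
    return tuple(int(v) for v in (out > 0))   # binary response over the stimulus set

stimuli = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

def log_prior(w, scale=3.0):
    return -0.5 * np.sum((w / scale) ** 2)

# Random-walk Metropolis over the 13 weights, collecting distinct patterns.
w = rng.normal(size=13)
patterns = set()
for _ in range(20000):
    proposal = w + 0.5 * rng.normal(size=13)
    if np.log(rng.random()) < log_prior(proposal) - log_prior(w):
        w = proposal
    patterns.add(response_pattern(w, stimuli))

print("distinct response patterns found:", len(patterns))
```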


Application of SVMs for Colour Classification and Collision Detection with AIBO Robots

Neural Information Processing Systems

This article addresses the issues of colour classification and collision detection as they occur in the legged league robot soccer environment of RoboCup. We show how the method of one-class classification with support vector machines (SVMs) can be applied to solve these tasks satisfactorily using the limited hardware capacity of the prescribed Sony AIBO quadruped robots. The experimental evaluation shows an improvement over our previous methods of ellipse fitting for colour classification and the statistical approach used for collision detection.
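
A minimal sketch of one-class colour classification with an SVM, assuming synthetic stand-in pixel values rather than AIBO camera data: fit scikit-learn's OneClassSVM on positive samples of the target colour only, then flag new pixels as inside or outside that colour class. Hyperparameters below are illustrative, not those used in the article.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(3)

# Positive pixel samples of one target colour (e.g. the ball), in a YUV-like space.
# Synthetic values for illustration only.
target_pixels = rng.normal(loc=[200.0, 110.0, 160.0], scale=8.0, size=(500, 3))

clf = OneClassSVM(kernel="rbf", gamma=0.001, nu=0.05)
clf.fit(target_pixels)                 # learns the region occupied by the colour class

test_pixels = np.array([[201.0, 112.0, 158.0],   # close to the training colour
                        [ 60.0,  90.0, 120.0]])  # far from it
print(clf.predict(test_pixels))                  # +1 = target colour, -1 = not
```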


On the Dynamics of Boosting

Neural Information Processing Systems

In order to understand AdaBoost's dynamics, especially its ability to maximize margins, we derive an associated simplified nonlinear iterated map and analyze its behavior in low-dimensional cases. We find stable cycles for these cases, which can explicitly be used to solve for AdaBoost's output. By considering AdaBoost as a dynamical system, we are able to prove Rätsch and Warmuth's conjecture that AdaBoost may fail to converge to a maximal-margin combined classifier when given a 'non-optimal' weak learning algorithm.
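
To make the margin quantity concrete, the sketch below runs textbook AdaBoost with threshold stumps on a toy 1-D problem and tracks the minimum normalised margin each round. It is not the paper's simplified iterated map; the data and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 1-D data with alternating label bands, so no single stump separates it.
X = rng.uniform(0.0, 1.0, size=40)
y = np.sign(np.sin(4.0 * np.pi * X))

def stump_predict(X, t, s):
    # threshold stump h(x) = s * sign(x - t), breaking ties towards +1
    return s * np.sign(X - t + 1e-12)

def best_stump(X, y, w):
    """Return (weighted error, threshold, sign) of the best stump on weights w."""
    best = None
    for t in np.unique(X):
        for s in (+1.0, -1.0):
            err = w[stump_predict(X, t, s) != y].sum()
            if best is None or err < best[0]:
                best = (err, t, s)
    return best

# Textbook AdaBoost with stumps, tracking the minimum normalised margin per round.
w = np.ones(len(X)) / len(X)
stumps, alphas = [], []
for round_ in range(1, 31):
    err, t, s = best_stump(X, y, w)
    err = min(max(err, 1e-10), 1.0 - 1e-10)
    a = 0.5 * np.log((1.0 - err) / err)
    w = w * np.exp(-a * y * stump_predict(X, t, s))
    w /= w.sum()
    stumps.append((t, s))
    alphas.append(a)

    F = sum(a_ * stump_predict(X, t_, s_) for a_, (t_, s_) in zip(alphas, stumps))
    min_margin = float((y * F / np.sum(alphas)).min())
    if round_ % 10 == 0:
        print(f"round {round_:2d}: minimum normalised margin = {min_margin:.3f}")
```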