CMUNITED-98: RoboCup-98 Small-Robot World Champion Team

AI Magazine

... and processes the images, giving the positions of each robot and the ball. This information is sent to an off-board controller and distributed ... Although our previous team had accurate navigation, it was not easily interruptible, which is necessary for operating in a highly dynamic environment. ... The final design includes a battery module supplying three independent ... It also includes a single board containing all the required electronic circuitry ... by an array of four infrared sensors, which is enabled or disabled by the software control. ... of inherent mechanical inaccuracies and unforeseen interventions from other agents. ... RoboCup competition in Paris (Stone, Veloso, and Riley 1999; Kitano et al. 1997). These improvements include a robust low-level control algorithm, which handles a moving target with ...


CMUNITED-98 Simulator Team

AI Magazine

We view robotic soccer as an example of a periodic team synchronization (PTS) domain. By perceiving the world, each fully distributed agent builds a ... Then, based on a complex set of behaviors, it chooses an ... Although acting autonomously, each agent contributes to the overall team's goal. ... with no adverse effects on the achievement of G. ... can be thought of as times at which the team is "offline." In general (that is, when the agents are "online"), the domain is dynamic and real time, ... Agents receive sensory ... p at time t.


Overview of RoboCup-98

AI Magazine

The Robot World Cup Soccer Games and Conferences (RoboCup) are a series of competitions and events designed to promote the full integration of AI and robotics research. Following the first RoboCup, held in Nagoya, Japan, in 1997, RoboCup-98 was held in Paris from 2 to 9 July, overlapping with the real World Cup soccer competition. RoboCup-98 included competitions in three leagues: (1) the simulation league, (2) the real robot small-size league, and (3) the real robot middle-size league. The champion teams were CMUNITED-98 in both the simulation and the real robot small-size leagues and CS-Freiburg (Freiburg, Germany) in the real robot middle-size league. RoboCup-98 also included a Scientific Challenge Award, which was given to three research groups for their simultaneous development of fully automatic commentator systems for the RoboCup simulator league. Over 15,000 spectators watched the games, and 120 international media provided worldwide coverage of the competition.


A Model of Inductive Bias Learning

Journal of Artificial Intelligence Research

A major problem in machine learning is that of inductive bias: how to choose a learner's hypothesis space so that it is large enough to contain a solution to the problem being learnt, yet small enough to ensure reliable generalization from reasonably-sized training sets. Typically such bias is supplied by hand through the skill and insights of experts. In this paper a model for automatically learning bias is investigated. The central assumption of the model is that the learner is embedded within an environment of related learning tasks. Within such an environment the learner can sample from multiple tasks, and hence it can search for a hypothesis space that contains good solutions to many of the problems in the environment. Under certain restrictions on the set of all hypothesis spaces available to the learner, we show that a hypothesis space that performs well on a sufficiently large number of training tasks will also perform well when learning novel tasks in the same environment. Explicit bounds are also derived demonstrating that learning multiple tasks within an environment of related tasks can potentially give much better generalization than learning a single task.


Reasoning on Interval and Point-based Disjunctive Metric Constraints in Temporal Contexts

Journal of Artificial Intelligence Research

We introduce a temporal model for reasoning on disjunctive metric constraints on intervals and time points in temporal contexts. This temporal model is composed of a labeled temporal algebra and its reasoning algorithms. The labeled temporal algebra defines labeled disjunctive metric point-based constraints, where each disjunct in each input disjunctive constraint is univocally associated to a label. Reasoning algorithms manage labeled constraints, associated label lists, and sets of mutually inconsistent disjuncts. These algorithms guarantee consistency and obtain a minimal network. Additionally, constraints can be organized in a hierarchy of alternative temporal contexts. Therefore, we can reason on context-dependent disjunctive metric constraints on intervals and points. Moreover, the model is able to represent non-binary constraints, such that logical dependencies on disjuncts in constraints can be handled. The computational cost of reasoning algorithms is exponential in accordance with the underlying problem complexity, although some improvements are proposed.


Planning Graph as a (Dynamic) CSP: Exploiting EBL, DDB and other CSP Search Techniques in Graphplan

Journal of Artificial Intelligence Research

This paper reviews the connections between Graphplan's planning-graph and the dynamic constraint satisfaction problem and motivates the need for adapting CSP search techniques to the Graphplan algorithm. It then describes how explanation based learning, dependency directed backtracking, dynamic variable ordering, forward checking, sticky values and random-restart search strategies can be adapted to Graphplan. Empirical results are provided to demonstrate that these augmentations improve Graphplan's performance significantly (up to 1000x speedups) on several benchmark problems. Special attention is paid to the explanation-based learning and dependency directed backtracking techniques as they are empirically found to be most useful in improving the performance of Graphplan.



Reinforcement Learning for Trading

Neural Information Processing Systems

In this paper, we propose to use recurrent reinforcement learning to directly optimize such trading system performance functions, and we compare two different reinforcement learning methods. The first, Recurrent Reinforcement Learning, uses immediate rewards to train the trading systems, while the second (Q-Learning (Watkins 1989)) approximates discounted future rewards. These methodologies can be applied to optimizing systems designed to trade a single security or to trade portfolios. In addition, we propose a novel value function for risk-adjusted return that enables learning to be done online: the differential Sharpe ratio. Trading system profits depend upon sequences of interdependent decisions, and are thus path-dependent. Optimal trading decisions when the effects of transaction costs, market impact and taxes are included require knowledge of the current system state. In Moody, Wu, Liao & Saffell (1998), we demonstrate that reinforcement learning provides a more elegant and effective means for training trading systems when transaction costs are included than do more standard supervised approaches.
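The abstract names the differential Sharpe ratio without defining it. The following is a minimal sketch, assuming the commonly cited definition in which exponential moving averages of returns (`a`) and squared returns (`b`) are updated with adaptation rate `eta`, and the ratio at each step is (b·Δa − ½·a·Δb) / (b − a²)^(3/2). The function name, the zero initialization, the default `eta`, and the guard for a non-positive variance estimate are illustrative assumptions, not details taken from the abstract.

```python
def differential_sharpe(returns, eta=0.01):
    """Return one differential Sharpe ratio value per input return.

    Assumed definition (not given in the abstract):
        D_t = (B_{t-1} * dA_t - 0.5 * A_{t-1} * dB_t) / (B_{t-1} - A_{t-1}**2) ** 1.5
    where A and B are moving estimates of the first and second moments.
    """
    a = 0.0  # moving estimate of the mean return (hypothetical zero start)
    b = 0.0  # moving estimate of the mean squared return
    values = []
    for r in returns:
        delta_a = r - a          # innovation in the first moment
        delta_b = r * r - b      # innovation in the second moment
        variance = b - a * a     # variance estimate from the previous step
        if variance > 1e-12:
            d = (b * delta_a - 0.5 * a * delta_b) / variance ** 1.5
        else:
            d = 0.0              # ratio is undefined until variance is positive
        values.append(d)
        # Update the moving estimates only after computing D_t.
        a += eta * delta_a
        b += eta * delta_b
    return values


if __name__ == "__main__":
    print(differential_sharpe([0.01, -0.02, 0.03]))
```

Because each value depends only on the previous moment estimates and the newest return, it can serve as an immediate reward signal at every time step, which is what makes online learning possible.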


A Phase Space Approach to Minimax Entropy Learning and the Minutemax Approximations

Neural Information Processing Systems

There has been much recent work on measuring image statistics and on learning probability distributions on images. We observe that the mapping from images to statistics is many-to-one and show it can be quantified by a phase space factor. This phase space approach throws light on the Minimax Entropy technique for learning Gibbs distributions on images with potentials derived from image statistics and elucidates the ambiguities that are inherent to determining the potentials. In addition, it shows that if the phase factor can be approximated by an analytic distribution then this approximation yields a swift "Minutemax" algorithm that vastly reduces the computation time for Minimax entropy learning. An illustration of this concept, using a Gaussian to approximate the phase factor, gives a good approximation to the results of Zhu and Mumford (1997) in just seconds of CPU time. The phase space approach also gives insight into the multi-scale potentials found by Zhu and Mumford (1997) and suggests that the forms of the potentials are influenced greatly by phase space considerations. Finally, we prove that probability distributions learned in feature space alone are equivalent to Minimax Entropy learning with a multinomial approximation of the phase factor.

1 Introduction

Bayesian probability theory gives a powerful framework for visual perception (Knill and Richards 1996). This approach, however, requires specifying prior probabilities and likelihood functions. Learning these probabilities is difficult because it requires estimating distributions on random variables of very high dimensions (for example, images with 200 x 200 pixels, or shape curves of length 400 pixels).


Utilizing Time: Asynchronous Binding

Neural Information Processing Systems

A binding problem occurs when two different events (or objects) are represented identically. For example, representing "John hit Ted" by activating the units JOHN, HIT, and TED would lead to a binding problem because the same pattern of activation would also be used to represent "Ted hit John". The binding problem is ubiquitous and is a concern whenever internal representations are postulated. In addition to guarding against the binding problem, an effective binding mechanism must construct representations that assist processing. For instance, different states of the world must be represented in a manner that assists in discovering commonalities between disparate states, allowing for category formation and analogical processing.
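The "John hit Ted" example above can be made concrete with a small sketch. The first function models a representation that only records which units are active; the second attaches an explicit role to each filler. The function names and the role labels are hypothetical, and role-filler pairing is used here only as a generic contrast, not as the asynchronous binding mechanism the paper proposes.

```python
def unit_activation(sentence):
    """Represent a sentence as the unordered set of active units."""
    return frozenset(word.upper() for word in sentence.split())


def role_bound(agent, action, patient):
    """Represent an event as a set of (role, filler) pairs (illustrative roles)."""
    return frozenset({
        ("agent", agent.upper()),
        ("action", action.upper()),
        ("patient", patient.upper()),
    })


if __name__ == "__main__":
    # Both sentences activate the same units, so they are indistinguishable.
    print(unit_activation("John hit Ted") == unit_activation("Ted hit John"))
    # Binding fillers to roles keeps the two events distinct.
    print(role_bound("John", "hit", "Ted") == role_bound("Ted", "hit", "John"))
```

The first comparison shows the binding problem directly: the pattern of activation is identical for both events, so nothing downstream can recover who hit whom.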