Fuzzy Logic
Stabilizing Value Function Approximation with the BFBP Algorithm
Wang, Xin, Dietterich, Thomas G.
Our BFBP (Batch Fit to Best Paths) algorithm alternates between an exploration phase (during which trajectories are generated to try to find fragments of the optimal policy) and a function fitting phase (during which a function approximator is fit to the best known paths from start states to terminal states). An advantage of this approach is that batch value-function fitting is a global process, which allows it to address the tradeoffs in function approximation that cannot be handled by local, online algorithms.
Batch Value Function Approximation via Support Vectors
Dietterich, Thomas G., Wang, Xin
One formulation is based on SVM regression; the second is based on the Bellman equation; and the third seeks only to ensure that good moves have an advantage over bad moves. All formulations attempt to minimize the number of support vectors while fitting the data. Experiments on a difficult synthetic maze problem show that all three formulations give excellent performance, but the advantage formulation is much easier to train. Unlike policy gradient methods, the kernel methods described here can easily adjust the complexity of the function approximator to fit the complexity of the value function.
An AI-Based Approach to Destination Control in Elevators
Koehler, Jana, Ottiger, Daniel
Not widely known by the AI community, elevator control has become a major field of application for AI technologies. Techniques such as neural networks, genetic algorithms, fuzzy rules and, recently, multiagent systems and AI planning have been adopted by leading elevator companies not only to improve the transportation capacity of conventional elevator systems but also to revolutionize the way in which elevators interact with and serve passengers. In this article, we begin with an overview of AI techniques adopted by this industry and explain the motivations behind the continuous interest in AI. We review and summarize publications that are not easily accessible from the common AI sources. In the second part, we present in more detail a recent development project to apply AI planning and multiagent systems to elevator control problems.
Reasoning within Fuzzy Description Logics
Description Logics (DLs) are well-known logics suited to managing structured knowledge. They allow reasoning about individuals and well-defined concepts, i.e., sets of individuals with common properties. Experience with DLs in applications has shown that in many cases we would like to extend their capabilities. In particular, their use in Multimedia Information Retrieval (MIR) leads to the conviction that such DLs should handle the inherent imprecision in multimedia object content representation and retrieval. In this paper we present a fuzzy extension of ALC, combining Zadeh's fuzzy logic with a classical DL. In particular, concepts become fuzzy and, thus, reasoning about imprecise concepts is supported. We define its syntax and semantics, describe its properties, and present a constraint propagation calculus for reasoning in it.
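As a rough illustration of what "concepts become fuzzy" means, the sketch below evaluates membership degrees under Zadeh's connectives (min for conjunction, max for disjunction, 1 - x for negation) over a small finite domain. The individuals, the role `friend`, the concept `young`, and all function names are hypothetical and chosen only for this example; the quantifier readings (sup of min, inf of max) are the usual ones for this style of semantics, and the sketch is not the constraint propagation calculus from the paper.

```python
# Minimal sketch of Zadeh-style fuzzy semantics over a finite domain.
# All names and degrees below are illustrative assumptions.

def f_not(a):
    """Fuzzy negation: 1 - a."""
    return 1.0 - a

def f_and(a, b):
    """Fuzzy conjunction: minimum of the two degrees."""
    return min(a, b)

def f_or(a, b):
    """Fuzzy disjunction: maximum of the two degrees."""
    return max(a, b)

def exists(role, concept, x, domain):
    """Degree of x in (exists role.concept): sup over y of min(role(x, y), concept(y))."""
    return max(f_and(role.get((x, y), 0.0), concept.get(y, 0.0)) for y in domain)

def forall(role, concept, x, domain):
    """Degree of x in (forall role.concept): inf over y of max(1 - role(x, y), concept(y))."""
    return min(f_or(f_not(role.get((x, y), 0.0)), concept.get(y, 0.0)) for y in domain)

domain = ["tom", "ann", "bob"]
young = {"tom": 0.5, "ann": 0.75, "bob": 0.25}             # fuzzy concept
friend = {("tom", "ann"): 0.5, ("tom", "bob"): 0.75}       # fuzzy role

some_young_friend = exists(friend, young, "tom", domain)
only_young_friends = forall(friend, young, "tom", domain)
print(some_young_friend, only_young_friends)
```

Missing role or concept entries are treated as degree 0, which is a convenience of this sketch rather than something stated in the abstract.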
A New Direction in AI: Toward a Computational Theory of Perceptions
Humans have a remarkable capability to perform a wide variety of physical and mental tasks without any measurements and any computations. Familiar examples are parking a car, driving in city traffic, playing golf, cooking a meal, and summarizing a story. In performing such tasks, humans use perceptions of time, direction, speed, shape, possibility, likelihood, truth, and other attributes of physical and mental objects. Reflecting the bounded ability of the human brain to resolve detail, perceptions are intrinsically imprecise. In more concrete terms, perceptions are f-granular, meaning that (1) the boundaries of perceived classes are unsharp and (2) the values of attributes are granulated, with a granule being a clump of values (points, objects) drawn together by indistinguishability, similarity, proximity, and function. For example, the granules of age might be labeled very young, young, middle aged, old, very old, and so on. F-granularity of perceptions puts them well beyond the reach of traditional methods of analysis based on predicate logic or probability theory. The computational theory of perceptions (CTP), which is outlined in this article, adds to the armamentarium of AI a capability to compute and reason with perception-based information.
The point of departure in CTP is the assumption that perceptions are described by propositions drawn from a natural language; for example, "it is unlikely that there will be a significant increase in the price of oil in the near future." In CTP, a proposition, p, is viewed as an answer to a question, and the meaning of p is represented as a generalized constraint. To compute with perceptions, their descriptors are translated into what is called the generalized constraint language (GCL). Then, goal-directed constraint propagation is utilized to answer a given query. A concept that plays a key role in CTP is that of precisiated natural language (PNL). The computational theory of perceptions suggests a new direction in AI, one that might enhance the ability of AI to deal with real-world problems in which decision-relevant information is a mixture of measurements and perceptions. What is not widely recognized is that many important problems in AI fall into this category.
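One common way to picture granules with unsharp boundaries, such as the age labels mentioned in the abstract, is as overlapping fuzzy sets. The sketch below uses trapezoidal membership functions; the function name `trapezoid`, the three granules, and every breakpoint are assumptions made for illustration and are not taken from the article.

```python
# Minimal sketch: age granules as overlapping trapezoidal fuzzy sets.
# Breakpoints are hypothetical, picked only to show unsharp boundaries.

def trapezoid(x, a, b, c, d):
    """Membership rises from a to b, equals 1 on [b, c], falls from c to d."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

AGE_GRANULES = {
    "young": (0, 0, 25, 40),
    "middle aged": (25, 40, 55, 65),
    "old": (55, 65, 120, 120),
}

def describe(age):
    """Return the degree to which an age belongs to each granule."""
    return {label: trapezoid(age, *params) for label, params in AGE_GRANULES.items()}

for age in (20, 60):
    print(age, describe(age))
```

An age of 60 belongs partly to "middle aged" and partly to "old" in this sketch, which is the sense in which class boundaries are unsharp.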
Policy Gradient Methods for Reinforcement Learning with Function Approximation
Sutton, Richard S., McAllester, David A., Singh, Satinder P., Mansour, Yishay
Function approximation is essential to reinforcement learning, but the standard approach of approximating a value function and determining a policy from it has so far proven theoretically intractable. In this paper we explore an alternative approach in which the policy is explicitly represented by its own function approximator, independent of the value function, and is updated according to the gradient of expected reward with respect to the policy parameters. Williams's REINFORCE method and actor-critic methods are examples of this approach. Our main new result is to show that the gradient can be written in a form suitable for estimation from experience aided by an approximate action-value or advantage function. Using this result, we prove for the first time that a version of policy iteration with arbitrary differentiable function approximation is convergent to a locally optimal policy.
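The gradient form described in this abstract can be illustrated in the simplest possible setting. The sketch below assumes a single-state problem (a bandit), a softmax policy over action preferences, and exactly known action values `q`; all of these are simplifying assumptions for illustration, whereas the paper treats general problems with an approximate action-value or advantage function. In this setting the gradient of expected reward reduces to the sum over actions of the policy derivative times the action value, which the code checks against finite differences.

```python
# Minimal sketch: gradient of expected reward for a softmax policy in a
# single-state problem, checked numerically. Names and values are illustrative.
import math

def softmax(theta):
    """Action probabilities for preferences theta (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(theta, q):
    """Expected reward under the softmax policy given exact action values q."""
    return sum(p * r for p, r in zip(softmax(theta), q))

def policy_gradient(theta, q):
    """Sum over actions of d pi(a) / d theta_k times q(a), for each k."""
    pi = softmax(theta)
    n = len(theta)
    grad = []
    for k in range(n):
        # For a softmax policy: d pi(a) / d theta_k = pi(a) * (1[a == k] - pi(k))
        grad.append(sum(pi[a] * ((1.0 if a == k else 0.0) - pi[k]) * q[a]
                        for a in range(n)))
    return grad

def finite_difference(theta, q, eps=1e-6):
    """Central-difference estimate of the same gradient, used as a check."""
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((expected_reward(up, q) - expected_reward(down, q)) / (2 * eps))
    return grad

theta = [0.0, 0.0]
q = [1.0, 0.0]
analytic = policy_gradient(theta, q)
numeric = finite_difference(theta, q)
print(analytic, numeric)
```

Following the analytic gradient increases the preference for the higher-valued action; in a multi-state problem each state's term would additionally be weighted by how often the policy visits that state.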
Function Approximation with the Sweeping Hinge Algorithm
Hush, Don R., Lozano, Fernando, Horne, Bill G.
We present a computationally efficient algorithm for function approximation with piecewise linear sigmoidal nodes. A one-hidden-layer network is constructed one node at a time using the method of fitting the residual. The task of fitting individual nodes is accomplished using a new algorithm that searches for the best fit by solving a sequence of Quadratic Programming problems. This approach offers significant advantages over derivative-based search algorithms (e.g.