AITopics | Fuzzy Logic

Fuzzy Logic

"Fuzzy Logic is basically a multivalued logic that allows intermediate values to be defined between conventional evaluations like yes/no, true/false, black/white, etc. Notions like rather warm or pretty cold can be formulated mathematically and processed by computers."
– Peter Bauer, Stephan Nouak, and Roman Winkler. A Brief Course in Fuzzy Logic and Fuzzy Control. Available from ESRU [Energy Systems Research Unit], Department of Mechanical Engineering, University of Strathclyde. 1996.
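To make "rather warm" or "pretty cold" concrete, here is a minimal Python sketch of graded membership. It assumes triangular membership functions and the common Zadeh min/max/complement connectives for AND/OR/NOT. The temperature breakpoints and the names `cold` and `warm` are hypothetical illustrations, not taken from the cited course.

```python
def triangular(x: float, a: float, b: float, c: float) -> float:
    """Membership degree in a triangular fuzzy set: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy sets over air temperature in degrees Celsius.
def cold(t: float) -> float:
    return triangular(t, -10.0, 0.0, 15.0)


def warm(t: float) -> float:
    return triangular(t, 15.0, 25.0, 35.0)


# Common (Zadeh) connectives on membership degrees in [0, 1].
def fuzzy_and(p: float, q: float) -> float:
    return min(p, q)


def fuzzy_or(p: float, q: float) -> float:
    return max(p, q)


def fuzzy_not(p: float) -> float:
    return 1.0 - p


for t in (0.0, 10.0, 20.0, 25.0):
    print(t, round(cold(t), 2), round(warm(t), 2), round(fuzzy_or(cold(t), warm(t)), 2))
```

With these breakpoints, 20 °C is warm to degree 0.5 rather than simply true or false, and 10 °C is cold to degree of about 0.33.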

A Convergent O(n) Temporal-difference Algorithm for Off-policy Learning with Linear Function Approximation

Sutton, Richard S., Maei, Hamid R., Szepesvári, Csaba

Neural Information Processing Systems, Feb-15-2020, 03:42:41 GMT

We introduce the first temporal-difference learning algorithm that is stable with linear function approximation and off-policy training, for any finite Markov decision process, target policy, and exciting behavior policy, and whose complexity scales linearly in the number of parameters. We consider an i.i.d. policy-evaluation setting in which the data need not come from on-policy experience. The gradient temporal-difference (GTD) algorithm estimates the expected update vector of the TD(0) algorithm and performs stochastic gradient descent on its L2 norm. Our analysis proves that its expected update is in the direction of the gradient, assuring convergence under the usual stochastic approximation conditions to the same least-squares solution as found by LSTD, but without its quadratic computational complexity. GTD is online and incremental, and does not involve multiplying by products of likelihood ratios as in importance-sampling methods.
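The two coupled updates described above can be sketched in a few lines of Python. This is a minimal illustration based on our reading of the GTD update, not the authors' code: `u` tracks the expected TD(0) update vector E[δφ], and θ moves along (φ − γφ′)(φ · u). The function name `gtd_step`, the step sizes, and how transitions are sampled (assumed i.i.d. from some state distribution, with next states drawn under the target policy) are assumptions left to the caller. Both updates cost O(n) per transition for n features, in line with the linear complexity the abstract claims.

```python
from typing import List, Tuple

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gtd_step(
    theta: Vector,
    u: Vector,
    phi: Vector,
    reward: float,
    phi_next: Vector,
    gamma: float,
    alpha: float,
    beta: float,
) -> Tuple[Vector, Vector]:
    """One GTD update on a sampled transition (phi, reward, phi_next).

    theta: weights of the linear value estimate theta . phi
    u:     running estimate of the expected TD(0) update E[delta * phi]
    """
    # TD error of the current linear value estimate.
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi)
    # theta follows (phi - gamma * phi_next) * (phi . u), using the old u.
    scale = dot(phi, u)
    new_theta = [
        t + alpha * (p - gamma * pn) * scale
        for t, p, pn in zip(theta, phi, phi_next)
    ]
    # u is nudged toward the sampled TD(0) update delta * phi.
    new_u = [ui + beta * (delta * p - ui) for ui, p in zip(u, phi)]
    return new_theta, new_u


# Usage on a toy transition with two one-hot features.
theta, u = [0.0, 0.0], [0.0, 0.0]
for _ in range(2):
    theta, u = gtd_step(theta, u, [1.0, 0.0], 1.0, [0.0, 1.0], gamma=0.9, alpha=0.1, beta=0.5)
print(theta, u)
```

Because θ uses the previous `u`, the very first step from u = 0 leaves θ unchanged; only later steps move it.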

linear function approximation, off-policy learning, temporal-difference algorithm

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)

Robust Value Function Approximation Using Bilinear Programming

Petrik, Marek, Zilberstein, Shlomo

Neural Information Processing Systems, Feb-15-2020, 03:11:09 GMT

Existing value function approximation methods have been successfully used in many applications, but they often lack useful a priori error bounds. We propose approximate bilinear programming, a new formulation of value function approximation that provides strong a priori guarantees. In particular, it provably finds an approximate value function that minimizes the Bellman residual. Solving a bilinear program optimally is NP-hard, but this is unavoidable because the Bellman-residual minimization itself is NP-hard. We therefore employ and analyze a common approximate algorithm for bilinear programs.
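For reference, the quantity being minimized can be computed directly for a small tabular MDP. The Python sketch below assumes the L∞ form of the residual, max_s |(Tv)(s) − v(s)| under the optimal Bellman operator T. The MDP encoding (`P[s][a]` as a list of `(next_state, probability)` pairs, `R[s][a]` as expected reward) is a hypothetical choice, and the bilinear program itself is not shown.

```python
from typing import List, Tuple

# Hypothetical encoding: P[s][a] lists (next_state, probability) pairs, R[s][a] is the expected reward.
Transitions = List[List[List[Tuple[int, float]]]]
Rewards = List[List[float]]


def bellman_residual(v: List[float], P: Transitions, R: Rewards, gamma: float) -> float:
    """L-infinity Bellman residual max_s |(T v)(s) - v(s)| for the optimal Bellman operator T."""
    residual = 0.0
    for s in range(len(v)):
        # (T v)(s) = max_a [ R(s, a) + gamma * sum_s' P(s' | s, a) v(s') ]
        backed_up = max(
            R[s][a] + gamma * sum(p * v[s_next] for s_next, p in P[s][a])
            for a in range(len(P[s]))
        )
        residual = max(residual, abs(backed_up - v[s]))
    return residual


# Two states: in state 0, action 0 stays and earns 1, action 1 moves to the absorbing state 1.
P = [[[(0, 1.0)], [(1, 1.0)]], [[(1, 1.0)]]]
R = [[1.0, 0.0], [0.0]]
print(bellman_residual([10.0, 0.0], P, R, gamma=0.9))  # v is the exact fixed point here
print(bellman_residual([0.0, 0.0], P, R, gamma=0.9))
```

A value function that is an exact fixed point of T has residual 0; any approximation restricted to a feature space will generally leave a positive residual, which is what the formulation minimizes.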

artificial intelligence, fuzzy logic, robust value function approximation, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.94)

Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation

Bhatnagar, Shalabh, Precup, Doina, Silver, David, Sutton, Richard S., Maei, Hamid R., Szepesvári, Csaba

Neural Information Processing Systems, Feb-15-2020, 02:42:59 GMT

We introduce the first temporal-difference learning algorithms that converge with smooth value function approximators, such as neural networks. Conventional temporal-difference (TD) methods, such as TD(λ), Q-learning, and Sarsa, have been used successfully with function approximation in many applications. However, it is well known that off-policy sampling, as well as nonlinear function approximation, can cause these algorithms to become unstable (i.e., the parameters of the approximator may diverge). Sutton et al. (2009a,b) solved the problem of off-policy learning with linear TD algorithms by introducing a new objective function, related to the Bellman error, and algorithms that perform stochastic gradient descent on this function. In this paper, we generalize their work to nonlinear function approximation.

algorithm, arbitrary smooth function approximation, convergent temporal-difference learning, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Convergent Fitted Value Iteration with Linear Function Approximation

Lizotte, Daniel J.

Neural Information Processing Systems, Feb-14-2020, 23:59:10 GMT

Fitted value iteration (FVI) with ordinary least squares regression is known to diverge. We present a new method, "Expansion-Constrained Ordinary Least Squares" (ECOLS), that produces a linear approximation but also guarantees convergence when used with FVI. To ensure convergence, we constrain the least squares regression operator to be a non-expansion in the infinity-norm. We show that the space of function approximators that satisfy this constraint is richer than the space of "averagers," we prove a minimax property of the ECOLS residual error, and we give an efficient algorithm for computing the coefficients of ECOLS based on constraint generation. We illustrate the algorithmic convergence of FVI with ECOLS in a suite of experiments, and discuss its properties.
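The non-expansion condition can be checked numerically. For a linear operator, the induced infinity-norm is its largest absolute row sum, so ordinary least squares is a non-expansion exactly when every row of its hat matrix has absolute sum at most 1. The Python sketch below assumes a single feature column with no intercept to keep the algebra short; it only tests the condition and does not implement ECOLS or its constraint-generation algorithm.

```python
from typing import List

Matrix = List[List[float]]


def ols_hat_matrix_1d(x: List[float]) -> Matrix:
    """Hat matrix H = x x^T / (x^T x) of least squares on one feature column x (no intercept).
    Fitting targets y by OLS gives fitted values H y."""
    norm_sq = sum(xi * xi for xi in x)
    return [[xi * xj / norm_sq for xj in x] for xi in x]


def inf_norm(matrix: Matrix) -> float:
    """Induced infinity-norm of a matrix: its largest absolute row sum."""
    return max(sum(abs(entry) for entry in row) for row in matrix)


def is_nonexpansion(matrix: Matrix, tol: float = 1e-12) -> bool:
    """True if ||H y||_inf <= ||y||_inf for every y, i.e. the infinity-norm is at most 1."""
    return inf_norm(matrix) <= 1.0 + tol


for x in ([1.0, 2.0], [1.0, 1.0]):
    H = ols_hat_matrix_1d(x)
    print(x, inf_norm(H), is_nonexpansion(H))
```

With `x = [1, 2]` the hat matrix has a row summing to 1.2, so plain OLS can expand errors in the infinity-norm; with `x = [1, 1]` it reduces to an averager whose rows sum to exactly 1.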

convergence, convergent fitted value iteration, linear function approximation, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)

Sketch-Based Linear Value Function Approximation

Bellemare, Marc, Veness, Joel, Bowling, Michael

Neural Information Processing Systems, Feb-14-2020, 23:41:53 GMT

Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products.
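A tug-of-war sketch replaces plain hashed buckets with signed ones: each feature index is hashed to one bucket and to a random sign in {+1, −1}, and inner products are taken between sketches. Under an idealized random-hash assumption, the signs of colliding features are independent and zero-mean, so the cross terms cancel in expectation and the inner-product estimate is unbiased. The Python sketch below illustrates that idea; using `hashlib.blake2b` for the bucket and sign hashes is an assumption standing in for the hash families used in the analysis, and the function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def _hash(index: int, seed: int, salt: str) -> int:
    """Deterministic 64-bit hash of (salt, seed, index); a stand-in for an independent hash family."""
    digest = hashlib.blake2b(f"{salt}:{seed}:{index}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def tug_of_war_sketch(features: Dict[int, float], width: int, seed: int = 0) -> List[float]:
    """Compress a sparse feature vector {index: value} into `width` signed buckets."""
    sketch = [0.0] * width
    for index, value in features.items():
        bucket = _hash(index, seed, "bucket") % width
        sign = 1.0 if _hash(index, seed, "sign") % 2 == 0 else -1.0
        sketch[bucket] += sign * value
    return sketch


def sketch_dot(a: List[float], b: List[float]) -> float:
    """Estimate <x, y> from the sketches of x and y."""
    return sum(p * q for p, q in zip(a, b))


# Averaging estimates over many seeds should land near the exact inner product (50 here).
x = {i: 1.0 for i in range(50)}
y = {i: 2.0 for i in range(25, 75)}
estimates = [sketch_dot(tug_of_war_sketch(x, 8, s), tug_of_war_sketch(y, 8, s)) for s in range(200)]
print(sum(estimates) / len(estimates), "vs exact", 2.0 * 25)
```

For a linear value estimate, one could keep the weights in sketched form and compute `sketch_dot(weights_sketch, tug_of_war_sketch(phi, width))`, where `weights_sketch` is a hypothetical name for learned sketch-space weights.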

feature vector, sketch-based linear value function approximation, value estimate, (1 more...)

Neural Information Processing Systems

Genre: Research Report (0.43)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.94)

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Nguyen, Duc Thien, Kumar, Akshat, Lau, Hoong Chuin

Neural Information Processing Systems, Feb-14-2020, 15:12:02 GMT

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDec-POMDP where the collective behavior of a population of agents affects the joint reward and the environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDec-POMDP policies. Vanilla AC has slow convergence for larger problems.

collective multiagent planning, policy gradient, value function approximation, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.40)

Basis refinement strategies for linear value function approximation in MDPs

Comanici, Gheorghe, Precup, Doina, Panangaden, Prakash

Neural Information Processing Systems, Feb-14-2020, 12:43:39 GMT

We provide a theoretical framework for analyzing basis function construction for linear value function approximation in Markov Decision Processes (MDPs). We show that important existing methods, such as Krylov bases and Bellman-error-based methods, are special cases of the general framework we develop. We provide a general algorithmic framework for computing basis function refinements which "respect" the dynamics of the environment, and we derive approximation error bounds that apply for any algorithm respecting this general framework. We also show how, using ideas related to bisimulation metrics, one can translate basis refinement into a process of finding "prototypes" that are diverse enough to represent the given MDP. Papers published at the Neural Information Processing Systems Conference.
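Krylov bases, which the abstract names as one special case of the framework, are easy to write down for policy evaluation: they are the vectors r, Pr, P²r, ... generated by the reward vector r and the policy's transition matrix P. Since v = Σ_t γ^t P^t r, the value function is a combination of exactly these directions. The Python sketch below assumes a tabular MDP under a fixed policy and skips the orthogonalization a practical implementation would usually add; it does not implement the paper's refinement framework.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mat_vec(P: Matrix, v: Vector) -> Vector:
    """Matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def krylov_basis(P: Matrix, r: Vector, k: int) -> List[Vector]:
    """First k Krylov vectors r, P r, ..., P^(k-1) r for a fixed policy's
    transition matrix P and reward vector r (no orthogonalization)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    basis = [list(r)]
    for _ in range(k - 1):
        basis.append(mat_vec(P, basis[-1]))
    return basis


# Two states that swap deterministically; only state 0 is rewarded.
P = [[0.0, 1.0], [1.0, 0.0]]
r = [1.0, 0.0]
print(krylov_basis(P, r, 3))
```

In this toy chain the third vector repeats the first, so the Krylov space stops growing after two vectors.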

basis refinement strategy, general framework, linear value function approximation, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.77)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)

Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces

Ohnishi, Motoya, Yukawa, Masahiro, Johansson, Mikael, Sugiyama, Masashi

Neural Information Processing Systems, Feb-14-2020, 11:28:19 GMT

Motivated by the success of reinforcement learning (RL) for discrete-time tasks such as AlphaGo and Atari games, there has been a recent surge of interest in using RL for continuous-time control of physical systems (cf. …). Since discretization of time is susceptible to error, it is methodologically more desirable to handle the system dynamics directly in continuous time. However, very few techniques exist for continuous-time RL and they lack flexibility in value function approximation. In this paper, we propose a novel framework for model-based continuous-time value function approximation in reproducing kernel Hilbert spaces. The resulting framework is so flexible that it can accommodate any kind of kernel-based approach, such as Gaussian processes and kernel adaptive filters, and it allows us to handle uncertainties and nonstationarity without prior knowledge about the environment or what basis functions to employ. We demonstrate the validity of the presented framework through experiments.

continuous-time value function approximation, reproducing kernel hilbert space

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games > Computer Games (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.91)

Weighted importance sampling for off-policy learning with linear function approximation

Mahmood, A. Rupam, Hasselt, Hado P. van, Sutton, Richard S.

Neural Information Processing Systems, Feb-14-2020, 11:27:19 GMT

Importance sampling is an essential component of off-policy model-free reinforcement learning algorithms. However, its most effective variant, weighted importance sampling, does not carry over easily to function approximation and, because of this, it is not utilized in existing off-policy learning algorithms. In this paper, we take two steps toward bridging this gap. First, we show that weighted importance sampling can be viewed as a special case of weighting the error of individual training samples, and that this weighting has theoretical and empirical benefits similar to those of weighted importance sampling. Second, we show that these benefits extend to a new weighted-importance-sampling version of off-policy LSTD(λ).
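The difference between the ordinary and weighted variants is easiest to see in the tabular Monte Carlo case, before any function approximation. In the Python sketch below, each ratio is assumed to be the product of π(a|s)/μ(a|s) along a sampled trajectory; the function names and the zero-weight convention are assumptions, and the paper's WIS-LSTD(λ) is not shown.

```python
from typing import Sequence


def ordinary_is(ratios: Sequence[float], returns: Sequence[float]) -> float:
    """Ordinary importance sampling: mean of ratio-weighted returns."""
    return sum(rho * g for rho, g in zip(ratios, returns)) / len(returns)


def weighted_is(ratios: Sequence[float], returns: Sequence[float]) -> float:
    """Weighted importance sampling: ratio-weighted returns normalized by the sum of ratios."""
    total = sum(ratios)
    if total == 0.0:
        return 0.0  # convention assumed here when every sample has zero weight
    return sum(rho * g for rho, g in zip(ratios, returns)) / total


# Each ratio stands for a product of pi(a|s) / mu(a|s) along one sampled trajectory.
ratios = [0.5, 2.0, 1.0]
returns = [5.0, 5.0, 5.0]
print(ordinary_is(ratios, returns), weighted_is(ratios, returns))
```

When every return equals 5, the weighted estimate is exactly 5 while the ordinary estimate is about 5.83: weighted importance sampling typically has much lower variance, at the cost of a bias that vanishes as samples accumulate.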

function approximation, linear function approximation, off-policy learning, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Bayes-Adaptive Simulation-based Search with Value Function Approximation

Guez, Arthur, Heess, Nicolas, Silver, David, Dayan, Peter

Neural Information Processing Systems, Feb-14-2020, 05:56:41 GMT

Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty. It finds the optimal policy in belief space, which explicitly accounts for the expected effect on future rewards of reductions in uncertainty. However, the Bayes-adaptive solution is typically intractable in domains with large or continuous state spaces. We present a tractable method for approximating the Bayes-adaptive solution by combining simulation-based search with a novel value function approximation technique that generalises over belief space. Our method outperforms prior approaches in both discrete bandit tasks and simple continuous navigation and control tasks. Papers published at the Neural Information Processing Systems Conference.
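To see why the exact Bayes-adaptive solution becomes intractable, consider the smallest setting the abstract mentions, a bandit. The Python sketch below assumes Bernoulli arms with Beta priors (an illustrative choice, not taken from the abstract) and computes the optimal finite-horizon value by exhaustive recursion over belief states. Each pull updates the belief, so the plan accounts for the value of reducing uncertainty; but the number of reachable belief states grows rapidly with the horizon and the number of arms, even with memoization. The paper's simulation-based search with value function approximation is not shown, and `bayes_adaptive_value` is a hypothetical name.

```python
from functools import lru_cache
from typing import Tuple

# Belief over each Bernoulli arm: Beta(alpha, beta), stored as integer (alpha, beta) counts.
Belief = Tuple[Tuple[int, int], ...]


@lru_cache(maxsize=None)
def bayes_adaptive_value(beliefs: Belief, horizon: int) -> float:
    """Optimal expected total reward over `horizon` pulls, planning in belief space."""
    if horizon == 0:
        return 0.0
    best = float("-inf")
    for arm, (a, b) in enumerate(beliefs):
        p = a / (a + b)  # posterior mean success probability of this arm
        on_success = beliefs[:arm] + ((a + 1, b),) + beliefs[arm + 1:]
        on_failure = beliefs[:arm] + ((a, b + 1),) + beliefs[arm + 1:]
        value = p * (1.0 + bayes_adaptive_value(on_success, horizon - 1)) + (
            1.0 - p
        ) * bayes_adaptive_value(on_failure, horizon - 1)
        best = max(best, value)
    return best


# Two arms with uniform Beta(1, 1) priors, planning 5 pulls ahead.
print(bayes_adaptive_value(((1, 1), (1, 1)), 5))
```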

bayes-adaptive simulation-based search, bayes-adaptive solution, value function approximation, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)