 Fuzzy Logic


Optimism in Reinforcement Learning with Generalized Linear Function Approximation

arXiv.org Machine Learning

We design a new provably efficient algorithm for episodic reinforcement learning with generalized linear function approximation. We analyze the algorithm under a new expressivity assumption that we call "optimistic closure," which is strictly weaker than assumptions from prior analyses for the linear setting. With optimistic closure, we prove that our algorithm enjoys a regret bound of $\tilde{O}(\sqrt{d^3 T})$ where $d$ is the dimensionality of the state-action features and $T$ is the number of episodes. This is the first statistically and computationally efficient algorithm for reinforcement learning with generalized linear functions.


Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

arXiv.org Artificial Intelligence

Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a flexible and simple algorithm for approximately solving imperfect information games with policies parameterized by a normalized rectified linear unit (ReLU). In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and has a regret bound with a better dependence on the number of actions in the tabular case. We derive approximation error-aware regret bounds for $(\Phi, f)$-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations ($f$-RCFR), including softmax. We provide exploitability bounds for $f$-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games, and examine empirically how the link function interacts with the severity of the approximation to determine exploitability performance in practice. Although a ReLU-parameterized policy is typically the best choice, a softmax parameterization can perform as well or better in settings that require aggressive approximation.
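
As a rough illustration of the parameterizations named in this abstract, the sketch below turns a vector of regret estimates into a policy by applying a link function and normalizing. It is not the paper's algorithm: the function names, the `power` and `temperature` parameters, and the uniform fallback when every weight is zero are assumptions made for this example.

```python
import math


def policy_from_regrets(regrets, link):
    """Apply a link function to each regret estimate and normalize.

    Assumption: fall back to the uniform policy when every weight is zero,
    as is commonly done in regret matching.
    """
    weights = [link(r) for r in regrets]
    total = sum(weights)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)
    return [w / total for w in weights]


def polynomial_link(power=1.0):
    """Polynomial link: max(r, 0) ** power. power=1 is the normalized ReLU."""
    return lambda r: max(r, 0.0) ** power


def exponential_link(temperature=1.0):
    """Exponential link: exp(r / temperature), which normalizes to a softmax."""
    return lambda r: math.exp(r / temperature)


# Hypothetical regret estimates for three actions.
regrets = [2.0, -1.0, 1.0]
relu_policy = policy_from_regrets(regrets, polynomial_link(1.0))
softmax_policy = policy_from_regrets(regrets, exponential_link(1.0))
```

With the normalized ReLU, actions with non-positive regret receive zero probability, whereas the exponential link keeps every action in play. Large regret values can overflow `math.exp`, so a practical version would subtract the maximum regret before exponentiating.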


A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems

arXiv.org Artificial Intelligence

To achieve these common goals, agents in a multi-agent system (MAS) should be capable of interacting with other agents, not simply by exchanging data, but by engaging in social activities such as those people participate in during their daily lives: cooperation, coordination, negotiation, and the like. In MASs, agents are assumed to be autonomous, that is, capable of making independent decisions about what to do in order to satisfy their design objectives, and thus they need mechanisms that allow them to synchronize and coordinate their activities at run time [31]. Although one of the main issues in MASs is the agents' coordination structure, this is not hard-wired at design time, as it typically is in standard concurrent/distributed systems. One well-known strategy for coordination in MASs is the design of multi-agent coordinated plans [7][35][36][33][14] that include not only the usual agent actions defined by their effectors, but also communication actions to achieve the necessary synchronization and coordination. To represent communication actions, some specific languages were created, e.g.


Application of artificial intelligence to wastewater treatment: A bibliometric analysis and systematic review of technology, economy, management, and wastewater reuse

#artificialintelligence

Bibliometric analysis and systematic review of AI applied to wastewater treatment. Wastewater treatment technology, economy, management, and reuse were discussed. Prediction accuracy of AI technologies on pollutant removal ranged from 0.64 to 1.00. Application of AI technology could reduce operational costs by up to 30%. Combined AI methods could provide higher accuracy and lower error. Wastewater treatment is an important step for pollutant reduction and the promotion of water environment quality.


The Canonical Distortion Measure for Vector Quantization and Function Approximation

arXiv.org Machine Learning

To measure the quality of a set of vector quantization points, a means of measuring the distance between a random point and its quantization is required. Common metrics such as the Hamming and Euclidean metrics, while mathematically simple, are inappropriate for comparing natural signals such as speech or images. In this paper it is shown how an environment of functions on an input space $X$ induces a canonical distortion measure (CDM) on $X$. The depiction "canonical" is justified because it is shown that optimizing the reconstruction error of $X$ with respect to the CDM gives rise to optimal piecewise constant approximations of the functions in the environment. The CDM is calculated in closed form for several different function classes. An algorithm for training neural networks to implement the CDM is presented along with some encouraging experimental results.


Overview of artificial intelligence in medicine

#artificialintelligence

Alan Turing (1950) was one of the founders of modern computers and AI. The "Turing test" was based on the idea that intelligent behavior in a computer is the ability to achieve human-level performance in cognition-related tasks.[1] The 1980s and 1990s saw a surge in interest in AI. Artificial intelligence techniques such as fuzzy expert systems, Bayesian networks, artificial neural networks, and hybrid intelligent systems were used in different clinical settings in health care. In 2016, the biggest share of investment in AI research was in healthcare applications compared with other sectors.[2] AI in medicine can be dichotomized into two subtypes: virtual and physical.[3]


Some Considerations and a Benchmark Related to the CNF Property of the Koczy-Hirota Fuzzy Rule Interpolation

arXiv.org Artificial Intelligence

The goal of this paper is twofold: first, to highlight some basic problematic properties of the KH Fuzzy Rule Interpolation through examples; second, to set up a brief benchmark set of examples suitable for testing other Fuzzy Rule Interpolation (FRI) methods against these ill conditions. FRI methods were originally proposed to handle the situation of missing fuzzy rules (sparse rule-bases) and to reduce decision complexity. FRI is an important technique for implementing inference with sparse fuzzy rule-bases: even if a given observation has no overlap with the antecedent of any rule in the rule-base, FRI may still yield a conclusion. The first FRI method was the "Linear Interpolation" proposed by Koczy and Hirota, which was later renamed "KH Fuzzy Interpolation" by its followers. Several conditions and criteria have been suggested for unifying the common requirements that FRI methods have to satisfy. One of the most common is the demand for a convex and normal fuzzy (CNF) conclusion whenever all the rule antecedents and consequents are CNF sets. The KH FRI is one method that does not fulfill this condition. This paper focuses on the conditions under which the KH FRI fails the demand for a CNF conclusion. By setting up some CNF rule examples, the paper also defines a benchmark on which other FRI methods can be tested to see whether they produce a CNF conclusion where the KH FRI fails.


Provably Convergent Off-Policy Actor-Critic with Function Approximation

arXiv.org Machine Learning

We present the first provably convergent off-policy actor-critic algorithm (COF-PAC) with function approximation in a two-timescale form. Key to COF-PAC is the introduction of a new critic, the emphasis critic, which is trained via Gradient Emphasis Learning (GEM), a novel combination of the key ideas of Gradient Temporal Difference Learning and Emphatic Temporal Difference Learning. With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.


TSK-Streams: Learning TSK Fuzzy Systems on Data Streams

arXiv.org Machine Learning

In many practical applications of machine learning and predictive modeling, data is produced incrementally in the course of time and observed in the form of a continuous, potentially unbounded stream of observations. Correspondingly, the problem of learning from data streams has recently received increasing attention (Gama, 2012). Algorithms for learning on streams must be able to process the data in a single pass, which implies an incremental mode of learning, and to adapt to changes of the underlying data-generating process (Domingos and Hulten, 2003). A popular approach for learning on data streams, both for classification and regression, is rule induction, in the fuzzy logic and computational intelligence community also known as "evolving fuzzy systems" (Lughofer, 2011). Shaker et al. (2017) proposed a method for regression that builds on a very efficient and effective technique for rule induction, which is inspired by the state-of-the-art machine learning algorithm AMRules, and combines it with the strengths of fuzzy modeling. Thus, the method induces a set of fuzzy rules, which, compared to conventional rules with Boolean antecedents, has the advantage of producing smooth regression functions. The method presented in this paper, called TSK-Streams, is a revised and improved variant. The main modifications and novel contributions are as follows.


Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

arXiv.org Machine Learning

Thanks to its generality, RL has been widely studied in many areas, such as control theory, game theory, operations research, multi-agent systems, machine learning, artificial intelligence, and statistics [23]. In recent years, combined with deep learning, RL has demonstrated its great potential in addressing challenging practical control and optimization problems [17, 21]. Among all possible algorithms, temporal difference (TD) learning has arguably become one of the most popular RL algorithms so far, and it is further dominated by the celebrated TD(0) algorithm [22]. TD learning provides an iterative process to update an estimate of the so-termed value function $v_\pi(s)$ with respect to a given policy $\pi$ based on temporally successive samples. Dealing with a finite state space, the classical version of the TD(0) algorithm adopts a tabular representation for $v_\pi(s)$, which stores entry-wise value estimates on a per-state basis.

J. Sun and Q. Yang are with the College of Control Science and Engineering, and the State Key Laboratory of Industrial Control Technology, Zhejiang University, Hangzhou, China. G. Wang and G. B. Giannakis are with the Digital Technology Center and the Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN 55455, USA. Z. Yang is with the Department of Mechanical and Energy Engineering, Southern University of Science and Technology, Shenzhen, China.
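
The tabular TD(0) update mentioned in this excerpt can be sketched as follows. This is a minimal illustration of the classical single-agent rule, not the decentralized, function-approximation variant the paper analyzes. The two-state episode, the step size `alpha`, and the discount `gamma` are hypothetical values chosen for the example.

```python
def td0_update(values, state, reward, next_state, alpha, gamma, done):
    """One tabular TD(0) step: move values[state] toward the bootstrapped target."""
    # The target is the observed reward plus the discounted estimate of the
    # next state's value; terminal transitions contribute no future value.
    target = reward + (0.0 if done else gamma * values[next_state])
    values[state] += alpha * (target - values[state])
    return values


# Hypothetical two-state chain under a fixed policy:
# state 0 -> state 1 with reward 0, then state 1 -> terminal with reward 1.
# Each transition is (state, reward, next_state, done).
episode = [(0, 0.0, 1, False), (1, 1.0, 1, True)]

values = [0.0, 0.0]  # one table entry per state
for _ in range(500):
    for state, reward, next_state, done in episode:
        td0_update(values, state, reward, next_state,
                   alpha=0.1, gamma=0.9, done=done)
```

On this deterministic chain the table settles near the true values, `values[1]` close to 1 and `values[0]` close to `gamma`. With stochastic transitions a decaying step size is normally needed for convergence.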