
On fine-tuning of Autoencoders for Fuzzy rule classifiers

Recent advances in deep neural networks are allowing researchers to tackle very complex problems such as image and audio classification, with improved theoretical and empirical justification. This paper presents a novel scheme for incorporating autoencoders into Fuzzy rule classifiers (FRC). Stacked autoencoders can learn the complex non-linear relationships among data, and the proposed FRC framework allows users to inject expert knowledge into the system. The paper further introduces four novel fine-tuning strategies for autoencoders that improve the FRC's classification and rule-reduction performance. The proposed framework has been tested on five real-world benchmark datasets. Detailed comparisons with over 15 previous studies, under 10-fold cross-validation, suggest that the proposed methods build FRCs with state-of-the-art accuracies.
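
The stacked-autoencoder building block can be illustrated with a minimal single-layer linear autoencoder trained by gradient descent. This is a sketch under our own assumptions (toy rank-3 data, untied encoder/decoder weights, hand-derived gradients), not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 200 samples in 8 dimensions with rank-3 structure.
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 8))

d_in, d_hid, lr = 8, 3, 0.1
W1 = 0.1 * rng.normal(size=(d_hid, d_in))   # encoder weights
W2 = 0.1 * rng.normal(size=(d_in, d_hid))   # decoder weights

def loss(X, W1, W2):
    """Mean squared reconstruction error."""
    X_hat = (W2 @ (W1 @ X.T)).T
    return float(np.mean((X_hat - X) ** 2))

loss0 = loss(X, W1, W2)
for _ in range(2000):
    H = W1 @ X.T                          # codes, shape (d_hid, n)
    G = 2.0 * (W2 @ H - X.T) / X.size     # dLoss / dX_hat
    W1 -= lr * ((W2.T @ G) @ X)           # backprop through the decoder
    W2 -= lr * (G @ H.T)
final_loss = loss(X, W1, W2)
```

Fine-tuning in the paper's sense would continue from such pretrained weights with a task-specific (classification) objective instead of the reconstruction loss.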

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

Actor-critic methods integrating target networks have exhibited remarkable empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing from the literature. In this paper, we bridge this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of the standard single-timescale temporal difference (TD) learning algorithm as a critic, we use a two-timescale target-based version of TD learning, closely inspired by practical actor-critic algorithms that implement target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.
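
The two-timescale target-based critic can be sketched on a toy policy-evaluation problem. This is an illustration under our own choices (tabular features as a special case of linear function approximation, a fixed-policy chain MDP, Polyak-style target tracking), not the paper's exact algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# 5-state chain under a fixed policy: move right w.p. 0.7; reward 1 for entering/staying in the last state.
n_states, gamma = 5, 0.9
phi = np.eye(n_states)            # one-hot features (a special case of linear FA)

theta = np.zeros(n_states)        # online critic weights (fast timescale)
theta_bar = np.zeros(n_states)    # target weights (slow timescale)
alpha, beta = 0.1, 0.01           # fast and slow step sizes

s = 0
for t in range(20000):
    s_next = min(s + 1, n_states - 1) if rng.random() < 0.7 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    # The TD target bootstraps from the *target* weights, as with target networks in practice.
    td_error = r + gamma * phi[s_next] @ theta_bar - phi[s] @ theta
    theta += alpha * td_error * phi[s]
    theta_bar += beta * (theta - theta_bar)   # slow tracking of the online weights
    s = s_next
```

Values increase toward the rewarding right end of the chain, and the slowly moving target stabilizes the bootstrap target, which is the effect the paper's analysis quantifies.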

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al.~[2020c] established a value-based algorithm with general function approximation that enjoys an $\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$\footnote{Throughout the paper, we use $\widetilde{O}(\cdot)$ to suppress logarithmic factors.} regret bound, where $d$ depends on the complexity of the function class, $H$ is the planning horizon, and $K$ is the total number of episodes. However, their algorithm requires $\Omega(K)$ computation time per round, rendering it inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$ computation time per round on average and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes the policy only $\widetilde{O}(\mathrm{poly}(dH))$ times during its execution, making it appealing for real-life deployment. Moreover, by using an upper-confidence-based, exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after $\widetilde{O}(\mathrm{poly}(dH)/\epsilon^2)$ rounds of exploration, the algorithm outputs an $\epsilon$-optimal policy for any given reward function.
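
The flavor of online sub-sampling can be illustrated in the linear-features special case, where a point is retained only with probability proportional to how much new information it adds to the retained set. This sketch uses our own constants and is purely illustrative of why the retained set stays small:

```python
import numpy as np

rng = np.random.default_rng(4)

d, lam, c = 3, 1.0, 2.0
Lambda = lam * np.eye(d)        # covariance of the retained sub-sample
kept = []

for t in range(2000):
    phi = rng.normal(size=d)
    # Online "sensitivity": how much new direction this point contributes
    # relative to what is already retained.
    sens = phi @ np.linalg.solve(Lambda, phi)
    p = min(1.0, c * sens)
    if rng.random() < p:
        kept.append(phi)
        Lambda += np.outer(phi, phi)
```

Because the sensitivity shrinks as `Lambda` grows, only a small fraction of the 2000 points is kept, while redundant points (directions already well covered) are discarded.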

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
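
The data-perturbation idea can be illustrated with a single randomized least-squares fit. This is a sketch under our own notation and constants, not the paper's full algorithm: perturb the regression targets with i.i.d. Gaussian noise before solving a ridge regression, so that each draw yields a different, randomized value estimate:

```python
import numpy as np

rng = np.random.default_rng(2)

d, n, lam, sigma = 4, 100, 1.0, 0.5
Phi = rng.normal(size=(n, d))                                  # features of visited state-actions
y = Phi @ np.array([1.0, -0.5, 0.3, 0.0]) + 0.1 * rng.normal(size=n)  # regression targets

def perturbed_lsq(Phi, y):
    """One RLSVI-style randomized fit: perturb the targets with i.i.d. noise."""
    y_tilde = y + sigma * rng.normal(size=y.shape)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y_tilde)

# Different draws give different value estimates -> randomized exploration.
w1, w2 = perturbed_lsq(Phi, y), perturbed_lsq(Phi, y)
```

Across draws the estimates scatter around the unperturbed ridge solution, and the occasional optimistic draw drives exploration without any explicit UCB bonus computation.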

A new soft computing method for integration of expert's knowledge in reinforcement learning problems

This paper proposes a novel fuzzy action selection method to leverage human knowledge in reinforcement learning problems. Based on the current estimates of the action-state values, the proposed fuzzy nonlinear mapping assigns each member of the action set a probability of being chosen in the next step. A user-tunable parameter is introduced to control the action selection policy, determining the agent's greedy behavior throughout the learning process. This parameter resembles the temperature parameter in the softmax action selection policy, but its tuning can be more knowledge-oriented, since it injects human knowledge into the learning agent through modifications of the fuzzy rule base. Simulation results indicate that incorporating fuzzy logic into reinforcement learning in the proposed manner improves the learning algorithm's convergence rate and provides superior performance.
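
For reference, the softmax (Boltzmann) action-selection policy that the fuzzy mapping is compared against converts action-value estimates into selection probabilities through a temperature parameter:

```python
import numpy as np

def softmax_policy(q_values, temperature):
    """Boltzmann action selection: P(a) is proportional to exp(Q(a) / temperature)."""
    z = np.asarray(q_values, dtype=float) / temperature
    z -= z.max()                 # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

q = [1.0, 2.0, 0.5]
greedy = softmax_policy(q, 0.1)     # low temperature -> nearly greedy
uniform = softmax_policy(q, 100.0)  # high temperature -> nearly uniform
```

The abstract's point is that tuning this single scalar temperature is opaque, whereas tuning a fuzzy rule base lets an expert express the same greediness trade-off in knowledge-oriented terms.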

Safe Reinforcement Learning with Linear Function Approximation

Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while incurring \emph{no safety violation}, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature mapping, $\kappa$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings.
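
The core safety mechanism, playing only actions whose estimated linear cost is certifiably below the threshold, can be sketched as a conservative filter. This is a simplified illustration with our own notation (a Euclidean confidence ball around a cost estimate), not the papers' exact confidence sets:

```python
import numpy as np

def safe_actions(features, w_hat, radius, tau):
    """Keep actions whose worst-case cost over the confidence ball is <= tau.

    features: (n_actions, d) feature vectors phi(s, a)
    w_hat:    current estimate of the unknown linear cost vector
    radius:   confidence-ball radius in the Euclidean norm (simplified)
    tau:      safety threshold
    """
    est = features @ w_hat
    # Worst case of phi @ w over ||w - w_hat|| <= radius is est + radius * ||phi||.
    worst = est + radius * np.linalg.norm(features, axis=1)
    return np.where(worst <= tau)[0]

phi = np.array([[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]])
idx = safe_actions(phi, w_hat=np.array([0.2, 0.9]), radius=0.1, tau=0.5)
```

Restricting the optimistic value iteration to such certified-safe actions is what allows the regret guarantee to hold with no safety violation.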

Synthesising Reinforcement Learning Policies through Set-Valued Inductive Rule Learning

Today's advanced reinforcement learning algorithms produce black-box policies that are often difficult for a person to interpret and trust. We introduce a policy distilling algorithm, building on the CN2 rule mining algorithm, that distills the policy into a rule-based decision system. At the core of our approach is the fact that an RL process does not just learn a policy, a mapping from states to actions, but also produces extra meta-information, such as action values indicating the quality of alternative actions. This meta-information can indicate whether more than one action is near-optimal for a certain state. We extend CN2 so that it can leverage knowledge about equally good actions to distill the policy into fewer rules, increasing its interpretability. Then, to ensure that the rules explain a valid, non-degenerate policy, we introduce a refinement algorithm that fine-tunes the rules to obtain good performance when executed in the environment. We demonstrate the applicability of our algorithm on the Mario AI benchmark, a complex task that requires modern reinforcement learning techniques, including neural networks. The explanations we produce capture the learned policy in only a few rules, allowing a person to understand what the black-box agent learned.
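
The meta-information the method exploits, sets of near-optimal actions per state, can be computed directly from the action values. A minimal sketch (the threshold `eps` is our own illustrative choice):

```python
import numpy as np

def near_optimal_actions(q_row, eps=0.05):
    """Return the set of actions within eps of the best action's value."""
    q_row = np.asarray(q_row, dtype=float)
    return set(np.where(q_row >= q_row.max() - eps)[0])

# Q-values for three states over four actions (illustrative numbers).
Q = np.array([
    [0.90, 0.88, 0.10, 0.20],   # two equally good actions
    [0.50, 0.10, 0.05, 0.00],   # a single clear winner
    [0.30, 0.30, 0.30, 0.30],   # all actions equivalent
])
labels = [near_optimal_actions(q) for q in Q]
```

Set-valued labels of this kind give the rule learner slack: a rule is acceptable for a state as long as it selects any member of that state's set, which is what allows the policy to be covered by fewer rules.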

Towards interval uncertainty propagation control in bivariate aggregation processes and the introduction of width-limited interval-valued overlap functions

Overlap functions are a class of aggregation functions that measure the degree of overlap between two values. Interval-valued overlap functions were defined as an extension to express the overlapping of interval-valued data, and they have usually been applied when there is uncertainty regarding the assignment of membership degrees. The choice of a total order for intervals can be significant, which has motivated recent developments in interval-valued aggregation functions and interval-valued overlap functions that are increasing with respect to a given admissible order, that is, a total order that refines the usual partial order for intervals. Width preservation has also been considered in these recent works, in an attempt to avoid increasing uncertainty and to guarantee information quality, but no deeper study has been made of the relation between the widths of the input intervals and that of the output interval when applying interval-valued functions, or of how one can control such uncertainty propagation based on this relation. Thus, in this paper we: (i) introduce and develop the concepts of width-limited interval-valued functions and width-limiting functions, presenting a theoretical approach to analyze the relation between the widths of the input and output intervals of bivariate interval-valued functions, with special attention to interval-valued aggregation functions; (ii) introduce the concept of $(a,b)$-ultramodular aggregation functions, a less restrictive extension of one-dimensional convexity to bivariate aggregation functions, which exhibit an important predictable behaviour with respect to width when extended to the interval-valued context; (iii) define width-limited interval-valued overlap functions, taking into account a function that controls the width of the output interval; and (iv) present and compare three construction methods for these width-limited interval-valued overlap functions.
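
A concrete instance helps: the product $O(x, y) = xy$ is an overlap function, and since it is increasing in each argument on $[0, 1]$, its best interval representation maps $[x_1, x_2]$, $[y_1, y_2]$ to $[x_1 y_1, x_2 y_2]$. The sketch below computes the output width to show how uncertainty propagates; the cap in `width_limited_product` is our own illustrative construction, not one of the paper's three methods:

```python
def interval_product(x, y):
    """Best interval representation of O(a, b) = a*b on subintervals of [0, 1]."""
    (x1, x2), (y1, y2) = x, y
    return (x1 * y1, x2 * y2)

def width(iv):
    """Width of an interval: upper endpoint minus lower endpoint."""
    return iv[1] - iv[0]

def width_limited_product(x, y, limit):
    """Illustrative width-limited variant: shrink the output symmetrically
    around its midpoint so that its width never exceeds `limit`."""
    lo, hi = interval_product(x, y)
    if hi - lo <= limit:
        return (lo, hi)
    mid = (lo + hi) / 2
    return (mid - limit / 2, mid + limit / 2)

out = interval_product((0.2, 0.6), (0.5, 0.9))           # width 0.44
capped = width_limited_product((0.2, 0.6), (0.5, 0.9), 0.2)  # width capped at 0.2
```

Here the plain interval product turns inputs of widths 0.4 into an output of width 0.44, i.e., the uncertainty grows; a width-limiting function bounds that growth.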

Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation

Natural policy gradient (NPG) methods with function approximation achieve impressive empirical success in reinforcement learning problems with large state-action spaces. However, theoretical understanding of their convergence behavior remains limited in the function approximation setting. In this paper, we perform a finite-time analysis of NPG with linear function approximation and softmax parameterization, and prove for the first time that the widely used entropy regularization method, which encourages exploration, leads to a linear convergence rate. We adopt a Lyapunov drift analysis to prove the convergence results and to explain the effectiveness of entropy regularization in improving convergence rates.
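
In the simplest (single-state, tabular softmax) special case, entropy-regularized NPG reduces to a known multiplicative update, and its fast convergence to the soft-optimal policy can be observed directly. This is an illustration with our own constants, not the paper's linear-function-approximation algorithm:

```python
import numpy as np

r = np.array([1.0, 0.5, 0.0])   # rewards of a 3-armed bandit
tau, eta = 0.2, 0.5             # entropy temperature and step size (eta * tau < 1)

# Soft-optimal policy: pi*(a) proportional to exp(r(a) / tau).
pi_star = np.exp(r / tau) / np.exp(r / tau).sum()

pi = np.ones_like(r) / r.size   # start from the uniform policy
gaps = []
for _ in range(50):
    # Entropy-regularized NPG update under softmax parameterization:
    #   pi_{t+1}(a) proportional to pi_t(a)^(1 - eta*tau) * exp(eta * r(a))
    pi = pi ** (1.0 - eta * tau) * np.exp(eta * r)
    pi /= pi.sum()
    gaps.append(float(np.abs(pi - pi_star).sum()))
```

In log-space the update contracts toward $r/\tau$ at rate $(1 - \eta\tau)$ per step, which is the linear (geometric) convergence the abstract refers to.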

Recommending Multiple Criteria Decision Analysis Methods with A New Taxonomy-based Decision Support System

We present the Multiple Criteria Decision Analysis Methods Selection Software (MCDA-MSS). This decision support system helps analysts answer a recurring question in decision science: which is the most suitable Multiple Criteria Decision Analysis method (or subset of MCDA methods) for a given Decision-Making Problem (DMP)? The MCDA-MSS includes guidance to lead decision-making processes and to choose among an extensive collection (over 200) of MCDA methods, which are assessed according to an original, comprehensive set of problem characteristics. The features considered concern problem formulation, preference elicitation and types of preference information, desired features of a preference model, and construction of the decision recommendation. The applicability of the MCDA-MSS has been tested on several case studies. The MCDA-MSS is capable of (i) covering DMPs ranging from very simple to very complex, (ii) offering recommendations for DMPs that do not match any method from the collection, (iii) helping analysts prioritize efforts for reducing gaps in the description of the DMPs, and (iv) unveiling methodological mistakes that occur in the selection of methods. A community-wide initiative involving experts in MCDA methodology, analysts using these methods, and decision-makers receiving decision recommendations will contribute to the expansion of the MCDA-MSS.