Goto

Collaborating Authors

 Fuzzy Logic


A fuzzy adaptive metaheuristic algorithm for identifying sustainable, economical, lightweight, and earthquake-resistant reinforced concrete cantilever retaining walls

arXiv.org Artificial Intelligence

In earthquake-prone zones, the seismic performance of reinforced concrete cantilever (RCC) retaining walls is significant. In this study, the seismic performance was investigated using horizontal and vertical pseudo-static coefficients. To tackle RCC weights and forces resulting from these earth pressures, 26 constraints for structural strengths and geotechnical stability along with 12 geometric variables are associated with each design. These constraints and design variables form a constraint optimization problem with a twelve-dimensional solution space. To conduct effective search and produce sustainable, economical, lightweight RCC designs robust against earthquake hazards, a novel adaptive fuzzy-based metaheuristic algorithm is applied. The proposed method divides the search space to sub-regions and establishes exploration, information sharing, and exploitation search capabilities based on its novel search components. Further, fuzzy inference systems were employed to address parameterization and computational cost evaluation issues. It was found that the proposed algorithm can achieve low-cost, low-weight, and low CO2 emission RCC designs under nine seismic conditions in comparison with several classical and best-performing design optimizers.


Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

Reinforcement Learning (RL; Sutton and Barto, 2018; Mannor et al., 2022) studies online decision making problems in which an agent learns through experience within a dynamic environment, with the goal to minimize a loss function associated with the agent-environment interaction. Modern applications of RL such as robotics(Schulman et al., 2015; Lillicrap et al., 2015; Akkaya et al., 2019), game playing (Mnih et al., 2013; Silver et al., 2018) and autonomous driving (Kiran et al., 2021), almost invariably consist of large scale environments where function approximation techniques are necessary to allow the agent to generalize across different states. Furthermore, some form of agent robustness is usually required to cope with environment irregularities that cannot be faithfully represented by stochasticity assumptions (see e.g., Dulac-Arnold et al., 2021). Theoretical foundations for RL with function approximation (e.g., Jiang et al., 2017; Yang and Wang, 2019; Jin et al., 2020b; Agarwal et al., 2020) have been steadily coming into fruition.


Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

We study reinforcement learning with linear function approximation where the transition probability and reward functions are linear with respect to a feature mapping $\boldsymbol{\phi}(s,a)$. Specifically, we consider the episodic inhomogeneous linear Markov Decision Process (MDP), and propose a novel computation-efficient algorithm, LSVI-UCB$^+$, which achieves an $\widetilde{O}(Hd\sqrt{T})$ regret bound where $H$ is the episode length, $d$ is the feature dimension, and $T$ is the number of steps. LSVI-UCB$^+$ builds on weighted ridge regression and upper confidence value iteration with a Bernstein-type exploration bonus. Our statistical results are obtained with novel analytical tools, including a new Bernstein self-normalized bound with conservatism on elliptical potentials, and refined analysis of the correction term. This is a minimax optimal algorithm for linear MDPs up to logarithmic factors, which closes the $\sqrt{Hd}$ gap between the upper bound of $\widetilde{O}(\sqrt{H^3d^3T})$ in (Jin et al., 2020) and lower bound of $\Omega(Hd\sqrt{T})$ for linear MDPs.


Distributionally Robust Offline Reinforcement Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

Among the reasons hindering reinforcement learning (RL) applications to real-world problems, two factors are critical: limited data and the mismatch between the testing environment (real environment in which the policy is deployed) and the training environment (e.g., a simulator). This paper attempts to address these issues simultaneously with distributionally robust offline RL, where we learn a distributionally robust policy using historical data obtained from the source environment by optimizing against a worst-case perturbation thereof. In particular, we move beyond tabular settings and consider linear function approximation. More specifically, we consider two settings, one where the dataset is well-explored and the other where the dataset has sufficient coverage of the optimal policy. We propose two algorithms~-- one for each of the two settings~-- that achieve error bounds $\tilde{O}(d^{1/2}/N^{1/2})$ and $\tilde{O}(d^{3/2}/N^{1/2})$ respectively, where $d$ is the dimension in the linear function approximation and $N$ is the number of trajectories in the dataset. To the best of our knowledge, they provide the first non-asymptotic results of the sample complexity in this setting. Diverse experiments are conducted to demonstrate our theoretical findings, showing the superiority of our algorithm against the non-robust one.


On Instance-Dependent Bounds for Offline Reinforcement Learning with Linear Function Approximation

arXiv.org Artificial Intelligence

Sample-efficient offline reinforcement learning (RL) with linear function approximation has recently been studied extensively. Much of prior work has yielded the minimax-optimal bound of $\tilde{\mathcal{O}}(\frac{1}{\sqrt{K}})$, with $K$ being the number of episodes in the offline data. In this work, we seek to understand instance-dependent bounds for offline RL with function approximation. We present an algorithm called Bootstrapped and Constrained Pessimistic Value Iteration (BCP-VI), which leverages data bootstrapping and constrained optimization on top of pessimism. We show that under a partial data coverage assumption, that of \emph{concentrability} with respect to an optimal policy, the proposed algorithm yields a fast rate of $\tilde{\mathcal{O}}(\frac{1}{K})$ for offline RL when there is a positive gap in the optimal Q-value functions, even when the offline data were adaptively collected. Moreover, when the linear features of the optimal actions in the states reachable by an optimal policy span those reachable by the behavior policy and the optimal actions are unique, offline RL achieves absolute zero sub-optimality error when $K$ exceeds a (finite) instance-dependent threshold. To the best of our knowledge, these are the first $\tilde{\mathcal{O}}(\frac{1}{K})$ bound and absolute zero sub-optimality bound respectively for offline RL with linear function approximation from adaptive data with partial coverage. We also provide instance-agnostic and instance-dependent information-theoretical lower bounds to complement our upper bounds.


A fuzzy logic-based stabilization system for a flying robot, with an embedded energy harvester and a visual decision-making system

arXiv.org Artificial Intelligence

"Smart cities" is the trendy rubric of modern urban projects that require new innovative ideas to attain the desired perfection in many fields to change our life for the better. In this context, a new innovative application will be presented here to investigate and continuously make the required maintenance of public roads by creating a flying robot for painting the partially erased parts of sidewalks' edges that are usually plated in two different colors; primarily black and white as we suppose here. The first contribution of this paper is developing a fuzzy-logic-based stabilization system for an octocopter serving as a liquids transporter that could be equipped with a robot arm. The second contribution consists of designing an embedded energy harvester for the flying robot to promote the management of available power sources. Finally, as suggested in this project, we present a complement heuristic study clarifying some main concepts that rely on a computer vision-based decision-making system.


Function Approximation and Ordinary Least Squares

#artificialintelligence

The mathematics of least squares would have been so trivial for Gauss that even had he come upon the method he might have passed it over as but one of many, not noticing its true significance until Legendre's book prodded his memory and produced a post facto priority claim. There have been many extraordinary equations that changed the world (whether they were discovered or invented depends on whether you subscribe to mathematical Platonism--I do) but among the 17 equations that changed the world, the legendary Ordinary Least Squares (OLS) wasn't listed among them (though it is heavily related to both the Normal Distribution and Information Theory). It's a shame because the article and tweets referencing the "17 Equations" have been floating around for nearly ten years. So I will tell you about the magic of OLS, a little about its history, some of its extensions, and its applications (yes, to Fintech too). Subscribe for free to receive new posts and support my work.


Max-min Learning of Approximate Weight Matrices from Fuzzy Data

arXiv.org Artificial Intelligence

In this article, we study the approximate solutions set $\Lambda_b$ of an inconsistent system of $\max-\min$ fuzzy relational equations $(S): A \Box_{\min}^{\max}x =b$. Using the $L_\infty$ norm, we compute by an explicit analytical formula the Chebyshev distance $\Delta~=~\inf_{c \in \mathcal{C}} \Vert b -c \Vert$, where $\mathcal{C}$ is the set of second members of the consistent systems defined with the same matrix $A$. We study the set $\mathcal{C}_b$ of Chebyshev approximations of the second member $b$ i.e., vectors $c \in \mathcal{C}$ such that $\Vert b -c \Vert = \Delta$, which is associated to the approximate solutions set $\Lambda_b$ in the following sense: an element of the set $\Lambda_b$ is a solution vector $x^\ast$ of a system $A \Box_{\min}^{\max}x =c$ where $c \in \mathcal{C}_b$. As main results, we describe both the structure of the set $\Lambda_b$ and that of the set $\mathcal{C}_b$. We then introduce a paradigm for $\max-\min$ learning weight matrices that relates input and output data from training data. The learning error is expressed in terms of the $L_\infty$ norm. We compute by an explicit formula the minimal value of the learning error according to the training data. We give a method to construct weight matrices whose learning error is minimal, that we call approximate weight matrices. Finally, as an application of our results, we show how to learn approximately the rule parameters of a possibilistic rule-based system according to multiple training data.


Contextual Autonomy Evaluation of Unmanned Aerial Vehicles in Subterranean Environments

arXiv.org Artificial Intelligence

In this paper we focus on the evaluation of contextual autonomy for robots. More specifically, we propose a fuzzy framework for calculating the autonomy score for a small Unmanned Aerial Systems (sUAS) for performing a task while considering task complexity and environmental factors. Our framework is a cascaded Fuzzy Inference System (cFIS) composed of combination of three FIS which represent different contextual autonomy capabilities. We performed several experiments to test our framework in various contexts, such as endurance time, navigation, take off/land, and room clearing, with seven different sUAS. We introduce a predictive measure which improves upon previous predictive measures, allowing for previous real-world task performance to be used in predicting future mission performance.


Provably Efficient Model-Free Constrained RL with Linear Function Approximation

arXiv.org Artificial Intelligence

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a `simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret and $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithms. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance between regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can achieve an even zero constraint violation while still maintaining the same order with respect to $T$.