Fuzzy Logic
Non-stationary Reinforcement Learning under General Function Approximation
Feng, Songtao, Yin, Ming, Huang, Ruiquan, Wang, Yu-Xiang, Yang, Jing, Liang, Yingbin
General function approximation is a powerful tool to handle large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, theoretical understanding of non-stationary MDPs with general function approximation is still limited. In this paper, we make the first such an attempt. We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs, which subsumes majority of existing tractable RL problems in static MDPs as well as non-stationary MDPs. Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret for the proposed algorithm, and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. We further demonstrate via examples of non-stationary linear and tabular MDPs that our algorithm performs better in small variation budget scenario than the existing UCB-type algorithms. To the best of our knowledge, this is the first dynamic regret analysis in non-stationary MDPs with general function approximation.
Refined Regret for Adversarial MDPs with Linear Function Approximation
Dai, Yan, Luo, Haipeng, Wei, Chen-Yu, Zimmert, Julian
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes and the state space can be arbitrarily large. We assume that the Q-function of any policy is linear in some known features, that is, a linear function approximation exists. The best existing regret upper bound for this setting (Luo et al., 2021) is of order $\tilde{\mathcal O}(K^{2/3})$ (omitting all other dependencies), given access to a simulator. This paper provides two algorithms that improve the regret to $\tilde{\mathcal O}(\sqrt K)$ in the same setting. Our first algorithm makes use of a refined analysis of the Follow-the-Regularized-Leader (FTRL) algorithm with the log-barrier regularizer. This analysis allows the loss estimators to be arbitrarily negative and might be of independent interest. Our second algorithm develops a magnitude-reduced loss estimator, further removing the polynomial dependency on the number of actions in the first algorithm and leading to the optimal regret bound (up to logarithmic terms and dependency on the horizon). Moreover, we also extend the first algorithm to simulator-free linear MDPs, which achieves $\tilde{\mathcal O}(K^{8/9})$ regret and greatly improves over the best existing bound $\tilde{\mathcal O}(K^{14/15})$. This algorithm relies on a better alternative to the Matrix Geometric Resampling procedure by Neu & Olkhovskaya (2020), which could again be of independent interest.
Detection of Late Blight Disease in Tomato Leaf Using Image Processing Techniques
Farooq, Muhammad Shoaib, Arif, Tabir, Riaz, Shamyla
=One of the most frequently farmed crops is the tomato crop. Late blight is the most prevalent tomato disease in the world, and often causes a significant reduction in the production of tomato crops. The importance of tomatoes as an agricultural product necessitates early detection of late blight. It is produced by the fungus Phytophthora. The earliest signs of late blight on tomatoes are unevenly formed, water-soaked lesions on the leaves located on the plant canopy's younger leave White cottony growth may appear in humid environments evident on the undersides of the leaves that have been impacted. Lesions increase as the disease proceeds, turning the leaves brown to shrivel up and die. Using picture segmentation and the Multi-class SVM technique, late blight disorder is discovered in this work. Image segmentation is employed for separating damaged areas on leaves, and the Multi-class SVM method is used for reliable disease categorization. 30 reputable studies were chosen from a total of 2770 recognized papers. The primary goal of this study is to compile cutting-edge research that identifies current research trends, problems, and prospects for late blight detection. It also looks at current approaches for applying image processing to diagnose and detect late blight. A suggested taxonomy for late blight detection has also been provided. In the same way, a model for the development of the solutions to problems is also presented. Finally, the research gaps have been presented in terms of open issues for the provision of future directions in image processing for the researchers.
What can online reinforcement learning with function approximation benefit from general coverage conditions?
Liu, Fanghui, Viano, Luca, Cevher, Volkan
In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (original from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we focus on this new direction by digging more possible and general coverage conditions, and study the potential and the utility of them in efficient online RL. We identify more concepts, including the $L^p$ variant of concentrability, the density ratio realizability, and trade-off on the partial/rest coverage condition, that can be also beneficial to sample-efficient online RL, achieving improved regret bound. Furthermore, if exploratory offline data are used, under our coverage conditions, both statistically and computationally efficient guarantees can be achieved for online RL. Besides, even though the MDP structure is given, e.g., linear MDP, we elucidate that, good coverage conditions are still beneficial to obtain faster regret bound beyond $\widetilde{O}(\sqrt{T})$ and even a logarithmic order regret. These results provide a good justification for the usage of general coverage conditions in efficient online RL.
Sharp high-probability sample complexities for policy evaluation with linear function approximation
Li, Gen, Wu, Weichen, Chi, Yuejie, Ma, Cong, Rinaldo, Alessandro, Wei, Yuting
This paper is concerned with the problem of policy evaluation with linear function approximation in discounted infinite horizon Markov decision processes. We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms: the temporal difference (TD) learning algorithm and the two-timescale linear TD with gradient correction (TDC) algorithm. In both the on-policy setting, where observations are generated from the target policy, and the off-policy setting, where samples are drawn from a behavior policy potentially different from the target policy, we establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level. We also exhihit an explicit dependence on problem-related quantities, and show in the on-policy setting that our upper bound matches the minimax lower bound on crucial problem parameters, including the choice of the feature maps and the problem dimension.
Milestones in Autonomous Driving and Intelligent Vehicles Part I: Control, Computing System Design, Communication, HD Map, Testing, and Human Behaviors
Chen, Long, Li, Yuchen, Huang, Chao, Xing, Yang, Tian, Daxin, Li, Li, Hu, Zhongxu, Teng, Siyu, Lv, Chen, Wang, Jinjun, Cao, Dongpu, Zheng, Nanning, Wang, Fei-Yue
Interest in autonomous driving (AD) and intelligent vehicles (IVs) is growing at a rapid pace due to the convenience, safety, and economic benefits. Although a number of surveys have reviewed research achievements in this field, they are still limited in specific tasks and lack systematic summaries and research directions in the future. Our work is divided into 3 independent articles and the first part is a Survey of Surveys (SoS) for total technologies of AD and IVs that involves the history, summarizes the milestones, and provides the perspectives, ethics, and future research directions. This is the second part (Part I for this technical survey) to review the development of control, computing system design, communication, High Definition map (HD map), testing, and human behaviors in IVs. In addition, the third part (Part II for this technical survey) is to review the perception and planning sections. The objective of this paper is to involve all the sections of AD, summarize the latest technical milestones, and guide abecedarians to quickly understand the development of AD and IVs. Combining the SoS and Part II, we anticipate that this work will bring novel and diverse insights to researchers and abecedarians, and serve as a bridge between past and future.
Three-way Imbalanced Learning based on Fuzzy Twin SVM
Cai, Wanting, Cai, Mingjie, Li, Qingguo, Liu, Qiong
Three-way decision (3WD) is a powerful tool for granular computing to deal with uncertain data, commonly used in information systems, decision-making, and medical care. Three-way decision gets much research in traditional rough set models. However, three-way decision is rarely combined with the currently popular field of machine learning to expand its research. In this paper, three-way decision is connected with SVM, a standard binary classification model in machine learning, for solving imbalanced classification problems that SVM needs to improve. A new three-way fuzzy membership function and a new fuzzy twin support vector machine with three-way membership (TWFTSVM) are proposed. The new three-way fuzzy membership function is defined to increase the certainty of uncertain data in both input space and feature space, which assigns higher fuzzy membership to minority samples compared with majority samples. To evaluate the effectiveness of the proposed model, comparative experiments are designed for forty-seven different datasets with varying imbalance ratios. In addition, datasets with different imbalance ratios are derived from the same dataset to further assess the proposed model's performance. The results show that the proposed model significantly outperforms other traditional SVM-based methods.
An Ensemble Semi-Supervised Adaptive Resonance Theory Model with Explanation Capability for Pattern Classification
Pourpanah, Farhad, Lim, Chee Peng, Etemad, Ali, Wu, Q. M. Jonathan
Most semi-supervised learning (SSL) models entail complex structures and iterative training processes as well as face difficulties in interpreting their predictions to users. To address these issues, this paper proposes a new interpretable SSL model using the supervised and unsupervised Adaptive Resonance Theory (ART) family of networks, which is denoted as SSL-ART. Firstly, SSL-ART adopts an unsupervised fuzzy ART network to create a number of prototype nodes using unlabeled samples. Then, it leverages a supervised fuzzy ARTMAP structure to map the established prototype nodes to the target classes using labeled samples. Specifically, a one-to-many (OtM) mapping scheme is devised to associate a prototype node with more than one class label. The main advantages of SSL-ART include the capability of: (i) performing online learning, (ii) reducing the number of redundant prototype nodes through the OtM mapping scheme and minimizing the effects of noisy samples, and (iii) providing an explanation facility for users to interpret the predicted outcomes. In addition, a weighted voting strategy is introduced to form an ensemble SSL-ART model, which is denoted as WESSL-ART. Every ensemble member, i.e., SSL-ART, assigns {\color{black}a different weight} to each class based on its performance pertaining to the corresponding class. The aim is to mitigate the effects of training data sequences on all SSL-ART members and improve the overall performance of WESSL-ART. The experimental results on eighteen benchmark data sets, three artificially generated data sets, and a real-world case study indicate the benefits of the proposed SSL-ART and WESSL-ART models for tackling pattern classification problems.
Reinforcement Learning with Function Approximation: From Linear to Nonlinear
Function approximation has been an indispensable component in modern reinforcement learning algorithms designed to tackle problems with large state spaces in high dimensions. This paper reviews recent results on error analysis for these reinforcement learning algorithms in linear or nonlinear approximation settings, emphasizing approximation error and estimation error/sample complexity. We discuss various properties related to approximation error and present concrete conditions on transition probability and reward function under which these properties hold true. Sample complexity analysis in reinforcement learning is more complicated than in supervised learning, primarily due to the distribution mismatch phenomenon. With assumptions on the linear structure of the problem, numerous algorithms in the literature achieve polynomial sample complexity with respect to the number of features, episode length, and accuracy, although the minimax rate has not been achieved yet. These results rely on the $L^\infty$ and UCB estimation of estimation error, which can handle the distribution mismatch phenomenon. The problem and analysis become substantially more challenging in the setting of nonlinear function approximation, as both $L^\infty$ and UCB estimation are inadequate for bounding the error with a favorable rate in high dimensions. We discuss additional assumptions necessary to address the distribution mismatch and derive meaningful results for nonlinear RL problems.
Robust Quantum Controllers: Quantum Information -- Thermodynamic Hidden Force Control in Intelligent Robotics based on Quantum Soft Computing
Ulyanov, Sergey V., Ulyanov, Viktor S., Hagiwara, Takakhide
For complex and ill-defined dynamic control objects that are not easily controlled by conventional control systems (such as P-[I]-D-controllers) -- especially in the presence of fuzzy model parameters and different stochastic noises -- the System of Systems Engineering methodology provides fuzzy controllers (FC) as one of alternative way of control systems design. Soft computing methodologies, such as genetic algorithms (GA) and fuzzy neural networks (FNN) had expanded application areas of FC by adding optimization, learning and adaptation features. But still now it is difficult to design optimal and robust intelligent control system, when its operational conditions have to evolve dramatically (aging, sensor failure and so on). Such conditions could be predicted from one hand, but it is difficult to cover such situations by a single FC. Using unconventional computational intelligence toolkit, we propose a solution of such kind of generalization problems by introducing a self-organization design process of robust KB-FC that supported by the Quantum Fuzzy Inference (QFI) based on quantum soft computing ideas [1-3].