AITopics

2106.14417

Country:

North America > United States > California > San Francisco County > San Francisco (0.13)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
Europe > France > Occitanie > Hérault > Montpellier (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (1.00)
Energy (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.67)
Health & Medicine > Therapeutic Area > Immunology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.67)

Pratama, Mahardhika, Za'in, Choiru, Lughofer, Edwin, Pardede, Eric, Rahayu, Dwi A. P.

Scalable Teacher Forcing Network for Semi-Supervised Large Scale Data Streams

arXiv.org Artificial IntelligenceJun-25-2021

The large-scale data stream problem refers to high-speed information flow which cannot be processed in scalable manner under a traditional computing platform. This problem also imposes expensive labelling cost making the deployment of fully supervised algorithms unfeasible. On the other hand, the problem of semi-supervised large-scale data streams is little explored in the literature because most works are designed in the traditional single-node computing environments while also being fully supervised approaches. This paper offers Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and the large-scale data streams simultaneously. WeScatterNet is crafted under distributed computing platform of Apache Spark with a data-free model fusion strategy for model compression after parallel computing stage. It features an open network structure to address the global and local drift problems while integrating a data augmentation, annotation and auto-correction ($DA^3$) method for handling partially labelled data streams. The performance of WeScatterNet is numerically evaluated in the six large-scale data stream problems with only $25\%$ label proportions. It shows highly competitive performance even if compared with fully supervised learners with $100\%$ label proportions.

base learner, classifier, wescatternet, (16 more...)

2107.02943

Country:

Oceania > Australia (0.04)
Asia > Singapore (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(9 more...)

Genre:

Instructional Material (0.68)
Research Report (0.63)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(4 more...)

arXiv.org Machine LearningJun-22-2021

Variance-Aware Off-Policy Evaluation with Linear Function Approximation

Min, Yifei, Wang, Tianhao, Zhou, Dongruo, Gu, Quanquan

We study the off-policy evaluation (OPE) problem in reinforcement learning with linear function approximation, which aims to estimate the value function of a target policy based on the offline data collected by a behavior policy. We propose to incorporate the variance information of the value function to improve the sample efficiency of OPE. More specifically, for time-inhomogeneous episodic linear Markov decision processes (MDPs), we propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration. We show that our algorithm achieves a tighter error bound than the best-known result. We also provide a fine-grained characterization of the distribution shift between the behavior policy and the target policy. Extensive numerical experiments corroborate our theory.

linear function approximation, variance-aware off-policy evaluation

2106.1196

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.60)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

arXiv.org Machine LearningJun-22-2021

Uniform-PAC Bounds for Reinforcement Learning with Linear Function Approximation

He, Jiafan, Zhou, Dongruo, Gu, Quanquan

We study reinforcement learning (RL) with linear function approximation. Existing algorithms for this problem only have high-probability regret and/or Probably Approximately Correct (PAC) sample complexity guarantees, which cannot guarantee the convergence to the optimal policy. In this paper, in order to overcome the limitation of existing algorithms, we propose a new algorithm called FLUTE, which enjoys uniform-PAC convergence to the optimal policy with high probability. The uniform-PAC guarantee is the strongest possible guarantee for reinforcement learning in the literature, which can directly imply both PAC and high probability regret bounds, making our algorithm superior to all existing algorithms with linear function approximation. At the core of our algorithm is a novel minimax value function estimator and a multi-level partition scheme to select the training samples from historical observations. Both of these techniques are new and of independent interest.

linear function approximation, reinforcement learning, uniform-pac bound

2106.11612

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)

Sevakula, Rahul Kumar, Verma, Nishchal Kumar, Ishibuchi, Hisao

On fine-tuning of Autoencoders for Fuzzy rule classifiers

arXiv.org Artificial IntelligenceJun-21-2021

Recent discoveries in Deep Neural Networks are allowing researchers to tackle some very complex problems such as image classification and audio classification, with improved theoretical and empirical justifications. This paper presents a novel scheme to incorporate the use of autoencoders in Fuzzy rule classifiers (FRC). Autoencoders when stacked can learn the complex non-linear relationships amongst data, and the proposed framework built towards FRC can allow users to input expert knowledge to the system. This paper further introduces four novel fine-tuning strategies for autoencoders to improve the FRC's classification and rule reduction performance. The proposed framework has been tested across five real-world benchmark datasets. Elaborate comparisons with over 15 previous studies, and across 10-fold cross validation performance, suggest that the proposed methods are capable of building FRCs which can provide state of the art accuracies.

dataset, frc, ieee trans, (13 more...)

2106.11182

Country: North America > United States > New York (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Bisicchia, Giuseppe, Forti, Stefano, Brogi, Antonio

A Declarative Goal-oriented Framework for Smart Environments with LPaaS

arXiv.org Artificial IntelligenceJun-18-2021

Smart environments powered by the Internet of Things aim at improving our daily lives by automatically tuning ambient parameters (e.g. temperature, interior light) and by achieving energy savings through self-managing cyber-physical systems. Commercial solutions, however, only permit setting simple target goals on those parameters and do not consider mediating conflicting goals among different users and/or system administrators, and feature limited compatibility across different IoT verticals. In this article, we propose a declarative framework to represent smart environments, user-set goals and customisable mediation policies to reconcile contrasting goals encompassing multiple IoT systems. An open-source Prolog prototype of the framework is showcased over two lifelike motivating examples.

artificial intelligence, expert system, fuzzy logic, (19 more...)

2106.13083

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
North America > United States (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Smart Houses & Appliances (1.00)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
(3 more...)

Kong, Dingwen, Salakhutdinov, Ruslan, Wang, Ruosong, Yang, Lin F.

Online Sub-Sampling for Reinforcement Learning with General Function Approximation

arXiv.org Machine LearningJun-14-2021

Designing provably efficient algorithms with general function approximation is an important open problem in reinforcement learning. Recently, Wang et al.~[2020c] establish a value-based algorithm with general function approximation that enjoys $\widetilde{O}(\mathrm{poly}(dH)\sqrt{K})$\footnote{Throughout the paper, we use $\widetilde{O}(\cdot)$ to suppress logarithm factors. } regret bound, where $d$ depends on the complexity of the function class, $H$ is the planning horizon, and $K$ is the total number of episodes. However, their algorithm requires $\Omega(K)$ computation time per round, rendering the algorithm inefficient for practical use. In this paper, by applying online sub-sampling techniques, we develop an algorithm that takes $\widetilde{O}(\mathrm{poly}(dH))$ computation time per round on average, and enjoys nearly the same regret bound. Furthermore, the algorithm achieves low switching cost, i.e., it changes the policy only $\widetilde{O}(\mathrm{poly}(dH))$ times during its execution, making it appealing to be implemented in real-life scenarios. Moreover, by using an upper-confidence based exploration-driven reward function, the algorithm provably explores the environment in the reward-free setting. In particular, after $\widetilde{O}(\mathrm{poly}(dH))/\epsilon^2$ rounds of exploration, the algorithm outputs an $\epsilon$-optimal policy for any given reward function.

algorithm, dataset, probability, (15 more...)

2106.07203

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

arXiv.org Machine LearningJun-14-2021

Randomized Exploration for Reinforcement Learning with General Value Function Approximation

Ishfaq, Haque, Cui, Qiwen, Nguyen, Viet, Ayoub, Alex, Yang, Zhuoran, Wang, Zhaoran, Precup, Doina, Yang, Lin F.

We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm as well as the optimism principle. Unlike existing upper-confidence-bound (UCB) based approaches, which are often computationally intractable, our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises. To attain optimistic value function estimation without resorting to a UCB-style bonus, we introduce an optimistic reward sampling procedure. When the value functions can be represented by a function class $\mathcal{F}$, our algorithm achieves a worst-case regret bound of $\widetilde{O}(\mathrm{poly}(d_EH)\sqrt{T})$ where $T$ is the time elapsed, $H$ is the planning horizon and $d_E$ is the $\textit{eluder dimension}$ of $\mathcal{F}$. In the linear setting, our algorithm reduces to LSVI-PHE, a variant of RLSVI, that enjoys an $\widetilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret. We complement the theory with an empirical evaluation across known difficult exploration tasks.

probability, randomized exploration, reinforcement learning, (11 more...)

2106.07841

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Quebec > Montreal (0.14)
Asia > Middle East > Jordan (0.04)
North America > Canada > Alberta (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.42)

Barakat, Anas, Bianchi, Pascal, Lehmann, Julien

Analysis of a Target-Based Actor-Critic Algorithm with Linear Function Approximation

arXiv.org Machine LearningJun-14-2021

Actor-critic methods integrating target networks have exhibited a stupendous empirical success in deep reinforcement learning. However, a theoretical understanding of the use of target networks in actor-critic methods is largely missing in the literature. In this paper, we bridge this gap between theory and practice by proposing the first theoretical analysis of an online target-based actor-critic algorithm with linear function approximation in the discounted reward setting. Our algorithm uses three different timescales: one for the actor and two for the critic. Instead of using the standard single timescale temporal difference (TD) learning algorithm as a critic, we use a two timescales target-based version of TD learning closely inspired from practical actor-critic algorithms implementing target networks. First, we establish asymptotic convergence results for both the critic and the actor under Markovian sampling. Then, we provide a finite-time analysis showing the impact of incorporating a target network into actor-critic methods.

actor-critic algorithm, algorithm, sequence, (15 more...)

2106.07472

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
(2 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Annabestani, Mohsen, Abedi, Ali, Nematollahi, Mohammad Reza, Sis-tani, Mohammad Bagher Naghibi

A new soft computing method for integration of expert's knowledge in reinforcement learn-ing problems

arXiv.org Artificial IntelligenceJun-13-2021

This paper proposes a novel fuzzy action selection method to leverage human knowledge in reinforcement learning problems. Based on the estimates of the most current action-state values, the proposed fuzzy nonlinear mapping as-signs each member of the action set to its probability of being chosen in the next step. A user tunable parameter is introduced to control the action selection policy, which determines the agent's greedy behavior throughout the learning process. This parameter resembles the role of the temperature parameter in the softmax action selection policy, but its tuning process can be more knowledge-oriented since this parameter reflects the human knowledge into the learning agent by making modifications in the fuzzy rule base. Simulation results indicate that including fuzzy logic within the reinforcement learning in the proposed manner improves the learning algorithm's convergence rate, and provides superior performance.

knowledge, membership function, reinforcement learning, (15 more...)

2106.07088

Country:

North America > United States (0.05)
Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
Asia > Singapore > Central Region > Singapore (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)