AITopics

doi: 10.17762/ijritcc.v11i9.8342

2311.01842

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
Asia > Singapore (0.04)
Asia > India > Uttarakhand (0.04)
(6 more...)

Genre:

Research Report (1.00)
Overview (0.67)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.67)

arXiv.org Machine LearningNov-3-2023

Posterior Sampling with Delayed Feedback for Reinforcement Learning with Linear Function Approximation

Kuang, Nikki Lijing, Yin, Ming, Wang, Mengdi, Wang, Yu-Xiang, Ma, Yi-An

Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$ worst-case regret in the presence of unknown stochastic delays. Here $E[\tau]$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $\widetilde{O}(dHK)$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.

machine learning, probability 1, reinforcement learning, (16 more...)

2310.18919

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Barbara County > Santa Barbara (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.48)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)

Vaswani, Sharan, Kazemi, Amirreza, Babanezhad, Reza, Roux, Nicolas Le

Decision-Aware Actor-Critic with Function Approximation and Theoretical Guarantees

arXiv.org Artificial IntelligenceOct-30-2023

Actor-critic (AC) methods are widely used in reinforcement learning (RL) and benefit from the flexibility of using any policy gradient method as the actor and value-based method as the critic. The critic is usually trained by minimizing the TD error, an objective that is potentially decorrelated with the true goal of achieving a high reward with the actor. We address this mismatch by designing a joint objective for training the actor and critic in a decision-aware fashion. We use the proposed objective to design a generic, AC algorithm that can easily handle any function approximation. We explicitly characterize the conditions under which the resulting algorithm guarantees monotonic policy improvement, regardless of the choice of the policy and critic parameterization. Instantiating the generic algorithm results in an actor that involves maximizing a sequence of surrogate functions (similar to TRPO, PPO) and a critic that involves minimizing a closely connected objective. Using simple bandit examples, we provably establish the benefit of the proposed critic objective over the standard squared error. Finally, we empirically demonstrate the benefit of our decision-aware actor-critic framework on simple RL problems.

exp, parameterization, representation, (14 more...)

2305.15249

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.61)

arXiv.org Machine LearningOct-30-2023

Posterior Sampling for Competitive RL: Function Approximation and Partial Observation

Qiu, Shuang, Dai, Ziyu, Zhong, Han, Wang, Zhaoran, Yang, Zhuoran, Zhang, Tong

This paper investigates posterior sampling algorithms for competitive reinforcement learning (RL) in the context of general function approximations. Focusing on zero-sum Markov games (MGs) under two critical settings, namely self-play and adversarial learning, we first propose the self-play and adversarial generalized eluder coefficient (GEC) as complexity measures for function approximation, capturing the exploration-exploitation trade-off in MGs. Based on self-play GEC, we propose a model-based self-play posterior sampling method to control both players to learn Nash equilibrium, which can successfully handle the partial observability of states. Furthermore, we identify a set of partially observable MG models fitting MG learning with the adversarial policies of the opponent. Incorporating the adversarial GEC, we propose a model-based posterior sampling method for learning adversarial MG with potential partial observability. We further provide low regret bounds for proposed algorithms that can scale sublinearly with the proposed GEC and the number of episodes $T$. To the best of our knowledge, we for the first time develop generic model-based posterior sampling algorithms for competitive RL that can be applied to a majority of tractable zero-sum MG classes in both fully observable and partially observable MGs with self-play and adversarial learning.

machine learning, pomg, reinforcement learning, (17 more...)

2310.19861

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceOct-28-2023

A Fuzzy Time Series-Based Model Using Particle Swarm Optimization and Weighted Rules

Ortiz-Arroyo, Daniel

During the last decades, a myriad of fuzzy time series models have been proposed in scientific literature. Among the most accurate models found in fuzzy time series, the high-order ones are the most accurate. The research described in this paper tackles three potential limitations associated with the application of high-order fuzzy time series models. To begin with, the adequacy of forecast rules lacks consistency. Secondly, as the model's order increases, data utilization diminishes. Thirdly, the uniformity of forecast rules proves to be highly contingent on the chosen interval partitions. To address these likely drawbacks, we introduce a novel model based on fuzzy time series that amalgamates the principles of particle swarm optimization (PSO) and weighted summation. Our results show that our approach models accurately the time series in comparison with previous methods.

enrollment, fuzzy time sery, time sery, (14 more...)

2310.18825

Country: North America > United States > Alabama (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

arXiv.org Machine LearningOct-27-2023

Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds

Huang, Jiayi, Zhong, Han, Wang, Liwei, Yang, Lin F.

While numerous works have focused on devising efficient algorithms for reinforcement learning (RL) with uniformly bounded rewards, it remains an open question whether sample or time-efficient algorithms for RL with large state-action space exist when the rewards are \emph{heavy-tailed}, i.e., with only finite $(1+\epsilon)$-th moments for some $\epsilon\in(0,1]$. In this work, we address the challenge of such rewards in RL with linear function approximation. We first design an algorithm, \textsc{Heavy-OFUL}, for heavy-tailed linear bandits, achieving an \emph{instance-dependent} $T$-round regret of $\tilde{O}\big(d T^{\frac{1-\epsilon}{2(1+\epsilon)}} \sqrt{\sum_{t=1}^T \nu_t^2} + d T^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$, the \emph{first} of this kind. Here, $d$ is the feature dimension, and $\nu_t^{1+\epsilon}$ is the $(1+\epsilon)$-th central moment of the reward at the $t$-th round. We further show the above bound is minimax optimal when applied to the worst-case instances in stochastic and deterministic linear bandits. We then extend this algorithm to the RL settings with linear function approximation. Our algorithm, termed as \textsc{Heavy-LSVI-UCB}, achieves the \emph{first} computationally efficient \emph{instance-dependent} $K$-episode regret of $\tilde{O}(d \sqrt{H \mathcal{U}^*} K^\frac{1}{1+\epsilon} + d \sqrt{H \mathcal{V}^* K})$. Here, $H$ is length of the episode, and $\mathcal{U}^*, \mathcal{V}^*$ are instance-dependent quantities scaling with the central moment of reward and value functions, respectively. We also provide a matching minimax lower bound $\Omega(d H K^{\frac{1}{1+\epsilon}} + d \sqrt{H^3 K})$ to demonstrate the optimality of our algorithm in the worst case. Our result is achieved via a novel robust self-normalized concentration inequality that may be of independent interest in handling heavy-tailed noise in general online regression problems.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2306.06836

Country:

Asia > Middle East > Jordan (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > California (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.80)

Niemczynowicz, Agnieszka, Kycia, Radosław A., Jaworski, Maciej, Siemaszko, Artur, Calabuig, Jose M., García-Raffi, Lluis M., Schneider, Baruch, Berseghyan, Diana, Perfiljeva, Irina, Novak, Vilem, Artiemjew, Piotr

Selected aspects of complex, hypercomplex and fuzzy neural networks

arXiv.org Artificial IntelligenceOct-20-2023

This short report reviews the current state of the research and methodology on theoretical and practical aspects of Artificial Neural Networks (ANN). It was prepared to gather state-of-the-art knowledge needed to construct complex, hypercomplex and fuzzy neural networks. The report reflects the individual interests of the authors and, by now means, cannot be treated as a comprehensive review of the ANN discipline. Considering the fast development of this field, it is currently impossible to do a detailed review of a considerable number of pages. The report is an outcome of the Project 'The Strategic Research Partnership for the mathematical aspects of complex, hypercomplex and fuzzy neural networks' meeting at the University of Warmia and Mazury in Olsztyn, Poland, organized in September 2022.

architecture, dh 3, neural network, (13 more...)

2301.00007

Country:

Europe > Poland > Warmia-Masuria Province > Olsztyn (0.24)
Europe > Poland > Lesser Poland Province > Kraków (0.14)
Europe > Czechia > Moravian-Silesian Region > Ostrava (0.04)
(13 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)

arXiv.org Artificial IntelligenceOct-18-2023

Rethinking Complex Queries on Knowledge Graphs with Neural Link Predictors

Yin, Hang, Wang, Zihao, Song, Yangqiu

Reasoning on knowledge graphs is a challenging task because it utilizes observed information to predict the missing one. Particularly, answering complex queries based on first-order logic is one of the crucial tasks to verify learning to reason abilities for generalization and composition. Recently, the prevailing method is query embedding which learns the embedding of a set of entities and treats logic operations as set operations and has shown great empirical success. Though there has been much research following the same formulation, many of its claims lack a formal and systematic inspection. In this paper, we rethink this formulation and justify many of the previous claims by characterizing the scope of queries investigated previously and precisely identifying the gap between its formulation and its goal, as well as providing complexity analysis for the currently investigated queries. Moreover, we develop a new dataset containing ten new types of queries with features that have never been considered and therefore can provide a thorough investigation of complex queries. Finally, we propose a new neural-symbolic method, Fuzzy Inference with Truth value (FIT), where we equip the neural link predictors with fuzzy logic theory to support end-to-end learning using complex queries with provable reasoning capability. Empirical results show that our method outperforms previous methods significantly in the new dataset and also surpasses previous methods in the existing dataset at the same time.

knowledge graph, neural link predictor, rethinking complex query

2304.07063

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.53)

arXiv.org Artificial IntelligenceOct-13-2023

A Survey of Methods for Handling Disk Data Imbalance

Yuan, Shuangshuang, Wu, Peng, Chen, Yuehui, Li, Qiang

Class imbalance exists in many classification problems, and since the data is designed for accuracy, imbalance in data classes can lead to classification challenges with a few classes having higher misclassification costs. The Backblaze dataset, a widely used dataset related to hard discs, has a small amount of failure data and a large amount of health data, which exhibits a serious class imbalance. This paper provides a comprehensive overview of research in the field of imbalanced data classification. The discussion is organized into three main aspects: data-level methods, algorithmic-level methods, and hybrid methods. For each type of method, we summarize and analyze the existing problems, algorithmic ideas, strengths, and weaknesses. Additionally, the challenges of unbalanced data classification are discussed, along with strategies to address them. It is convenient for researchers to choose the appropriate method according to their needs.

algorithm, dataset, imbalance, (14 more...)

2310.08867

Country:

Asia > China > Shandong Province > Jinan (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States (0.04)
(8 more...)

Genre: Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Huang, Jiawei, Yardim, Batuhan, He, Niao

On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation

arXiv.org Machine LearningOct-13-2023

In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) with general model-based function approximation. We introduce a new concept called Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes. We show that low MF-MBED subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on maximal likelihood estimation, which can return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial w.r.t. relevant parameters and independent of the number of states, actions and agents. Compared with previous works, our results only require the minimal assumptions including realizability and Lipschitz continuity.

artificial intelligence, fuzzy logic, machine learning, (14 more...)

2305.11283

Country:

Asia > Middle East > Jordan (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)