AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.

Federated Multi-Agent Actor-Critic Learning for Age Sensitive Mobile Edge Computing

Zhu, Zheqi, Wan, Shuo, Fan, Pingyi, Letaief, Khaled B.

arXiv.org Artificial Intelligence, Jan-6-2021

As an emerging technique, mobile edge computing (MEC) introduces a new processing scheme for various distributed communication-computing systems such as the industrial Internet of Things (IoT), vehicular communication, and smart cities. In this work, we focus on the timeliness of MEC systems, where the freshness of data and computation tasks is significant. Firstly, we formulate a class of age-sensitive MEC models and define the average age of information (AoI) minimization problems of interest. Then, a novel policy-based multi-agent deep reinforcement learning (RL) framework, called heterogeneous multi-agent actor critic (H-MAAC), is proposed as a paradigm for joint collaboration in the investigated MEC systems, where edge devices and the center controller learn interactive strategies through their own observations. To improve system performance, we develop the corresponding online algorithm by introducing an edge federated learning mode into the multi-agent cooperation, whose advantages in learning convergence can be guaranteed theoretically. To the best of our knowledge, this is the first joint MEC collaboration algorithm that combines the edge federated mode with multi-agent actor-critic reinforcement learning. Furthermore, we evaluate the proposed approach and compare it with classical RL-based methods. The proposed framework not only outperforms the baseline on average system age but also improves the stability of the training process. In addition, the simulation results provide some innovative perspectives for system design under edge federated collaboration.
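For readers unfamiliar with the age of information metric mentioned in the abstract, the following is a minimal sketch of how an average AoI could be computed in discrete time. It uses the common textbook definition (age at slot t equals t minus the generation time of the freshest update delivered so far); the function name `average_age`, the `(delivery_time, generation_time)` tuple format, and the `initial_age` parameter are assumptions made for illustration and are not taken from the paper, whose system model may differ.

```python
def average_age(deliveries, horizon, initial_age=0):
    """Average age of information over slots 0 .. horizon-1.

    deliveries: list of (delivery_time, generation_time) pairs (hypothetical format).
    Before any update arrives, the age grows linearly from initial_age.
    """
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    pending = sorted(deliveries)  # process updates in order of delivery time
    idx = 0
    latest_gen = None  # generation time of the freshest update received so far
    total = 0
    for t in range(horizon):
        # Absorb every update that has been delivered by slot t.
        while idx < len(pending) and pending[idx][0] <= t:
            gen = pending[idx][1]
            if latest_gen is None or gen > latest_gen:
                latest_gen = gen
            idx += 1
        age = initial_age + t if latest_gen is None else t - latest_gen
        total += age
    return total / horizon


if __name__ == "__main__":
    # Two updates: generated at t=1 (delivered at t=2) and at t=4 (delivered at t=5).
    print(average_age([(2, 1), (5, 4)], horizon=6))
```

Under this definition, delivering fresher updates more often lowers the average, which is the quantity an age-sensitive scheduler would try to minimize.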

agent, edge computing, edge device, (16 more...)

arXiv.org Artificial Intelligence

2012.14137

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.81)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Provably Efficient Reinforcement Learning with Linear Function Approximation Under Adaptivity Constraints

Wang, Tianhao, Zhou, Dongruo, Gu, Quanquan

arXiv.org Machine Learning, Jan-6-2021

We study reinforcement learning (RL) with linear function approximation under the adaptivity constraint. We consider two popular limited adaptivity models, the batch learning model and the rare policy switch model, and propose two efficient online RL algorithms for linear Markov decision processes. Specifically, for the batch learning model, our proposed LSVI-UCB-Batch algorithm achieves an $\tilde O(\sqrt{d^3H^3T} + dHT/B)$ regret, where $d$ is the dimension of the feature mapping, $H$ is the episode length, $T$ is the number of interactions and $B$ is the number of batches. Our result suggests that it suffices to use only $\sqrt{T/dH}$ batches to obtain $\tilde O(\sqrt{d^3H^3T})$ regret. For the rare policy switch model, our proposed LSVI-UCB-RareSwitch algorithm enjoys an $\tilde O(\sqrt{d^3H^3T[1+T/(dH)]^{dH/B}})$ regret, which implies that $dH\log T$ policy switches suffice to obtain the $\tilde O(\sqrt{d^3H^3T})$ regret. Our algorithms achieve the same regret as the LSVI-UCB algorithm (Jin et al., 2019), yet with a substantially smaller amount of adaptivity.
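The two "suffices" claims in the abstract can be checked by substituting the stated values of $B$ into the regret bounds. The short derivation below is a sketch of that arithmetic only. It reads $B$ as the policy-switch budget in the second bound and assumes $T \ge 2$, $dH \ge 1$ and natural logarithms; these are assumptions made here for illustration and are not stated in the abstract.

```latex
% Batch learning model: pick B so the batch term matches the leading term.
\[
B = \sqrt{\frac{T}{dH}}
\;\Longrightarrow\;
\frac{dHT}{B} = dHT\,\sqrt{\frac{dH}{T}} = \sqrt{d^3H^3T},
\qquad\text{so}\qquad
\tilde O\!\left(\sqrt{d^3H^3T} + \frac{dHT}{B}\right) = \tilde O\!\left(\sqrt{d^3H^3T}\right).
\]

% Rare policy switch model: with B = dH log T the extra factor is bounded by a constant.
\[
B = dH\log T
\;\Longrightarrow\;
\left[1+\frac{T}{dH}\right]^{dH/B}
= \exp\!\left(\frac{\log\bigl(1+T/(dH)\bigr)}{\log T}\right)
\le \exp\!\left(\frac{\log(1+T)}{\log T}\right)
\le \exp\!\left(\frac{\log 3}{\log 2}\right),
\]
% using that log(1+T)/log T is decreasing in T for T > 1, so its maximum on T >= 2 is at T = 2.
\[
\text{hence}\qquad
\tilde O\!\left(\sqrt{d^3H^3T\,[1+T/(dH)]^{dH/B}}\right) = \tilde O\!\left(\sqrt{d^3H^3T}\right).
\]
```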

algorithm, batch, rare policy switch model, (11 more...)

arXiv.org Machine Learning

2101.02195

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Asia > Middle East > Jordan (0.04)
North America > United States > Connecticut > New Haven County > New Haven (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.78)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Enhanced Audit Techniques Empowered by the Reinforcement Learning Pertaining to IFRS 16 Lease

Choi, Byungryul

arXiv.org Artificial Intelligence, Jan-5-2021

The purpose of an accounting audit is to gain a clear understanding of a company's financial activities, which machine learning or reinforcement learning can enhance because numerical analysis can outperform manual analysis. Assessing the relevance, completeness and accuracy of the information an entity produces under the newly implemented International Financial Reporting Standard 16 Lease (IFRS 16) is one such candidate: it requires understanding the nature of contracts and analyzing them completely, starting from a list compiled without omission. Digitalizing contracts can help create these lists, but auditors still need to examine companies' cash flows for omissions caused by errors at the data collection stage, especially for entities with many short- or medium-term business sites and related leases, such as construction entities. This study implements reinforcement learning with well-known code to explore the feasibility and usefulness of interpreters that translate domain knowledge into a numerical system, which can be called 'gamification interpreters' or 'numericalization interpreters'. They can be compared to extrapolation with nondimensional numbers in physics, such as the Froude number, which was a source of inspiration for this study. Studies of such interpreters may strengthen the usefulness of artificial general intelligence in domain-specific and commercial areas.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2101.05633

Country: Asia > South Korea > Seoul > Seoul (0.04)

Genre: Research Report (0.40)

Industry:

Banking & Finance (1.00)
Leisure & Entertainment > Games > Computer Games (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Reinforcement Learning based Collective Entity Alignment with Adaptive Features

Zeng, Weixin, Zhao, Xiang, Tang, Jiuyang, Lin, Xuemin, Groth, Paul

arXiv.org Artificial Intelligence, Jan-5-2021

Entity alignment (EA) is the task of identifying the entities that refer to the same real-world object but are located in different knowledge graphs (KGs). For entities to be aligned, existing EA solutions treat them separately and generate alignment results as ranked lists of entities on the other side. Nevertheless, this decision-making paradigm fails to take into account the interdependence among entities. Although some recent efforts mitigate this issue by imposing the 1-to-1 constraint on the alignment process, they still cannot adequately model the underlying interdependence and the results tend to be sub-optimal. To fill in this gap, in this work, we delve into the dynamics of the decision-making process, and offer a reinforcement learning (RL) based model to align entities collectively. Under the RL framework, we devise the coherence and exclusiveness constraints to characterize the interdependence and restrict collective alignment. Additionally, to generate more precise inputs to the RL framework, we employ representative features to capture different aspects of the similarity between entities in heterogeneous KGs, which are integrated by an adaptive feature fusion strategy. Our proposal is evaluated on both cross-lingual and mono-lingual EA benchmarks and compared against state-of-the-art solutions. The empirical results verify its effectiveness and superiority.

information, source entity, target entity, (12 more...)

arXiv.org Artificial Intelligence

2101.01353

Country:

Europe > Switzerland > Valais > Sion (0.14)
North America > United States > New York > New York County > New York City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
(25 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)
Research Report > Promising Solution (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Meta Variationally Intrinsic Motivated Reinforcement Learning for Decentralized Traffic Signal Control

Zhu, Liwen, Peng, Peixi, Lu, Zongqing, Wang, Xiangqian, Tian, Yonghong

arXiv.org Artificial Intelligence, Jan-5-2021

The goal of traffic signal control is to coordinate multiple traffic signals to improve the traffic efficiency of a district or a city. In this work, we propose a novel Meta Variationally Intrinsic Motivated (MetaVIM) RL method, which aims to learn a decentralized policy for each traffic signal conditioned only on its local observation. MetaVIM makes three novel contributions. Firstly, to make the model applicable to new, unseen target scenarios, we formulate traffic signal control as a meta-learning problem over a set of related tasks. The training scenario is divided into multiple partially observable Markov decision process (POMDP) tasks, each corresponding to a traffic light. In each task, the neighbours are regarded as an unobserved part of the state. Secondly, we assume that the reward, transition and policy functions vary across tasks but share a common structure. A learned latent variable conditioned on past trajectories is proposed for each task to represent the task-specific information in these functions, and it is fed into the policy to automatically trade off exploration and exploitation so that the RL agent chooses reasonable actions. In addition, to stabilize policy learning, four decoders are introduced to predict the observations and rewards received by the current agent with and without the neighbour agents' policies, and a novel intrinsic reward is designed to encourage the received observations and rewards to be invariant to the neighbour agents. Empirically, extensive experiments conducted on CityFlow demonstrate that the proposed method substantially outperforms existing methods and shows superior generalizability.

agent, intersection, traffic signal control, (14 more...)

arXiv.org Artificial Intelligence

2101.00746

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.05)
North America > United States > New York (0.05)
(2 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Zhang, Kaiqing, Zhang, Xiangyuan, Hu, Bin, Başar, Tamer

arXiv.org Artificial Intelligence, Jan-4-2021

Direct policy search serves as one of the workhorses in modern reinforcement learning (RL), and its applications in continuous control tasks have recently attracted increasing attention. In this work, we investigate the convergence theory of policy gradient (PG) methods for learning the linear risk-sensitive and robust controller. In particular, we develop PG methods that can be implemented in a derivative-free fashion by sampling system trajectories, and establish both global convergence and sample complexity results for two fundamental settings in risk-sensitive and robust control: the finite-horizon linear exponential quadratic Gaussian problem and the finite-horizon linear-quadratic disturbance attenuation problem. As a by-product, our results also provide the first sample complexity for the global convergence of PG methods on solving zero-sum linear-quadratic dynamic games, a nonconvex-nonconcave minimax optimization problem that serves as a baseline setting in multi-agent reinforcement learning (MARL) with continuous spaces. One feature of our algorithms is that during the learning phase, a certain level of robustness/risk-sensitivity of the controller is preserved, which we term the implicit regularization property and which is an essential requirement in safety-critical control systems.

matrix, probability, sequence, (14 more...)

arXiv.org Artificial Intelligence

2101.01041

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

A Pushing-Grasping Collaborative Method Based on Deep Q-Network Algorithm in Dual Perspectives

Gang, Peng, Jinhu, Liao, Shangbin, Guan

arXiv.org Artificial Intelligence, Jan-4-2021

Traditional grasping methods for manipulators based on a 2D camera perform poorly in unstructured scenes where objects are clustered or cover one another, because they cannot recognize objects accurately in cluttered scenes from a single perspective and the manipulator cannot rearrange the environment to make grasping easier. To address this, this paper proposes a novel pushing-grasping collaborative method based on the deep Q-network in dual perspectives. The method adopts an improved deep Q-network algorithm, uses an RGB-D camera to obtain RGB images and point clouds of objects from two perspectives, and combines pushing and grasping actions so that the trained manipulator can rearrange scenes for easier grasping and perform well in more complicated grasping scenes. In addition, we improve the reward function of the deep Q-network and propose a piecewise reward function to speed up its convergence. We trained different models and tried different methods in the V-REP simulation environment, and conclude that the proposed method converges quickly and that the success rate of grasping objects in unstructured scenes rises up to 83.5%. The method also shows generalization ability and good performance when novel objects that the manipulator has never grasped before appear in the scenes.

dual perspective, manipulator, reinforcement, (15 more...)

arXiv.org Artificial Intelligence

2101.00829

Country:

Asia > China > Hubei Province > Wuhan (0.05)
Asia > China > Heilongjiang Province > Harbin (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Enhanced Pub/Sub Communications for Massive IoT Traffic with SARSA Reinforcement Learning

Arruda, Carlos E., Moraes, Pedro F., Agoulmine, Nazim, Martins, Joberto S. B.

arXiv.org Artificial Intelligence, Jan-3-2021

Sensors are being extensively deployed, and their numbers are expected to grow at significant rates in the coming years. They typically generate a large volume of data in Internet of Things (IoT) application areas such as smart cities, intelligent traffic systems, smart grids, and e-health. Cloud, edge, and fog computing are potential and competitive strategies for collecting, processing, and distributing IoT data. However, cloud-, edge-, and fog-based solutions need to distribute a high volume of IoT data efficiently through network infrastructures with constrained and limited resources. This paper addresses the issue of conveying a massive volume of IoT data through a network with limited communications resources (bandwidth) using cognitive communications resource allocation based on Reinforcement Learning (RL) with the SARSA algorithm. The proposed network infrastructure (PSIoTRL) uses a Publish/Subscribe architecture to access massive and highly distributed IoT data. It is demonstrated that the PSIoTRL bandwidth allocation for buffer flushing based on SARSA improves IoT aggregator buffer occupation and network link utilization.
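As a reminder of what the SARSA algorithm named in the abstract does, here is a minimal sketch of the standard tabular on-policy update, Q(s, a) ← Q(s, a) + α [r + γ Q(s', a') − Q(s, a)], together with an epsilon-greedy action choice. It is a generic illustration and not the PSIoTRL implementation: the state labels (buffer occupancy levels), the actions (bandwidth levels), and all function and parameter names are hypothetical.

```python
import random
from collections import defaultdict


def epsilon_greedy(q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def sarsa_update(q, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.9):
    """Apply one tabular SARSA update in place and return the new Q(s, a)."""
    # On-policy target: uses the action a_next actually selected in s_next.
    td_target = reward + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (td_target - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)      # Q-values default to 0.0
    actions = [0, 1, 2]         # hypothetical bandwidth levels
    s = "buffer_high"           # hypothetical buffer-occupancy state
    a = epsilon_greedy(q, s, actions, epsilon=0.1, rng=rng)
    s_next = "buffer_low"
    a_next = epsilon_greedy(q, s_next, actions, epsilon=0.1, rng=rng)
    print(sarsa_update(q, s, a, reward=1.0, s_next=s_next, a_next=a_next))
```

The defining design choice is that the target uses the next action the policy actually takes, which is what distinguishes SARSA from Q-learning's max over next actions.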

algorithm, allocation, bandwidth, (13 more...)

arXiv.org Artificial Intelligence

2101.00687

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > France > Île-de-France > Paris > Paris (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(4 more...)

Genre:

Research Report (0.50)
Overview (0.46)

Industry:

Transportation > Ground > Road (0.68)
Transportation > Electric Vehicle (0.68)
Automobiles & Trucks (0.68)
(3 more...)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

POPO: Pessimistic Offline Policy Optimization

He, Qiang, Hou, Xinwen

arXiv.org Artificial Intelligence, Jan-3-2021

Offline reinforcement learning (RL), also known as batch RL, aims to optimize a policy from a large pre-recorded dataset without interacting with the environment. This setting offers the promise of utilizing diverse, pre-collected datasets to obtain policies without costly, risky, active exploration. However, commonly used off-policy algorithms based on Q-learning or actor-critic perform poorly when learning from a static dataset. In this work, we study why off-policy RL methods fail to learn in the offline setting from the value function view, and we propose a novel offline RL algorithm that we call Pessimistic Offline Policy Optimization (POPO), which learns a pessimistic value function to get a strong policy. We find that POPO performs surprisingly well and scales to tasks with high-dimensional state and action spaces, matching or outperforming several state-of-the-art offline RL algorithms on benchmark tasks.

algorithm, learning, value function, (14 more...)

arXiv.org Artificial Intelligence

2012.13682

Country: Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

A Joint Learning and Communication Framework for Multi-Agent Reinforcement Learning over Noisy Channels

Tung, Tze-Yang, Pujol, Joan Roig, Kobus, Szymon, Gunduz, Deniz

arXiv.org Artificial Intelligence, Jan-2-2021

We propose a novel formulation of the "effectiveness problem" in communications, put forth by Shannon and Weaver in their seminal work [2], by considering multiple agents communicating over a noisy channel in order to achieve better coordination and cooperation in a multi-agent reinforcement learning (MARL) framework. Specifically, we consider a multi-agent partially observable Markov decision process (MA-POMDP), in which the agents, in addition to interacting with the environment, can also communicate with each other over a noisy communication channel. The noisy communication channel is considered explicitly as part of the dynamics of the environment, and the message each agent sends is part of the action that the agent can take. As a result, the agents learn not only to collaborate with each other but also to communicate "effectively" over a noisy channel. This framework generalizes both the traditional communication problem, where the main goal is to convey a message reliably over a noisy channel, and the "learning to communicate" framework that has received recent attention in the MARL literature, where the underlying communication channels are assumed to be error-free. We show via examples that the joint policy learned using the proposed framework is superior to that where the communication is considered separately from the underlying MA-POMDP. This is a very powerful framework, which has many real-world applications, from autonomous vehicle planning to drone swarm control, and opens up the rich toolbox of deep reinforcement learning for the design of multi-user communication systems.

This work was supported in part by the European Research Council (ERC) Starting Grant BEACON (grant agreement no. An earlier version of this work was presented at the IEEE Global Communications Conference (GLOBECOM) in December 2020 [1].

Communication is essential for our society. Humans use language to communicate ideas, which has given rise to complex social structures, and scientists have observed either gestural or vocal communication in other animal groups, the complexity of which increases with the complexity of the social structure of the group [3].
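To make concrete the idea that a message is part of an agent's action and that the channel is part of the environment dynamics, the sketch below passes the message portion of an action through a binary symmetric channel before another agent would observe it. The choice of a binary symmetric channel, the `(env_action, message_bits)` action layout, and the function names are assumptions made for illustration; the abstract does not specify the channel model or the interface.

```python
import random


def binary_symmetric_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def deliver(action, flip_prob, rng):
    """Split a joint action into its environment part and its noisy message part.

    action: (env_action, message_bits), a hypothetical layout in which the
    message an agent sends is simply one component of the action it takes.
    Returns the environment action unchanged and the corrupted message that
    the receiving agent would observe.
    """
    env_action, message_bits = action
    received = binary_symmetric_channel(message_bits, flip_prob, rng)
    return env_action, received


if __name__ == "__main__":
    rng = random.Random(0)
    # Agent chooses a movement and a 4-bit message in a single action.
    print(deliver(("move_left", [1, 0, 1, 1]), flip_prob=0.1, rng=rng))
```

Because the corruption happens inside the environment step, a learning agent only ever sees received messages, so any robustness to noise has to be learned through the policy itself.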

agent, communication, communication channel, (15 more...)

arXiv.org Artificial Intelligence

2101.10369

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
