AITopics | policy reuse

8fae6a68aaf1e05bfd90375755b63821-Paper-Conference.pdf

Neural Information Processing Systems, Feb-16-2026, 15:03:29 GMT

artificial intelligence, bayesian inference, machine learning, (18 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Beyond Single Stationary Policies: Meta-Task Players as Naturally Superior Collaborators

Neural Information Processing Systems, Oct-10-2025, 09:29:41 GMT

We provide theoretical guarantees for CBPR's rapid convergence to the optimal policy once human partners alter their policies.

agent, cbpr, experiment, (15 more...)

Neural Information Processing Systems

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Indiana > St. Joseph County > Notre Dame (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach

Nagib, Ahmad M., Abou-Zeid, Hatem, Hassanein, Hossam S.

arXiv.org Artificial Intelligence, Sep-18-2023

The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when encountering previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.

agent, drl agent, expert policy, (15 more...)

arXiv.org Artificial Intelligence

2309.07265

Country:

North America > Canada > Alberta > Census Division No. 6 > Calgary Metropolitan Region > Calgary (0.14)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
North America > Canada > Ontario > Kingston (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.46)
Personal > Honors (0.46)

Industry:

Leisure & Entertainment > Games > Computer Games (0.48)
Telecommunications > Networks (0.46)
Education > Educational Setting (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

CUP: Critic-Guided Policy Reuse

Zhang, Jin, Li, Siyuan, Zhang, Chongjie

arXiv.org Artificial Intelligence, Oct-14-2022

The ability to reuse previous policies is an important aspect of human intelligence. To achieve efficient policy reuse, a Deep Reinforcement Learning (DRL) agent needs to decide when to reuse and which source policies to reuse. Previous methods solve this problem by introducing extra components to the underlying algorithm, such as hierarchical high-level policies over source policies, or estimations of source policies' value functions on the target task. However, training these components induces either optimization non-stationarity or heavy sampling cost, significantly impairing the effectiveness of transfer. To tackle this problem, we propose a novel policy reuse algorithm called Critic-gUided Policy reuse (CUP), which avoids training any extra components and efficiently reuses source policies. CUP utilizes the critic, a common component in actor-critic methods, to evaluate and choose source policies. At each state, CUP chooses the source policy that has the largest one-step improvement over the current target policy, and forms a guidance policy. The guidance policy is theoretically guaranteed to be a monotonic improvement over the current target policy. Then the target policy is regularized to imitate the guidance policy to perform efficient policy search. Empirical results demonstrate that CUP achieves efficient transfer and significantly outperforms baseline algorithms.
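The per-state selection step described in this abstract can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes deterministic policies and a tabular critic, and the names `select_guidance_action`, `q_table`, `target`, and `sources` are hypothetical. With deterministic policies, "largest one-step improvement" reduces to picking the candidate action with the highest critic value.

```python
from typing import Callable, Dict, Sequence, Tuple

State = int
Action = int
Policy = Callable[[State], Action]
Critic = Callable[[State, Action], float]


def select_guidance_action(
    state: State,
    target_policy: Policy,
    source_policies: Sequence[Policy],
    critic: Critic,
) -> Action:
    """Return the candidate action that the critic scores highest.

    The target policy's own action is listed first, so on ties the
    guidance action falls back to the target policy and is never rated
    worse than it by the critic.
    """
    candidates = [target_policy(state)] + [p(state) for p in source_policies]
    return max(candidates, key=lambda action: critic(state, action))


# Toy example (all values are made up for illustration).
q_table: Dict[Tuple[State, Action], float] = {(0, 0): 1.0, (0, 1): 2.5, (0, 2): 0.3}


def toy_critic(state: State, action: Action) -> float:
    return q_table[(state, action)]


def target(state: State) -> Action:
    return 0


sources = [lambda s: 1, lambda s: 2]
guidance_action = select_guidance_action(0, target, sources, toy_critic)
```

In this toy setting the first source policy proposes the action with the highest critic value, so it supplies the guidance action. The regularization step that pulls the target policy toward the guidance policy is not shown.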

machine learning, reinforcement learning, source policy, (13 more...)

arXiv.org Artificial Intelligence

2210.08153

Country: Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)

Quantum Architecture Search via Continual Reinforcement Learning

Ye, Esther, Chen, Samuel Yen-Chi

arXiv.org Artificial Intelligence, Dec-10-2021

Quantum computing has promised significant improvement in solving difficult computational tasks over classical computers. Designing quantum circuits for practical use, however, is not a trivial objective and requires expert-level knowledge. To aid this endeavor, this paper proposes a machine learning-based method to construct quantum circuit architectures. Previous works have demonstrated that classical deep reinforcement learning (DRL) algorithms can successfully construct quantum circuit architectures without encoded physics knowledge. However, these DRL-based works are not generalizable to settings with changing device noises, thus requiring considerable amounts of training resources to keep the RL models up-to-date. With this in mind, we incorporated continual learning to enhance the performance of our algorithm. In this paper, we present the Probabilistic Policy Reuse with deep Q-learning (PPR-DQL) framework to tackle this circuit design challenge. By conducting numerical simulations over various noise patterns, we demonstrate that the RL agent with PPR was able to find the quantum gate sequence to generate the two-qubit Bell state faster than the agent that was trained from scratch. The proposed framework is general and can be applied to other quantum gate synthesis or control problems -- including the automatic calibration of quantum devices.
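The abstract does not spell out how probabilistic policy reuse chooses actions. The sketch below follows the commonly described scheme: with probability psi the agent follows a previously learned policy, otherwise it acts epsilon-greedily on the Q-values it is currently learning, and psi decays within an episode. The function names, the tabular `q_values` layout, and the decay schedule are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import Callable, List, Sequence


def reuse_action(
    state: int,
    past_policy: Callable[[int], int],
    q_values: Sequence[Sequence[float]],
    psi: float,
    epsilon: float,
    rng: random.Random,
) -> int:
    """Pick an action by mixing a past policy with epsilon-greedy exploration."""
    # With probability psi, reuse the action suggested by the past policy.
    if rng.random() < psi:
        return past_policy(state)
    row = q_values[state]
    # Otherwise explore uniformly at random with probability epsilon...
    if rng.random() < epsilon:
        return rng.randrange(len(row))
    # ...or exploit the current Q-value estimates.
    return max(range(len(row)), key=row.__getitem__)


def psi_schedule(psi0: float, decay: float, steps: int) -> List[float]:
    """Reuse probability at each step of an episode (geometric decay)."""
    return [psi0 * decay**h for h in range(steps)]
```

Early in an episode the agent leans on the old policy; as psi shrinks, control passes to the policy being learned, which is what lets a policy trained under one noise pattern give a head start under another.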

agent, arxiv preprint arxiv, learning, (11 more...)

arXiv.org Artificial Intelligence

2112.05779

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.81)

Industry:

Education (1.00)
Energy (0.67)
Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lifetime policy reuse and the importance of task capacity

Bossens, David M., Sobey, Adam J.

arXiv.org Artificial Intelligence, Jun-3-2021

A long-standing challenge in artificial intelligence is lifelong learning. In lifelong learning, many tasks are presented in sequence and learners must efficiently transfer knowledge between tasks while avoiding catastrophic forgetting over long lifetimes. On these problems, policy reuse and other multi-policy reinforcement learning techniques can learn many tasks. However, they can generate many temporary or permanent policies, resulting in memory issues. Consequently, there is a need for lifetime-scalable methods that continually refine a policy library of a pre-defined size. This paper presents a first approach to lifetime-scalable policy reuse. To pre-select the number of policies, a notion of task capacity, the maximal number of tasks that a policy can accurately solve, is proposed. To evaluate lifetime policy reuse using this method, two state-of-the-art single-actor base-learners are compared: 1) a value-based reinforcement learner, Deep Q-Network (DQN) or Deep Recurrent Q-Network (DRQN); and 2) an actor-critic reinforcement learner, Proximal Policy Optimisation (PPO) with or without Long Short-Term Memory layer. By selecting the number of policies based on task capacity, D(R)QN achieves near-optimal performance with 6 policies in a 27-task MDP domain and 9 policies in an 18-task POMDP domain; with fewer policies, catastrophic forgetting and negative transfer are observed. Due to slow, monotonic improvement, PPO requires fewer policies, 1 policy for the 27-task domain and 4 policies for the 18-task domain, but it learns the tasks with lower accuracy than D(R)QN. These findings validate lifetime-scalable policy reuse and suggest using D(R)QN for larger and PPO for smaller library sizes.

During their lifetime, animals may be subjected to a large number of unknown tasks. In some environmental conditions, nutritious food sources may be readily available, while in others they may be sparse, hidden, or even poisonous, and dangerous predators may roam in their vicinity. To address these challenging conditions, various behaviours must be selectively combined, such as avoidance, reward-seeking, or even fleeing. When direct perception provides limited or no cues about the current task, animals have to infer the task or use a strategy that works for the many different tasks they may encounter. Therefore, a key challenge in learning difficult sequences of tasks is to find a limited number of strategies that work on the large domain of tasks that animals encounter over their lifetime. In artificial intelligence, variants of the above problem have been studied with investigations focusing on two aspects: transfer learning and catastrophic forgetting. Transfer learning is a process in which learners leverage the knowledge gained from a set of previously learned tasks with similar characteristics to a new task, whilst avoiding transferring knowledge that is not relevant [Taylor and Stone, 2009, Pan and Yang, 2010, Lazaric, 2013].

learner, policy reuse, task capacity, (13 more...)

arXiv.org Artificial Intelligence

2106.01741

Country:

North America > United States (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)
(6 more...)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.89)
Leisure & Entertainment > Games > Computer Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Context-Aware Policy Reuse

Li, Siyuan, Gu, Fangda, Zhu, Guangxiang, Zhang, Chongjie

arXiv.org Artificial Intelligence, Jun-28-2018

Transfer learning can greatly speed up reinforcement learning for a new task by leveraging policies of relevant tasks. Existing works of policy reuse either focus on only selecting a single best source policy for transfer without considering contexts, or cannot guarantee to learn an optimal policy for a target task. To improve transfer efficiency and guarantee optimality, we develop a novel policy reuse method, called Context-Aware Policy reuSe (CAPS), that enables multi-policy transfer. Our method learns when and which source policy is best for reuse, as well as when to terminate its reuse. CAPS provides theoretical guarantees in convergence and optimality for both source policy selection and target task learning. Empirical results on a grid-based navigation domain and the Pygame Learning Environment demonstrate that CAPS significantly outperforms other state-of-the-art policy reuse methods.

machine learning, reinforcement learning, source policy, (16 more...)

arXiv.org Artificial Intelligence

1806.03793

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

An Optimal Online Method of Selecting Source Policies for Reinforcement Learning

Li, Siyuan (Tsinghua University, Institute for Interdisciplinary Information Sciences) | Zhang, Chongjie (Tsinghua University, Institute for Interdisciplinary Information Sciences)

AAAI Conferences, Feb-8-2018

Transfer learning significantly accelerates the reinforcement learning process by exploiting relevant knowledge from previous experiences. The problem of optimally selecting source policies during the learning process is of great importance yet challenging. There has been little theoretical analysis of this problem. In this paper, we develop an optimal online method to select source policies for reinforcement learning. This method formulates online source policy selection as a multi-armed bandit problem and augments Q-learning with policy reuse. We provide theoretical guarantees of the optimal selection process and convergence to the optimal policy. In addition, we conduct experiments on a grid-based robot navigation domain to demonstrate its efficiency and robustness by comparing to the state-of-the-art transfer learning method.
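To make the bandit framing concrete, the sketch below treats each source policy as an arm and the episode return obtained while reusing it as the arm's reward. UCB1 is used here only as a familiar stand-in: the abstract does not say which bandit algorithm the paper adopts, and the class name `SourcePolicyBandit`, the returns scaled to [0, 1], and the toy reward values are all assumptions.

```python
import math
from typing import List


class SourcePolicyBandit:
    """UCB1 over a fixed set of source policies (one arm per policy)."""

    def __init__(self, n_policies: int) -> None:
        self.counts: List[int] = [0] * n_policies
        self.means: List[float] = [0.0] * n_policies

    def select(self) -> int:
        """Return the index of the source policy to reuse next."""
        # Try every policy once before applying the confidence bound.
        for index, count in enumerate(self.counts):
            if count == 0:
                return index
        total = sum(self.counts)

        def upper_bound(index: int) -> float:
            bonus = math.sqrt(2.0 * math.log(total) / self.counts[index])
            return self.means[index] + bonus

        return max(range(len(self.counts)), key=upper_bound)

    def update(self, index: int, episode_return: float) -> None:
        """Record the return observed after reusing policy `index`."""
        self.counts[index] += 1
        self.means[index] += (episode_return - self.means[index]) / self.counts[index]


# Toy run: the second source policy transfers best (made-up returns).
toy_returns = [0.2, 0.9, 0.5]
bandit = SourcePolicyBandit(len(toy_returns))
for _ in range(500):
    chosen = bandit.select()
    bandit.update(chosen, toy_returns[chosen])
most_reused = max(range(len(toy_returns)), key=bandit.counts.__getitem__)
```

Over the run, the bandit concentrates its pulls on the source policy with the highest return while still sampling the others occasionally. How the selected policy is then combined with Q-learning on the target task is not shown.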

machine learning, reinforcement learning, source policy, (17 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: North America > United States > Massachusetts (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

SEAPoT-RL: Selective Exploration Algorithm for Policy Transfer in RL

Narayan, Akshay (National University of Singapore) | Li, Zhuoru (National University of Singapore) | Leong, Tze-Yun (National University of Singapore)

AAAI Conferences, Feb-14-2017

We propose a new method for transferring a policy from a source task to a target task in model-based reinforcement learning. Our work is motivated by scenarios where a robotic agent operates in similar but challenging environments, such as hospital wards, differentiated by structural arrangements or obstacles, such as furniture. We address problems that require fast responses adapted from incomplete, prior knowledge of the agent in new scenarios. We present an efficient selective exploration strategy that maximally reuses the source task policy. Reuse efficiency is effected through identifying sub-spaces that are different in the target environment, thus limiting the exploration needed in the target task. We empirically show that SEAPoT performs better in terms of jump starts and cumulative average rewards, as compared to existing state-of-the-art policy reuse methods.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Industry: Health & Medicine > Health Care Providers & Services (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.35)

Policy Reuse in Deep Reinforcement Learning

Glatt, Ruben (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)

AAAI Conferences, Feb-14-2017

Driven by recent developments in Artificial Intelligence research, a promising new technology for building intelligent agents has evolved. The approach is termed Deep Reinforcement Learning and combines the classic field of Reinforcement Learning (RL) with the representational power of modern Deep Learning approaches. It is very well suited for single task learning but needs a long time to learn any new task. To speed up this process, we propose to extend the concept to multi-task learning by adapting Policy Reuse, a Transfer Learning approach from classic RL, to use with Deep Q-Networks.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country:

South America > Brazil (0.16)
North America > United States (0.15)

Genre: Research Report > New Finding (0.48)

Industry: Energy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
