AITopics | exploitation policy

Collaborating Authors

exploitation policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Francisco Garcia, Philip S. Thomas

Neural Information Processing SystemsSep-28-2025, 01:32:24 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

Oceania > Australia (0.14)
North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Illinois (0.14)

Industry: Energy > Oil & Gas (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Francisco Garcia, Philip S. Thomas

Neural Information Processing SystemsAug-20-2025, 01:17:56 GMT

Neural Information Processing Systems http://nips.cc/

agent, exploration, exploration policy, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Hyper: Hyperparameter Robust Efficient Exploration in Reinforcement Learning

Wang, Yiran, Liu, Chenshu, Li, Yunfan, Amani, Sanae, Zhou, Bolei, Yang, Lin F.

arXiv.org Machine LearningDec-4-2024

The exploration \& exploitation dilemma poses significant challenges in reinforcement learning (RL). Recently, curiosity-based exploration methods achieved great success in tackling hard-exploration problems. However, they necessitate extensive hyperparameter tuning on different environments, which heavily limits the applicability and accessibility of this line of methods. In this paper, we characterize this problem via analysis of the agent behavior, concluding the fundamental difficulty of choosing a proper hyperparameter. We then identify the difficulty and the instability of the optimization when the agent learns with curiosity. We propose our method, hyperparameter robust exploration (\textbf{Hyper}), which extensively mitigates the problem by effectively regularizing the visitation of the exploration and decoupling the exploitation to ensure stable training. We theoretically justify that \textbf{Hyper} is provably efficient under function approximation setting and empirically demonstrate its appealing performance and robustness in various environments.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2412.03767

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Individual Contributions as Intrinsic Exploration Scaffolds for Multi-agent Reinforcement Learning

Li, Xinran, Liu, Zifan, Chen, Shibo, Zhang, Jun

arXiv.org Artificial IntelligenceMay-28-2024

In multi-agent reinforcement learning (MARL), effective exploration is critical, especially in sparse reward environments. Although introducing global intrinsic rewards can foster exploration in such settings, it often complicates credit assignment among agents. To address this difficulty, we propose Individual Contributions as intrinsic Exploration Scaffolds (ICES), a novel approach to motivate exploration by assessing each agent's contribution from a global view. In particular, ICES constructs exploration scaffolds with Bayesian surprise, leveraging global transition information during centralized training. These scaffolds, used only in training, help to guide individual agents towards actions that significantly impact the global latent state transitions. Additionally, ICES separates exploration policies from exploitation policies, enabling the former to utilize privileged global information during training. Extensive experiments on cooperative benchmark tasks with sparse rewards, including Google Research Football (GRF) and StarCraft Multi-agent Challenge (SMAC), demonstrate that ICES exhibits superior exploration capabilities compared with baselines. The code is publicly available at https://github.com/LXXXXR/ICES.

agent, exploration, individual contribution, (11 more...)

arXiv.org Artificial Intelligence

2405.1811

Country:

North America > United States (0.14)
Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.04)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Industry: Leisure & Entertainment > Games (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Offline Retraining for Online RL: Decoupled Policy Learning to Mitigate Exploration Bias

Mark, Max Sobol, Sharma, Archit, Tajwar, Fahim, Rafailov, Rafael, Levine, Sergey, Finn, Chelsea

arXiv.org Artificial IntelligenceOct-12-2023

It is desirable for policies to optimistically explore new states and behaviors during online reinforcement learning (RL) or fine-tuning, especially when prior offline data does not provide enough state coverage. However, exploration bonuses can bias the learned policy, and our experiments find that naive, yet standard use of such bonuses can fail to recover a performant policy. Concurrently, pessimistic training in offline RL has enabled recovery of performant policies from static datasets. Can we leverage offline RL to recover better policies from online interaction? We make a simple observation that a policy can be trained from scratch on all interaction data with pessimistic objectives, thereby decoupling the policies used for data collection and for evaluation. Specifically, we propose offline retraining, a policy extraction step at the end of online fine-tuning in our Offline-to-Online-to-Offline (OOO) framework for reinforcement learning (RL). An optimistic (exploration) policy is used to interact with the environment, and a separate pessimistic (exploitation) policy is trained on all the observed data for evaluation. Such decoupling can reduce any bias from online interaction (intrinsic rewards, primacy bias) in the evaluation policy, and can allow more exploratory behaviors during online interaction which in turn can generate better data for exploitation. OOO is complementary to several offline-to-online RL and online RL methods, and improves their average performance by 14% to 26% in our fine-tuning experiments, achieves state-of-the-art performance on several environments in the D4RL benchmarks, and improves online RL performance by 165% on two OpenAI gym environments. Further, OOO can enable fine-tuning from incomplete offline datasets where prior methods can fail to recover a performant policy. Implementation: https://github.com/MaxSobolMark/OOO

arxiv preprint arxiv, exploitation policy, exploration bonus, (12 more...)

arXiv.org Artificial Intelligence

2310.08558

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Explicit Explore, Exploit, or Escape ($E^4$): near-optimal safety-constrained reinforcement learning in polynomial time

Bossens, David M., Bishop, Nicholas

arXiv.org Artificial IntelligenceNov-14-2021

In reinforcement learning (RL), an agent must explore an initially unknown environment in order to learn a desired behaviour. When RL agents are deployed in real world environments, safety is of primary concern. Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment. This paper proposes a model-based RL algorithm called Explicit Explore, Exploit, or Escape ($E^{4}$), which extends the Explicit Explore or Exploit ($E^{3}$) algorithm to a robust CMDP setting. $E^4$ explicitly separates exploitation, exploration, and escape CMDPs, allowing targeted policies for policy improvement across known states, discovery of unknown states, as well as safe return to known states. $E^4$ robustly optimises these policies on the worst-case CMDP from a set of CMDP models consistent with the empirical observations of the deployment environment. Theoretical results show that $E^4$ finds a near-optimal constraint-satisfying policy in polynomial time whilst satisfying safety constraints throughout the learning process. We discuss robust-constrained offline optimisation algorithms as well as how to incorporate uncertainty in transition dynamics of unknown states based on empirical inference and prior knowledge.

cmdp, near-optimal safety-constrained reinforcement, unknown state, (13 more...)

arXiv.org Artificial Intelligence

2111.07395

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Learning Efficient and Effective Exploration Policies with Counterfactual Meta Policy

Yang, Ruihan, Ye, Qiwei, Liu, Tie-Yan

arXiv.org Machine LearningMay-27-2019

A fundamental issue in reinforcement learning algorithms is the balance between exploration of the environment and exploitation of information already obtained by the agent. Especially, exploration has played a critical role for both efficiency and efficacy of the learning process. However, Existing works for exploration involve task-agnostic design, that is performing well in one environment, but be ill-suited to another. To the purpose of learning an effective and efficient exploration policy in an automated manner. We formalized a feasible metric for measuring the utility of exploration based on counterfactual ideology. Based on that, We proposed an end-to-end algorithm to learn exploration policy by meta-learning. We demonstrate that our method achieves good results compared to previous works in the high-dimensional control tasks in MuJoCo simulator.

artificial intelligence, exploration, upstream oil & gas, (16 more...)

arXiv.org Machine Learning

1905.11583

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.74)
Leisure & Entertainment (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning

Garcia, Francisco M., Thomas, Philip S.

arXiv.org Machine LearningFeb-2-2019

In this paper we consider the problem of how a reinforcement learning agent that is tasked with solving a sequence of reinforcement learning problems (a sequence of Markov decision processes) can use knowledge acquired early in its lifetime to improve its ability to solve new problems. We argue that previous experience with similar problems can provide an agent with information about how it should explore when facing a new but related problem. We show that the search for an optimal exploration strategy can be formulated as a reinforcement learning problem itself and demonstrate that such strategy can leverage patterns found in the structure of related problems. We conclude with experiments that show the benefits of optimizing an exploration strategy using our proposed approach.

agent, ground transportation, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1902.00843

Country:

North America > United States > Massachusetts (0.28)
Oceania > Australia (0.14)
North America > United States > Illinois (0.14)

Genre: Research Report (0.64)

Industry:

Energy > Oil & Gas > Upstream (0.68)
Education > Focused Education > Special Education (0.44)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

Add feedback

TD-DeltaPi: A Model-Free Algorithm for Efficient Exploration

Silva, Bruno C. da (University of Massachusetts Amherst) | Barto, Andrew G. (University of Massachusetts Amherst)

AAAI ConferencesJul-21-2012

We study the problem of finding efficient exploration policies for the case in which an agent is momentarily not concerned with exploiting, and instead tries to compute a policy for later use. We first formally define the Optimal Exploration Problem as one of sequential sampling and show that its solutions correspond to paths of minimum expected length in the space of policies. We derive a model-free, local linear approximation to such solutions and use it to construct efficient exploration policies. We compare our model-free approach to other exploration techniques, including one with the best known PAC bounds, and show that ours is both based on a well-defined optimization problem and empirically efficient.

approximation, exploration, q-function, (16 more...)

AAAI Conferences

Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Add feedback