Causality-Driven Reinforcement Learning for Joint Communication and Sensing
Roy, Anik, Banerjee, Serene, Sadasivan, Jishnu, Sarkar, Arnab, Dey, Soumyajit
Next-generation wireless networks (6G and beyond) envision integrating communication and sensing to overcome interference, improve spectrum efficiency, and reduce hardware and power consumption. Massive Multiple-Input Multiple-Output (mMIMO)-based Joint Communication and Sensing (JCAS) systems realize this integration for 6G applications such as autonomous driving, which requires accurate environmental sensing and time-critical communication with neighboring vehicles. Reinforcement Learning (RL) is used for mMIMO antenna beamforming in the existing literature. However, the huge search space of beamforming actions makes the RL agent's learning process inefficient due to high beam training overhead. The learning process does not consider the causal relationship between the action space and the reward, and gives all actions equal importance. In this work, we explore a causally aware RL agent which can intervene and discover causal relationships in mMIMO-based JCAS environments during the training phase. We use a state-dependent action dimension selection strategy to realize causal discovery for RL-based JCAS. Evaluation of the causally aware RL framework in different JCAS scenarios shows the benefit of our proposed framework over baseline methods in terms of beamforming gain.
- Asia > India > West Bengal > Kharagpur (0.05)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
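The state-dependent action dimension selection above can be sketched in a few lines. This is an illustrative assumption, not the authors' implementation: the causal scores are a stand-in for whatever the interventional causal-discovery phase estimates, and the function names are made up.

```python
import random

def select_dimensions(causal_scores, k):
    """Pick the k action dimensions with the highest estimated causal
    influence on the reward (scores assumed to come from an earlier
    causal-discovery phase)."""
    ranked = sorted(range(len(causal_scores)), key=lambda i: -causal_scores[i])
    return set(ranked[:k])

def masked_random_action(n_dims, active, default=0.0):
    """Explore only the causally relevant dimensions; freeze the rest."""
    return [random.uniform(-1.0, 1.0) if i in active else default
            for i in range(n_dims)]

# Beam parameters 1 and 2 are estimated to matter most in this state,
# so exploration is restricted to those two dimensions.
active = select_dimensions([0.1, 0.9, 0.5], k=2)   # {1, 2}
action = masked_random_action(4, active)
```

Restricting exploration this way shrinks the effective search space per state, which is the mechanism the abstract credits for reducing beam training overhead.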
Excluding the Irrelevant: Focusing Reinforcement Learning through Continuous Action Masking
Stolz, Roland, Krasowski, Hanna, Thumm, Jakob, Eichelbeck, Michael, Gassert, Philipp, Althoff, Matthias
Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
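Interval-based masking of the kind this abstract describes can be illustrated with a minimal affine rescaling from a latent action box onto a state-dependent relevant interval. This is a simplified sketch under that interval assumption, not the paper's specific masking methods, and the torque numbers are made up:

```python
def mask_action(latent, low_rel, high_rel):
    """Map a latent action in [-1, 1]^n exactly onto the state-dependent
    relevant box [low_rel, high_rel], so the executed action is always
    relevant regardless of what the policy outputs."""
    out = []
    for z, lo, hi in zip(latent, low_rel, high_rel):
        z = max(-1.0, min(1.0, z))            # clip to the latent box
        out.append(lo + 0.5 * (z + 1.0) * (hi - lo))
    return out

# The global torque space is [-2, 2], but in the current state only
# [-0.5, 1.0] is relevant (e.g. derived from the system dynamics).
a = mask_action([0.0], [-0.5], [1.0])         # midpoint of the relevant interval
```

Because the map is exact and onto, every executable action is relevant by construction, which is what enables the predictability claim in the abstract.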
Language Guided Exploration for RL Agents in Text Environments
Golchha, Hitesh, Yerawar, Sahil, Patel, Dhruvesh, Dan, Soham, Murugesan, Keerthiram
Real-world sequential decision making is characterized by sparse rewards and large decision spaces, posing significant difficulty for experiential learning systems like tabula rasa reinforcement learning (RL) agents. Large Language Models (LLMs), with a wealth of world knowledge, can help RL agents learn quickly and adapt to distribution shifts. In this work, we introduce the Language Guided Exploration (LGE) framework, which uses a pre-trained language model (called GUIDE) to provide decision-level guidance to an RL agent (called EXPLORER). We observe that on ScienceWorld (Wang et al., 2022), a challenging text environment, LGE outperforms vanilla RL agents significantly and also outperforms other sophisticated methods like Behaviour Cloning and Text Decision Transformer.
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
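The GUIDE/EXPLORER split can be sketched as a pruning step: a language model scores candidate actions, and the RL agent then acts only within the shortlist. The keyword scorer below is a toy stand-in; in the paper, GUIDE is a pre-trained language model:

```python
def guide_filter(candidates, guide_score, top_k):
    """GUIDE-style pruning: keep the top_k candidate actions by LM score;
    the EXPLORER (RL policy) then chooses only among these."""
    return sorted(candidates, key=guide_score, reverse=True)[:top_k]

# Toy stand-in scorer: prefer actions mentioning the current goal word.
goal = "thermometer"
actions = ["go kitchen", "pick up thermometer", "look",
           "use thermometer on water"]
shortlist = guide_filter(actions, lambda a: a.count(goal), top_k=2)
```

Shrinking the decision space before the RL step is what lets a tabula rasa learner cope with the sparse rewards the abstract describes.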
Create a Bot to Find Diamonds in Minecraft
Minecraft is the next frontier for Artificial Intelligence. It takes an entire wiki with over 8000 pages just to teach humans how to play Minecraft. So how good can artificial intelligence be? This is the question we'll answer in this article. We'll design a bot and try to achieve one of the most difficult challenges in Minecraft: finding diamonds from scratch.
When Planning Should Be Easy: On Solving Cumulative Planning Problems
Bartak, Roman (Charles University in Prague) | Dvorak, Filip (Charles University in Prague) | Gemrot, Jakub (Charles University in Prague) | Brom, Cyril (Charles University in Prague) | Toropila, Daniel (Charles University in Prague)
This paper deals with planning domains that appear in computer games, especially when modeling intelligent virtual agents. Some of these domains contain only actions with no negative effects and are thus treated as easy from the planning perspective. We propose two new techniques to solve the problems in these planning domains, a heuristic search algorithm ANA* and a constraint-based planner RelaxPlan, and we compare them with state-of-the-art planners that were successful in the International Planning Competition (IPC), using planning domains motivated by computer games.
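Why such delete-free domains count as easy can be seen from a few lines of forward chaining: because actions never remove facts, the fact set only grows, so iterating applicable actions to a fixpoint decides reachability in polynomial time. This sketch illustrates that observation only; it is not the ANA* or RelaxPlan algorithm from the paper, and the game-flavored facts are made up:

```python
def delete_free_plan(init, goal, actions):
    """Forward chaining for actions with no negative effects: facts only
    grow, so iterating to a fixpoint decides reachability (and yields a
    plan, possibly with redundant steps) in polynomial time."""
    facts, plan = set(init), []
    changed = True
    while changed and not goal <= facts:
        changed = False
        for name, pre, add in actions:
            if pre <= facts and not add <= facts:
                facts |= add          # no deletes, so facts never shrink
                plan.append(name)
                changed = True
    return plan if goal <= facts else None

# Toy game-like domain (illustrative facts, not from the paper).
acts = [("chop_wood", {"has_axe"}, {"has_wood"}),
        ("craft_axe", {"has_stone"}, {"has_axe"})]
plan = delete_free_plan({"has_stone"}, {"has_wood"}, acts)
# plan == ["craft_axe", "chop_wood"]
```

Finding a shortest such plan is still hard in general; the easy part is deciding solvability and producing some plan.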
The Influence of k-Dependence on the Complexity of Planning
Gimenez, Omer (Universitat Politecnica de Catalunya) | Jonsson, Anders (Universitat Pompeu Fabra)
A planning problem is k-dependent if each action has at most k preconditions on variables unaffected by the action. This concept is well-founded since k is a constant for all but a few of the standard planning domains, and is known to have implications for tractability. In this paper, we present several new complexity results for P(k), the class of k-dependent planning problems with binary variables and polytree causal graphs. The problem of plan generation for P(k) is equivalent to determining how many times each variable can change. Using this fact, we present a polytime plan generation algorithm for P(2) and P(3). For constant k > 3, we introduce and use the notion of a cover to find conditions under which plan generation for P(k) is polynomial.
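The definition of k-dependence translates directly into a small check: count, per action, the preconditions on variables the action does not affect. The encoding of actions as (precondition variables, affected variables) pairs is an assumption made here for illustration:

```python
def dependence(pre_vars, eff_vars):
    """Number of preconditions on variables the action does not affect."""
    return len(set(pre_vars) - set(eff_vars))

def is_k_dependent(actions, k):
    """A planning problem is k-dependent if every action has at most k
    preconditions on variables it leaves unaffected."""
    return all(dependence(pre, eff) <= k for pre, eff in actions)

# An action with preconditions on x and y that only affects y has
# dependence 1: the precondition on x concerns an unaffected variable.
acts = [({"x", "y"}, {"y"})]
```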