AITopics | subpolicy

Collaborating Authors

subpolicy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Explainable RL Policies by Distilling to Locally-Specialized Linear Policies with Voronoi State Partitioning

Deproost, Senne, Steckelmacher, Dennis, Nowé, Ann

arXiv.org Artificial IntelligenceNov-18-2025

Deep Reinforcement Learning is one of the state-of-the-art methods for producing near-optimal system controllers. However, deep RL algorithms train a deep neural network, that lacks transparency, which poses challenges when the controller has to meet regulations, or foster trust. To alleviate this, one could transfer the learned behaviour into a model that is human-readable by design using knowledge distilla- tion. Often this is done with a single model which mimics the original model on average but could struggle in more dynamic situations. A key challenge is that this simpler model should have the right balance be- tween flexibility and complexity or right balance between balance bias and accuracy. We propose a new model-agnostic method to divide the state space into regions where a simplified, human-understandable model can operate in. In this paper, we use Voronoi partitioning to find regions where linear models can achieve similar performance to the original con- troller. We evaluate our approach on a gridworld environment and a classic control task. We observe that our proposed distillation to locally- specialized linear models produces policies that are explainable and show that the distillation matches or even slightly outperforms the black-box policy they are distilled from.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2511.13322

Country:

Europe (1.00)
North America > United States > California (0.46)

Genre: Research Report > Promising Solution (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Add feedback

MORE: Mobile Manipulation Rearrangement Through Grounded Language Reasoning

Mohammadi, Mohammad, Honerkamp, Daniel, Büchner, Martin, Cassinelli, Matteo, Welschehold, Tim, Despinoy, Fabien, Gilitschenski, Igor, Valada, Abhinav

arXiv.org Artificial IntelligenceMay-7-2025

Autonomous long-horizon mobile manipulation encompasses a multitude of challenges, including scene dynamics, unexplored areas, and error recovery. Recent works have leveraged foundation models for scene-level robotic reasoning and planning. However, the performance of these methods degrades when dealing with a large number of objects and large-scale environments. To address these limitations, we propose MORE, a novel approach for enhancing the capabilities of language models to solve zero-shot mobile manipulation planning for rearrangement tasks. MORE leverages scene graphs to represent environments, incorporates instance differentiation, and introduces an active filtering scheme that extracts task-relevant subgraphs of object and region instances. These steps yield a bounded planning problem, effectively mitigating hallucinations and improving reliability. Additionally, we introduce several enhancements that enable planning across both indoor and outdoor environments. We evaluate MORE on 81 diverse rearrangement tasks from the BEHAVIOR-1K benchmark, where it becomes the first approach to successfully solve a significant share of the benchmark, outperforming recent foundation model-based approaches. Furthermore, we demonstrate the capabilities of our approach in several complex real-world tasks, mimicking everyday activities. We make the code publicly available at https://more-model.cs.uni-freiburg.de.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.03035

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.24)
North America > Canada > Ontario > Toronto (0.14)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(2 more...)

Add feedback

Stochastic Network Design in Bidirected Trees

xiaojian wu, Daniel R. Sheldon, Shlomo Zilberstein

Neural Information Processing SystemsFeb-9-2025, 14:39:18 GMT

We investigate the problem of stochastic network design in bidirected trees. In this problem, an underlying phenomenon (e.g., a behavior, rumor, or disease) starts at multiple sources in a tree and spreads in both directions along its edges. Actions can be taken to increase the probability of propagation on edges, and the goal is to maximize the total amount of spread away from all sources. Our main result is a rounded dynamic programming approach that leads to a fully polynomial-time approximation scheme (FPTAS), that is, an algorithm that can find (1 ɛ)-optimal solutions for any problem instance in time polynomial in the input size and 1/ɛ. Our algorithm outperforms competing approaches on a motivating problem from computational sustainability to remove barriers in river networks to restore the health of aquatic ecosystems.

algorithm, artificial intelligence, data mining, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Connecticut (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Technology:

Information Technology > Communications > Networks (0.73)
Information Technology > Data Science > Data Mining (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.56)

Add feedback

Stochastic Network Design in Bidirected Trees Xiaojian Wu1 Daniel Sheldon

Neural Information Processing SystemsMar-13-2024, 11:15:59 GMT

algorithm, passability, river network, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Connecticut (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)

Technology:

Information Technology > Communications > Networks (0.73)
Information Technology > Data Science > Data Mining (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.56)

Add feedback

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

Ma, Oubo, Pu, Yuwen, Du, Linkang, Dai, Yang, Wang, Ruo, Liu, Xiaolei, Wu, Yingcai, Ji, Shouling

arXiv.org Artificial IntelligenceFeb-6-2024

Recent advances in multi-agent reinforcement learning (MARL) have opened up vast application prospects, including swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent researches reveal that an attacker can rapidly exploit the victim's vulnerabilities and generate adversarial policies, leading to the victim's failure in specific tasks. For example, reducing the winning rate of a superhuman-level Go AI to around 20%. They predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY), which incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests the sharing of transitions among subpolicies to improve the exploitative ability of attackers. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments.

adversarial policy, attacker, victim, (14 more...)

arXiv.org Artificial Intelligence

2402.03741

Country:

Asia > China (0.05)
North America > United States > Texas (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Add feedback

Logic-guided Deep Reinforcement Learning for Stock Trading

Li, Zhiming, Jiang, Junzhe, Cao, Yushi, Cui, Aixin, Wu, Bozhi, Li, Bo, Liu, Yang

arXiv.org Artificial IntelligenceOct-9-2023

Deep reinforcement learning (DRL) has revolutionized quantitative finance by achieving excellent performance without significant manual effort. Whereas we observe that the DRL models behave unstably in a dynamic stock market due to the low signal-to-noise ratio nature of the financial data. In this paper, we propose a novel logic-guided trading framework, termed as SYENS (Program Synthesis-based Ensemble Strategy). Different from the previous state-of-the-art ensemble reinforcement learning strategy which arbitrarily selects the best-performing agent for testing based on a single measurement, our framework proposes regularizing the model's behavior in a hierarchical manner using the program synthesis by sketching paradigm. First, we propose a high-level, domain-specific language (DSL) that is used for the depiction of the market environment and action. Then based on the DSL, a novel program sketch is introduced, which embeds human expert knowledge in a logical manner. Finally, based on the program sketch, we adopt the program synthesis by sketching a paradigm and synthesizing a logical, hierarchical trading strategy. We evaluate SYENS on the 30 Dow Jones stocks under the cash trading and the margin trading settings. Experimental results demonstrate that our proposed framework can significantly outperform the baselines with much higher cumulative return and lower maximum drawdown under both settings.

program sketch, reinforcement, reinforcement learning, (9 more...)

arXiv.org Artificial Intelligence

2310.05551

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.05)
North America > United States > District of Columbia > Washington (0.05)
(4 more...)

Genre: Research Report (1.00)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning

Li, Haorui, Liang, Jiaqi, Li, Linjing, Zeng, Daniel

arXiv.org Artificial IntelligenceAug-2-2023

Hierarchical reinforcement learning composites subpolicies in different hierarchies to accomplish complex tasks. Automated subpolicies discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, the degradation problem is a challenge that existing methods can hardly deal with due to the lack of consideration of diversity or the employment of weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to boost their performance further. Experimental results demonstrate that our WDER improves performance and sample efficiency in comparison with prior work without modifying hyperparameters, which indicates the applicability and robustness of the WDER.

machine learning, reinforcement learning, subpolicy, (12 more...)

arXiv.org Artificial Intelligence

2308.00989

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Attention-Based Planning with Active Perception

Ma, Haoxiang, Fu, Jie

arXiv.org Artificial IntelligenceNov-30-2020

Attention control is a key cognitive ability for humans to select information relevant to the current task. This paper develops a computational model of attention and an algorithm for attention-based probabilistic planning in Markov decision processes. In attention-based planning, the robot decides to be in different attention modes. An attention mode corresponds to a subset of state variables monitored by the robot. By switching between different attention modes, the robot actively perceives task-relevant information to reduce the cost of information acquisition and processing, while achieving near-optimal task performance. Though planning with attention-based active perception inevitably introduces partial observations, a partially observable MDP formulation makes the problem computational expensive to solve. Instead, our proposed method employs a hierarchical planning framework in which the robot determines what to pay attention to and for how long the attention should be sustained before shifting to other information sources. During the attention sustaining phase, the robot carries out a sub-policy, computed from an abstraction of the original MDP given the current attention. We use an example where a robot is tasked to capture a set of intruders in a stochastic gridworld. The experimental results show that the proposed method enables information- and computation-efficient optimal planning in stochastic environments.

information acquisition, mdp, robot, (15 more...)

arXiv.org Artificial Intelligence

2012.00053

Country:

North America > United States > Massachusetts > Worcester County > Worcester (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Research Report (0.70)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.48)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Emergent Road Rules In Multi-Agent Driving Environments

Pal, Avik, Philion, Jonah, Liao, Yuan-Hong, Fidler, Sanja

arXiv.org Artificial IntelligenceNov-21-2020

For autonomous vehicles to safely share the road with human drivers, autonomous vehicles must abide by specific "road rules" that human drivers have agreed to follow. "Road rules" include rules that drivers are required to follow by law -- such as the requirement that vehicles stop at red lights -- as well as more subtle social rules -- such as the implicit designation of fast lanes on the highway. In this paper, we provide empirical evidence that suggests that -- instead of hard-coding road rules into self-driving algorithms -- a scalable alternative may be to design multi-agent environments in which road rules emerge as optimal solutions to the problem of maximizing traffic flow. We analyze what ingredients in driving environments cause the emergence of these road rules and find that two crucial factors are noisy perception and agents' spatial density. We provide qualitative and quantitative evidence of the emergence of seven social driving behaviors, ranging from obeying traffic signals to following lanes, all of which emerge from training agents to drive quickly to destinations without colliding. Our results add empirical support for the social road rules that countries worldwide have agreed on for safe, efficient driving.

agent, arxiv preprint arxiv, intersection, (11 more...)

arXiv.org Artificial Intelligence

2011.10753

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New York (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Hierarchial Reinforcement Learning in StarCraft II with Human Expertise in Subgoals Selection

Xu, Xinyi, Huang, Tiancheng, Wei, Pengfei, Narayan, Akshay, Leong, Tze-Yun

arXiv.org Artificial IntelligenceAug-8-2020

This work is inspired by recent advances in hierarchical reinforcement learning (HRL) (Barto and Mahadevan 2003;Hengst 2010), and improvements in learning efficiency with heuristic-based subgoal selection and hindsight experience replay (HER)(Andrychowicz et al. 2017; Levy et al. 2019). We propose a new method to integrate HRL, HER and effective subgoal selection based on human expertise to support sample-efficient learning and enhance interpretability of the agent's behavior. Human expertise remains indispensable in many areas such as medicine (Buch, Ahmed, and Maruthappu 2018) and law (Cath 2018), where interpretability, explainability and transparency are crucial in the decision making process, for ethical and legal reasons. Our method simplifies the complex task sets for achieving the overall objectives by decomposing into subgoals at different levels of abstraction. Incorporating relevant subjective knowledge also significantly reduces the computational resources spent in exploration for RL, especially in high speed, changing, and complex environments where the transition dynamics cannot be effectively learned and modelled in a short time. Experimental results in two StarCraft II (SC2) minigames demonstrate that our method can achieve better sample efficiency than flat and end-to-end RL methods, and provide an effective method for explaining the agent's performance.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2008.03444

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
(5 more...)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.85)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback