AITopics | admissible policy


admissible policy

Information about AI from the News, Publications, and Conferences


If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Recursively-Constrained Partially Observable Markov Decision Processes

Ho, Qi Heng, Becker, Tyler, Kraske, Benjamin, Laouar, Zakariya, Feather, Martin S., Rossi, Federico, Lahijanian, Morteza, Sunberg, Zachary N.

arXiv.org Artificial Intelligence, Dec-20-2023

In many problems, it is desirable to optimize an objective function while imposing constraints on some other objectives. A Constrained Partially Observable Markov Decision Process (C-POMDP) allows such problems to be modeled under transition uncertainty and partial observability. Typically, the constraints in C-POMDPs enforce a threshold on expected cumulative costs starting from an initial state distribution. In this work, we first show that optimal C-POMDP policies may violate Bellman's principle of optimality and thus may exhibit unintuitive behaviors, which can be undesirable for some (e.g., safety-critical) applications. Additionally, online re-planning with C-POMDPs is often ineffective due to the inconsistency resulting from the violation of Bellman's principle of optimality. To address these drawbacks, we introduce a new formulation, the Recursively-Constrained POMDP (RC-POMDP), which imposes additional history-dependent cost constraints on the C-POMDP. We show that, unlike C-POMDPs, RC-POMDPs always have deterministic optimal policies, and that optimal policies obey Bellman's principle of optimality. We also present a point-based dynamic programming algorithm that synthesizes admissible near-optimal policies for RC-POMDPs. Evaluations on a set of benchmark problems demonstrate the efficacy of our algorithm and show that policies for RC-POMDPs produce more desirable behaviors than policies for C-POMDPs.
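To make the difference between a single expected-cost threshold and a history-dependent one concrete, here is a minimal sketch under loudly stated assumptions: a fully observed policy tree (no beliefs or observations), and a hypothetical recursive check requiring that, at every reachable node, the cost already paid plus the expected cost from there stays within the budget. This is only the intuition, not the paper's RC-POMDP formulation or its point-based algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node in a fully observed policy tree (simplification: no beliefs)."""
    cost: float  # cost incurred on reaching this node
    children: List[Tuple[float, "Node"]] = field(default_factory=list)  # (probability, child)


def expected_cost(node: Node) -> float:
    """Expected cumulative cost from this node onward, including its own cost."""
    return node.cost + sum(p * expected_cost(child) for p, child in node.children)


def satisfies_expected_constraint(root: Node, budget: float) -> bool:
    """C-POMDP-style check: only the expectation at the root is bounded."""
    return expected_cost(root) <= budget


def satisfies_recursive_constraint(root: Node, budget: float) -> bool:
    """Hypothetical history-dependent check: at every reachable node, the cost already
    paid on the way there plus the expected cost from there must stay within budget."""
    stack = [(root, 0.0)]  # (node, cost paid before reaching it)
    while stack:
        node, paid = stack.pop()
        if paid + expected_cost(node) > budget:
            return False
        for p, child in node.children:
            if p > 0:  # only reachable histories are constrained
                stack.append((child, paid + node.cost))
    return True


# A policy that "gambles": the cheap branch A offsets the expensive branch B on average.
branch_a = Node(cost=0.0)
branch_b = Node(cost=2.0)
root = Node(cost=0.0, children=[(0.5, branch_a), (0.5, branch_b)])

print(satisfies_expected_constraint(root, budget=1.0))   # True
print(satisfies_recursive_constraint(root, budget=1.0))  # False
```

The root-level expectation is exactly 1.0, so the C-POMDP-style check passes, yet once branch B is realized the policy knowingly spends twice the budget. That is the kind of behavior the abstract calls unintuitive; the history-dependent check rejects it.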

constraint, optimal policy, rc-pomdp, (15 more...)

arXiv.org Artificial Intelligence

2310.09688

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > California (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Relaxed Actor-Critic with Convergence Guarantees for Continuous-Time Optimal Control of Nonlinear Systems

Duan, Jingliang, Li, Jie, Ge, Qiang, Li, Shengbo Eben, Bujarbaruah, Monimoy, Ma, Fei, Zhang, Dezhao

arXiv.org Artificial Intelligence, Mar-30-2023

This paper presents the Relaxed Continuous-Time Actor-Critic (RCTAC) algorithm, a method for finding a nearly optimal policy for nonlinear continuous-time (CT) systems with known dynamics and an infinite horizon, such as path-tracking control of vehicles. RCTAC has several advantages over existing adaptive dynamic programming algorithms for CT systems. It does not require the "admissibility" of the initial policy or an input-affine controlled system for convergence. Instead, given any initial policy, RCTAC can converge to an admissible, and subsequently nearly optimal, policy for a general nonlinear system with a saturated controller. RCTAC consists of two phases: a warm-up phase and a generalized policy iteration phase. The warm-up phase minimizes the square of the Hamiltonian to achieve admissibility, while the generalized policy iteration phase relaxes the update termination conditions for faster convergence. The convergence and optimality of the algorithm are proven through Lyapunov analysis, and its effectiveness is demonstrated through simulations and real-world path-tracking tasks.
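The squared-Hamiltonian objective can be illustrated on a toy problem. The sketch below assumes a hypothetical scalar linear system dx/dt = A*x + B*u with quadratic cost, a quadratic critic V(x) = p*x^2, and a fixed linear policy u = -k*x; it fits only the critic by gradient descent on the mean squared Hamiltonian over sampled states. It is not RCTAC itself (which also updates the actor, handles general nonlinear dynamics, and uses a saturated controller), only the objective the warm-up phase is described as minimizing.

```python
import numpy as np

# Hypothetical scalar linear-quadratic example (not the paper's nonlinear setting):
# dynamics dx/dt = A*x + B*u, running cost Q*x^2 + R*u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def hamiltonian(x, p, k):
    """CT Hamiltonian for the critic V(x) = p*x^2 and the linear policy u = -k*x."""
    u = -k * x
    dv_dx = 2.0 * p * x
    return Q * x**2 + R * u**2 + dv_dx * (A * x + B * u)


def fit_critic(k, xs, lr=0.01, steps=5000):
    """Gradient descent on mean(H^2) over sampled states, with the policy gain k held fixed."""
    p = 0.0
    for _ in range(steps):
        h = hamiltonian(xs, p, k)
        dh_dp = 2.0 * xs**2 * (A - B * k)  # derivative of H with respect to p
        p -= lr * np.mean(2.0 * h * dh_dp)
    return p


xs = np.linspace(-1.0, 1.0, 21)  # sampled states
print(fit_critic(2.0, xs))  # close to 2.5, matching -(Q + R*k^2) / (2*(A - B*k))
print(fit_critic(0.5, xs))  # close to -1.25: A - B*k > 0, so u = -0.5*x is not admissible
```

In this toy case a zero Hamiltonian residual recovers the policy's cost only when the closed loop A - B*k is stable; for an unstable (inadmissible) gain the residual can still be driven to zero, but the resulting p is negative and cannot be a valid cost, which hints at why admissibility matters before policy iteration begins.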

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIV.2023.3255264

1909.05402

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Asia > China > Beijing > Beijing (0.05)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Industry:

Automobiles & Trucks (0.93)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Control Systems (0.84)


Admissible Policy Teaching through Reward Design

Banihashem, Kiarash, Singla, Adish, Gan, Jiarui, Radanovic, Goran

arXiv.org Artificial Intelligence, Jan-6-2022

We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging: it is NP-hard to find an approximately optimal reward modification. We then formulate a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in our setting, but which is more amenable to optimization techniques and analysis. For this surrogate problem, we present characterization results that provide bounds on the value of the optimal solution. Finally, we design a local search algorithm to solve the surrogate problem and showcase its utility using simulation-based experiments.
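The basic ingredients of the problem (an MDP, a set of inadmissible state-action pairs, a reward modification, and its cost) can be sketched on a tiny example. Everything here is hypothetical: a 2-state, 2-action deterministic MDP, an L1 modification cost, and a naive baseline that subtracts the same penalty from every inadmissible reward until the greedy optimal policy becomes admissible. This is not the paper's surrogate problem or local search, and it checks only the greedy policy, which is weaker than the paper's requirement that every approximately optimal deterministic policy be admissible.

```python
import numpy as np

GAMMA = 0.9
# Hypothetical 2-state, 2-action deterministic MDP. NEXT[s, a] is the successor state.
NEXT = np.array([[0, 1],
                 [1, 0]])
REWARD = np.array([[1.0, 0.0],
                   [0.5, 0.0]])
INADMISSIBLE = {(0, 0)}  # hypothetical: action 0 must not be taken in state 0


def q_values(reward, iters=500):
    """Value iteration; returns the (approximately) optimal Q-table for the given reward."""
    v = np.zeros(reward.shape[0])
    for _ in range(iters):
        q = reward + GAMMA * v[NEXT]
        v = q.max(axis=1)
    return q


def greedy(q):
    return [int(a) for a in q.argmax(axis=1)]


def is_admissible(policy):
    return all((s, a) not in INADMISSIBLE for s, a in enumerate(policy))


def naive_penalty_search(step=0.1, max_steps=100):
    """Naive baseline (not the paper's method): raise a uniform penalty on inadmissible
    rewards until the greedy policy under the modified reward is admissible."""
    for i in range(max_steps + 1):
        delta = step * i
        modified = REWARD.copy()
        for s, a in INADMISSIBLE:
            modified[s, a] -= delta
        policy = greedy(q_values(modified))
        if is_admissible(policy):
            cost = delta * len(INADMISSIBLE)  # L1 distance between old and new rewards
            return delta, cost, policy
    return None


print(greedy(q_values(REWARD)))  # [0, 1]: the original optimum takes the inadmissible action
print(naive_penalty_search())     # penalty of about 0.6, admissible policy [1, 0]
```

A uniform penalty is easy to reason about but generally not cost-efficient with many inadmissible pairs; per the abstract, finding an approximately optimal modification is NP-hard, which motivates the paper's surrogate formulation.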

agent, optimization problem, p4-apt, (15 more...)

arXiv.org Artificial Intelligence

2201.02185

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


Coarticulation in Markov Decision Processes

Rohanimanesh, Khashayar, Platt, Robert, Mahadevan, Sridhar, Grupen, Roderic

Neural Information Processing Systems, Dec-31-2005

We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies and those policies that ascend the optimal state-value function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities. We present our theoretical results and empirically evaluate our approach in a simulated domain.
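The merging idea can be sketched in a toy setting. Assumptions, all hypothetical: a 4x4 obstacle-free grid instead of an SMDP, each activity is "reach a goal" with unit step cost (so its optimal value is minus the Manhattan distance), "ascending" is read as "the successor's optimal value is strictly higher", and the merge rule picks, among the superior activity's ascending actions, the one the subordinate activity values most. This is an illustration of the concept, not the authors' algorithm.

```python
from typing import Callable, Dict, List, Tuple

State = Tuple[int, int]
SIZE = 4  # hypothetical 4x4 grid
ACTIONS: Dict[str, State] = {"right": (1, 0), "left": (-1, 0), "up": (0, 1), "down": (0, -1)}


def value_for_goal(goal: State) -> Callable[[State], int]:
    """Optimal value for 'reach goal' with unit step cost: minus the Manhattan distance."""
    return lambda s: -(abs(s[0] - goal[0]) + abs(s[1] - goal[1]))


def step(s: State, d: State) -> State:
    x, y = s[0] + d[0], s[1] + d[1]
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else s  # walls keep the agent in place


def ascending_actions(s: State, v: Callable[[State], int]) -> List[str]:
    """Actions whose successor has strictly higher optimal value (assumed meaning of 'ascend')."""
    return [a for a, d in ACTIONS.items() if v(step(s, d)) > v(s)]


def coarticulate(start: State, v_sup, v_sub, max_steps: int = 20) -> List[State]:
    """Stay within the superior activity's ascending actions; among them,
    choose the action whose successor the subordinate activity values most."""
    s, path = start, [start]
    for _ in range(max_steps):
        allowed = ascending_actions(s, v_sup)
        if not allowed:  # the superior activity's goal has been reached
            break
        a = max(allowed, key=lambda name: v_sub(step(s, ACTIONS[name])))
        s = step(s, ACTIONS[a])
        path.append(s)
    return path


print(coarticulate((0, 0), value_for_goal((3, 3)), value_for_goal((3, 0))))
# [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
```

Every step strictly ascends the superior activity's value, so its goal (3, 3) is still reached in the minimum number of moves, while the redundancy among ascending actions is used to pass through the subordinate goal (3, 0) along the way.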

admissible policy, controller, subgoal, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.29)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)


Coarticulation in Markov Decision Processes

Rohanimanesh, Khashayar, Platt, Robert, Mahadevan, Sridhar, Grupen, Roderic

Neural Information Processing Systems, Dec-31-2005

We investigate an approach for simultaneously committing to multiple activities, each modeled as a temporally extended action in a semi-Markov decision process (SMDP). For each activity we define a set of admissible solutions consisting of the redundant set of optimal policies and those policies that ascend the optimal state-value function associated with them. A plan is then generated by merging them in such a way that the solutions to the subordinate activities are realized in the set of admissible solutions satisfying the superior activities.

artificial intelligence, controller, machine learning, (15 more...)

Neural Information Processing Systems

Country: North America > United States > Massachusetts > Hampshire County > Amherst (0.30)

Technology:

Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)
