AITopics | tmdp

tmdp

Track-MDP: Reinforcement Learning for Target Tracking with Controlled Sensing

Subramaniam, Adarsh M., Gerogiannis, Argyrios, Hare, James Z., Veeravalli, Venugopal V.

arXiv.org Machine Learning | Jul-18-2024

State-of-the-art methods for target tracking with sensor management (or controlled sensing) are model-based and are obtained through solutions to Partially Observable Markov Decision Process (POMDP) formulations. In this paper, a Reinforcement Learning (RL) approach to the problem is explored for the setting where the motion model for the object/target to be tracked is unknown to the observer. It is assumed that the target dynamics are stationary in time, the state space and the observation space are discrete, and there is complete observability of the location of the target under certain (a priori unknown) sensor control actions. Then, a novel Markov Decision Process (MDP) rather than POMDP formulation is proposed for the tracking problem with controlled sensing, which is termed Track-MDP. In contrast to the POMDP formulation, the Track-MDP formulation is amenable to an RL-based solution. It is shown that the optimal policy for the Track-MDP formulation, which is approximated through RL, is guaranteed to track all significant target paths with certainty. The Track-MDP method is then compared with the optimal POMDP policy, and it is shown that the infinite horizon tracking reward of the optimal Track-MDP policy is the same as that of the optimal POMDP policy. In simulations, it is demonstrated that Track-MDP-based RL leads to a policy that can track the target with high accuracy.

pomdp, sensor, track-mdp, (15 more...)

arXiv.org Machine Learning

2407.13995

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

TPMDP: Threshold Personalized Multi-party Differential Privacy via Optimal Gaussian Mechanism

Liu, Jiandong, Zhang, Lan, Lv, Chaojie, Yu, Ting, Freris, Nikolaos M., Li, Xiang-Yang

arXiv.org Artificial Intelligence | May-29-2023

In modern distributed computing applications, such as federated learning and AIoT systems, protecting privacy is crucial to prevent adversarial parties from colluding to steal others' private information. However, guaranteeing the utility of computation outcomes while protecting all parties' data privacy can be challenging, particularly when the parties' privacy requirements are highly heterogeneous. In this paper, we propose a novel privacy framework for multi-party computation called Threshold Personalized Multi-party Differential Privacy (TPMDP), which addresses a limited number of semi-honest colluding adversaries. Our framework enables each party to have a personalized privacy budget. We design a multi-party Gaussian mechanism that is easy to implement and satisfies TPMDP, wherein each party perturbs the computation outcome in a secure multi-party computation protocol using Gaussian noise. To optimize the utility of the mechanism, we cast the utility loss minimization problem into a linear programming (LP) problem. We exploit the specific structure of this LP problem to compute the optimal solution after O(n) computations, where n is the number of parties, while a generic solver may require exponentially many computations. Extensive experiments demonstrate the benefits of our approach in terms of low utility loss and high efficiency compared to existing private mechanisms that do not consider personalized privacy requirements or collusion thresholds.
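As a rough illustration of the per-party Gaussian perturbation described in the abstract above, the sketch below calibrates one noise scale per party from that party's own privacy budget and adds the independent noise shares to a sum. It is not the paper's mechanism: the function names (`gaussian_sigma`, `noisy_sum`, `total_noise_std`) are hypothetical, the calibration is the textbook Gaussian-mechanism formula (valid only for 0 < epsilon < 1), and neither the LP-based optimal noise allocation nor the secure multi-party protocol is reproduced.

```python
import math
import random


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Textbook Gaussian-mechanism noise scale for (epsilon, delta)-DP.

    Assumption: this is the classic calibration, valid for 0 < epsilon < 1,
    not the optimal mechanism derived in the paper.
    """
    if not (0.0 < epsilon < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("requires 0 < epsilon < 1 and 0 < delta < 1")
    return math.sqrt(2.0 * math.log(1.25 / delta)) * sensitivity / epsilon


def total_noise_std(sigmas):
    """Std. dev. of the summed noise: independent Gaussian variances add."""
    return math.sqrt(sum(s * s for s in sigmas))


def noisy_sum(values, sigmas, rng):
    """Sum of the parties' values plus one independent Gaussian draw per party."""
    noise = sum(rng.gauss(0.0, s) for s in sigmas)
    return sum(values) + noise


if __name__ == "__main__":
    # Hypothetical personalized budgets: a smaller epsilon means more noise.
    budgets = [0.2, 0.5, 0.8]
    sigmas = [gaussian_sigma(eps, delta=1e-5) for eps in budgets]
    rng = random.Random(0)
    print("per-party sigma:", [round(s, 2) for s in sigmas])
    print("total noise std:", round(total_noise_std(sigmas), 2))
    print("noisy sum:", noisy_sum([1.0, 2.0, 3.0], sigmas, rng))
```

The point of the sketch is only that heterogeneous budgets translate into heterogeneous noise scales, and that the utility loss of the aggregate depends on the combined variance. Choosing those scales to minimize that loss is the optimization problem the abstract refers to.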

artificial intelligence, machine learning, mechanism, (19 more...)

arXiv.org Artificial Intelligence

2305.11192

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Qatar (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

A Review of Off-Policy Evaluation in Reinforcement Learning

Uehara, Masatoshi, Shi, Chengchun, Kallus, Nathan

arXiv.org Artificial Intelligence | Dec-12-2022

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine learning and has been recently applied to solve a number of challenging problems. In this paper, we primarily focus on off-policy evaluation (OPE), one of the most fundamental topics in RL. In recent years, a number of OPE methods have been developed in the statistics and computer science literature. We provide a discussion on the efficiency bound of OPE, some of the existing state-of-the-art OPE methods, their statistical properties and some other related research directions that are currently actively explored.
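For readers unfamiliar with off-policy evaluation, a minimal sketch of one standard estimator, trajectory-wise importance sampling, is given below. It is a generic textbook estimator and is not taken from the review itself. The function name `importance_sampling_ope`, the `(state, action, reward)` trajectory layout, and the policy call signature `pi(action, state)` are assumptions made for illustration.

```python
def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=1.0):
    """Estimate the value of evaluation policy pi_e from data logged under pi_b.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_e, pi_b:   callables returning the probability of `action` in `state`.
    Assumes pi_b(action, state) > 0 for every logged pair (overlap).
    """
    estimates = []
    for trajectory in trajectories:
        weight = 1.0          # cumulative density ratio along the trajectory
        discounted_return = 0.0
        for t, (state, action, reward) in enumerate(trajectory):
            weight *= pi_e(action, state) / pi_b(action, state)
            discounted_return += (gamma ** t) * reward
        estimates.append(weight * discounted_return)
    return sum(estimates) / len(estimates)


if __name__ == "__main__":
    # Toy one-step problem: the logging policy is uniform over two actions,
    # the evaluation policy always picks action 1, which pays reward 1.
    def uniform(action, state):
        return 0.5

    def always_one(action, state):
        return 1.0 if action == 1 else 0.0

    logged = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
    print(importance_sampling_ope(logged, always_one, uniform))
```

The estimator is unbiased under overlap, but its variance grows quickly with the horizon because the weights multiply across time steps. That variance is one reason efficiency bounds and alternative estimators are studied.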

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2212.06355

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.92)

Industry: Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Multi-Objective Policy Gradients with Topological Constraints

Wray, Kyle Hollins, Tiomkin, Stas, Kochenderfer, Mykel J., Abbeel, Pieter

arXiv.org Artificial Intelligence | Sep-15-2022

Multi-objective optimization models that encode ordered sequential constraints provide a solution to model various challenging problems including encoding preferences, modeling a curriculum, and enforcing measures of safety. A recently developed theory of topological Markov decision processes (TMDPs) captures this range of problems for the case of discrete states and actions. In this work, we extend TMDPs towards continuous spaces and unknown transition dynamics by formulating, proving, and implementing the policy gradient theorem for TMDPs. This theoretical result enables the creation of TMDP learning algorithms that use function approximators, and can generalize existing deep reinforcement learning (DRL) approaches. Specifically, we present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm. We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2209.07096

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Efficient Evaluation of Natural Stochastic Policies in Offline Reinforcement Learning

Kallus, Nathan, Uehara, Masatoshi

arXiv.org Machine Learning | Nov-3-2020

We study the efficient off-policy evaluation of natural stochastic policies, which are defined in terms of deviations from the behavior policy. This is a departure from the literature on off-policy evaluation, where most work considers the evaluation of explicitly specified policies. Crucially, offline reinforcement learning with natural stochastic policies can help alleviate issues of weak overlap, lead to policies that build upon current practice, and improve policies' implementability in practice. Compared with the classic case of a pre-specified evaluation policy, when evaluating natural stochastic policies, the efficiency bound, which measures the best-achievable estimation error, is inflated since the evaluation policy itself is unknown. In this paper, we derive the efficiency bounds of two major types of natural stochastic policies: tilting policies and modified treatment policies. We then propose efficient nonparametric estimators that attain the efficiency bounds under very lax conditions. These also enjoy a (partial) double robustness property.

efficiency, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

2006.03886

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.45)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Opponent Aware Reinforcement Learning

Gallego, Victor, Naveiro, Roi, Insua, David Rios, Oteiza, David Gomez-Ullate

arXiv.org Machine Learning | Aug-22-2019

In several reinforcement learning (RL) scenarios, such as security settings, there may be adversaries trying to interfere with the reward generating process for their own benefit. We introduce Threatened Markov Decision Processes (TMDPs) as a framework to support an agent against potential opponents in an RL context. We also propose a level-k thinking scheme resulting in a novel learning approach to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries in RL while the agent learns.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Machine Learning

1908.08773

Country: North America > United States (0.28)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
(2 more...)

Reinforcement Learning under Threats

Gallego, Víctor, Naveiro, Roi, Insua, David Ríos

arXiv.org Artificial Intelligence | Sep-5-2018

In several reinforcement learning (RL) scenarios, mainly in security settings, there may be adversaries trying to interfere with the reward generating process. In this paper, we introduce Threatened Markov Decision Processes (TMDPs), which provide a framework to support a decision maker against a potential adversary in RL. Furthermore, we propose a level-$k$ thinking scheme resulting in a new learning framework to deal with TMDPs. After introducing our framework and deriving theoretical results, relevant empirical evidence is given via extensive experiments, showing the benefits of accounting for adversaries while the agent learns.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

1809.0156

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Hippocampal Contributions to Control: The Third Way

Lengyel, Máté, Dayan, Peter

Neural Information Processing Systems | Dec-31-2008

Recent experimental studies have focused on the specialization of different neural structures for different types of instrumental behavior. Recent theoretical work has provided normative accounts for why there should be more than one control system, and how the output of different controllers can be integrated. Two particular controllers have been identified, one associated with a forward model and the prefrontal cortex and a second associated with computationally simpler, habitual, actor-critic methods and part of the striatum. We argue here for the normative appropriateness of an additional, but so far marginalized control system, associated with episodic memory, and involving the hippocampus and medial temporal cortices. We analyze in depth a class of simple environments to show that episodic control should be useful in a range of cases characterized by complexity and inferential noise, and most particularly at the very early stages of learning, long before habitization has set in. We interpret data on the transfer of control from the hippocampus to the striatum in the light of this hypothesis.

controller, model-based control, noise, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
