AITopics

doi: 10.1109/TRO.2022.3200138

2209.10342

Country:

North America > United States (0.27)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > Canada > Ontario > Toronto (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report (0.81)

Industry:

Information Technology > Robotics & Automation (0.88)
Transportation > Ground > Road (0.49)
Health & Medicine > Therapeutic Area > Neurology (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Hasanbeig, Hosein, Kroening, Daniel, Abate, Alessandro

LCRL: Certified Policy Synthesis via Logically-Constrained Reinforcement Learning

arXiv.org Artificial Intelligence Sep-21-2022

LCRL is a software tool that implements model-free Reinforcement Learning (RL) algorithms over unknown Markov Decision Processes (MDPs), synthesising policies that satisfy a given linear temporal specification with maximal probability. LCRL leverages partially deterministic finite-state machines known as Limit Deterministic Büchi Automata (LDBA) to express a given linear temporal specification. A reward function for the RL algorithm is shaped on-the-fly, based on the structure of the LDBA. Theoretical guarantees under proper assumptions ensure the convergence of the RL algorithm to an optimal policy that maximises the satisfaction probability. We present case studies to demonstrate the applicability, ease of use, scalability, and performance of LCRL. Owing to the LDBA-guided exploration and LCRL's model-free architecture, we observe robust performance, which also scales well when compared to standard RL approaches (whenever applicable to LTL specifications). Full instructions on how to execute all the case studies in this paper are provided on a GitHub page that accompanies the LCRL distribution: www.github.com/grockious/lcrl.
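As a rough illustration of shaping a reward on-the-fly from an automaton, the sketch below runs a toy automaton alongside a trace of state labels and pays a reward only on accepting transitions. It is not the LCRL API: the automaton for the property "eventually goal", the label names, and the functions `automaton_step`, `shaped_reward`, and `run_episode` are all hypothetical, and a real LDBA may contain epsilon-transitions that this deterministic toy omits.

```python
# Toy sketch: reward derived from an automaton's accepting transitions.
# Hypothetical automaton for the property "eventually goal".
# States: 0 = goal not yet seen, 1 = goal seen (absorbing).
TRANSITIONS = {
    (0, "goal"): 1,
    (0, "other"): 0,
    (1, "goal"): 1,
    (1, "other"): 1,
}

# Transitions treated as accepting; a run that takes them infinitely
# often satisfies the Buchi acceptance condition.
ACCEPTING = {(0, "goal"), (1, "goal"), (1, "other")}


def automaton_step(q, label):
    """Advance the automaton on the label of the new environment state."""
    return TRANSITIONS[(q, label)]


def shaped_reward(q, label):
    """Return 1.0 on an accepting transition, 0.0 otherwise."""
    return 1.0 if (q, label) in ACCEPTING else 0.0


def run_episode(labels):
    """Replay a label trace; return the final automaton state and total reward."""
    q, total = 0, 0.0
    for label in labels:
        total += shaped_reward(q, label)
        q = automaton_step(q, label)
    return q, total
```

In an RL loop, the pair (environment state, automaton state) would serve as the learner's state, so that the reward depends only on the current pair and the next label.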

machine learning, reinforcement learning, specification, (13 more...)

2209.10341

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Huang, Audrey, Leqi, Liu, Lipton, Zachary Chase, Azizzadenesheli, Kamyar

Off-Policy Risk Assessment in Markov Decision Processes

arXiv.org Artificial Intelligence Sep-21-2022

Addressing such diverse ends as safety, alignment with human preferences, and the efficiency of learning, a growing line of reinforcement learning research focuses on risk functionals that depend on the entire distribution of returns. Recent work on off-policy risk assessment (OPRA) for contextual bandits introduced consistent estimators for the target policy's CDF of returns along with finite sample guarantees that extend to (and hold simultaneously over) all risk functionals. In this paper, we lift OPRA to Markov decision processes (MDPs), where importance sampling (IS) CDF estimators suffer high variance on longer trajectories due to small effective sample size. To mitigate these problems, we incorporate model-based estimation to develop the first doubly robust (DR) estimator for the CDF of returns in MDPs. This estimator enjoys significantly less variance and, when the model is well specified, achieves the Cramér-Rao variance lower bound. Moreover, for many risk functionals, the downstream estimates enjoy both lower bias and lower variance. Additionally, we derive the first minimax lower bounds for off-policy CDF and risk estimation, which match our error bounds up to a constant factor. Finally, we demonstrate the precision of our DR CDF estimates experimentally on several different environments.
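To make the IS baseline concrete, here is a minimal sketch of a plain importance-sampling estimate of the return CDF, together with the effective sample size that shrinks on long trajectories. It is not the paper's doubly robust estimator. The trajectory layout (a list of `(pi_prob, beta_prob, reward)` tuples) and all function names are assumptions made for illustration, and the unnormalised estimate shown here is not guaranteed to stay within [0, 1].

```python
# Sketch of an importance-sampling (IS) estimate of the return CDF.
# Each trajectory is a list of (pi_prob, beta_prob, reward) tuples, where
# pi_prob and beta_prob are the target and behaviour action probabilities.
# This data layout is a hypothetical choice for illustration.


def trajectory_weight(traj):
    """Product of per-step likelihood ratios pi(a|s) / beta(a|s)."""
    w = 1.0
    for pi_prob, beta_prob, _ in traj:
        w *= pi_prob / beta_prob
    return w


def trajectory_return(traj, gamma=1.0):
    """Discounted sum of rewards along one trajectory."""
    return sum((gamma ** t) * r for t, (_, _, r) in enumerate(traj))


def is_cdf(trajs, threshold, gamma=1.0):
    """IS estimate of P(return <= threshold) under the target policy."""
    total = 0.0
    for traj in trajs:
        if trajectory_return(traj, gamma) <= threshold:
            total += trajectory_weight(traj)
    return total / len(trajs)


def effective_sample_size(trajs):
    """(sum of weights)^2 / (sum of squared weights)."""
    ws = [trajectory_weight(t) for t in trajs]
    return sum(ws) ** 2 / sum(w * w for w in ws)
```

Because the weight is a product over time steps, a few trajectories tend to dominate as the horizon grows, which is the variance problem the abstract attributes to small effective sample size.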

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2209.10444

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Virginia (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)
Information Technology > Security & Privacy (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)

Gupta, Samarth, Hill, Daniel N., Ying, Lexing, Dhillon, Inderjit

Bayesian regularization of empirical MDPs

arXiv.org Artificial Intelligence Sep-20-2022

In most applications of model-based Markov decision processes, the parameters of the unknown underlying model are estimated from empirical data. Due to noise, the policy learned from the estimated model is often far from the optimal policy of the underlying model. When applied to the environment of the underlying model, the learned policy results in suboptimal performance, thus calling for solutions with better generalization performance. In this work we take a Bayesian perspective and regularize the objective function of the Markov decision process with prior information in order to obtain more robust policies. Two approaches are proposed, one based on $L^1$ regularization and the other on relative entropic regularization. We evaluate our proposed algorithms on synthetic simulations and on real-world search logs of a large scale online shopping store. Our results demonstrate the robustness of regularized MDP policies against the noise present in the models.

artificial intelligence, machine learning, regularization, (17 more...)

2208.02362

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Retail > Online (0.49)
Information Technology > Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Kamale, Disha, Haesaert, Sofie, Vasile, Cristian-Ioan

Cautious Planning with Incremental Symbolic Perception: Designing Verified Reactive Driving Maneuvers

arXiv.org Artificial Intelligence Sep-20-2022

This work presents a step towards utilizing incrementally-improving symbolic perception knowledge of the robot's surroundings for provably correct reactive control synthesis applied to an autonomous driving problem. Combining abstract models of motion control and information gathering, we show that assume-guarantee specifications (a subclass of Linear Temporal Logic) can be used to define and resolve traffic rules for cautious planning. We propose a novel representation, called a symbolic refinement tree for perception, that captures the incremental knowledge about the environment and embodies the relationships between various symbolic perception inputs. The incremental knowledge is leveraged for synthesizing verified reactive plans for the robot. The case studies demonstrate the efficacy of the proposed approach in synthesizing control inputs even in partially occluded environments.

artificial intelligence, information, machine learning, (18 more...)

2209.09818

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
North America > United States > Pennsylvania > Northampton County > Bethlehem (0.04)

Genre: Research Report (0.40)

Industry: Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence Sep-20-2022

Rethinking Individual Global Max in Cooperative Multi-Agent Reinforcement Learning

Hong, Yitian, Jin, Yaochu, Tang, Yang

In cooperative multi-agent reinforcement learning, centralized training and decentralized execution (CTDE) has achieved remarkable success. Individual Global Max (IGM) decomposition, which is an important element of CTDE, measures the consistency between local and joint policies. The majority of IGM-based research focuses on how to establish this consistent relationship, but little attention has been paid to examining IGM's potential flaws. In this work, we reveal that the IGM condition is a lossy decomposition, and that the error of the lossy decomposition accumulates in hypernetwork-based methods. To address this issue, we propose to adopt an imitation learning strategy to separate the lossy decomposition from Bellman iterations, thereby avoiding error accumulation. The proposed strategy is theoretically proved and empirically verified on the StarCraft Multi-Agent Challenge benchmark problem with zero sight view. The results also confirm that the proposed method outperforms state-of-the-art IGM-based approaches.
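For readers unfamiliar with the IGM condition, the sketch below checks it in a single-state setting: the greedy joint action under the joint value must equal the tuple of each agent's greedy local action. This is only an illustration of the standard consistency condition, not the method proposed in the paper. The table layouts and function names are hypothetical, and ties between actions are assumed not to occur.

```python
from itertools import product


def greedy_joint_action(q_joint):
    """Joint action maximising the joint Q-table {(a1, ..., an): value}."""
    return max(q_joint, key=q_joint.get)


def greedy_local_actions(q_locals):
    """Tuple of each agent's own greedy action from its local Q-value list."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in q_locals)


def satisfies_igm(q_joint, q_locals):
    """True if the joint argmax equals the tuple of local argmaxes."""
    return greedy_joint_action(q_joint) == greedy_local_actions(q_locals)


def additive_joint(q_locals):
    """Joint value built as the sum of local values (an additive mixing rule)."""
    ranges = [range(len(q)) for q in q_locals]
    return {
        acts: sum(q[a] for q, a in zip(q_locals, acts))
        for acts in product(*ranges)
    }
```

An additive joint value satisfies the check by construction, whereas a joint table whose best action requires coordination (both agents deviating together from their local preferences) fails it, which is one way to see why the decomposition can be lossy.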

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2209.0964

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Aubret, Arthur, Matignon, Laetitia, Hassas, Salima

An information-theoretic perspective on intrinsic motivation in reinforcement learning: a survey

Traditionally, an agent maximizes a reward defined according to the task to perform: it may be a score when the agent learns to solve a game, or a distance function when the agent learns to reach a goal. The reward is then considered extrinsic (or a form of feedback) because the reward function is provided expertly and specifically for the task. With an extrinsic reward, many spectacular results have been obtained on Atari games [Bellemare et al. 2015] with the Deep Q-network (DQN) [Mnih et al. 2015] through the integration of deep learning into RL, leading to deep reinforcement learning (DRL). However, despite the recent improvements of DRL approaches, they are unsuccessful most of the time when the rewards are scattered in the environment, as the agent is then unable to learn the desired behavior for the targeted task [Francois-Lavet et al. 2018]. Moreover, the behaviors learned by the agent are hardly reusable, both within the same task and across many different tasks [Francois-Lavet et al. 2018]. It is difficult for an agent to generalize the learnt skills to make high-level decisions in the environment. For example, such a skill could be to go to the door using primitive actions that consist of moving in the four cardinal directions, or even to move forward by controlling the different joints of a humanoid robot, as in the robotic simulator MuJoCo [Todorov et al. 2012]. On the other hand, unlike RL, developmental learning [Cangelosi and Schlesinger 2018; Oudeyer and Smith 2016; Piaget and Cook 1952] is based on the observation that babies, or more broadly organisms, acquire new skills while spontaneously exploring their environment [Barto 2013; Gopnik et al. 1999].

artificial intelligence, machine learning, reinforcement learning, (13 more...)

doi: 10.3390/e25020327

2209.0889

Country:

Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
North America > United States > New York (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(9 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education (1.00)
Leisure & Entertainment > Games > Computer Games (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Seo, Sangwon, Unhelkar, Vaibhav V.

Semi-Supervised Imitation Learning of Team Policies from Suboptimal Demonstrations

We present Bayesian Team Imitation Learner (BTIL), an imitation learning algorithm to model the behavior of teams performing sequential tasks in Markovian domains. In contrast to existing multi-agent imitation learning techniques, BTIL explicitly models and infers the time-varying mental states of team members, thereby enabling learning of decentralized team policies from demonstrations of suboptimal teamwork. Further, to allow for sample- and label-efficient policy learning from small datasets, BTIL employs a Bayesian perspective and is capable of learning from semi-supervised demonstrations. We demonstrate and benchmark the performance of BTIL on synthetic multi-agent tasks as well as a novel dataset of human-agent teamwork. Our experiments show that BTIL can successfully learn team policies from demonstrations despite the influence of team members' (time-varying and potentially misaligned) mental states on their behavior.

artificial intelligence, demonstration, machine learning, (18 more...)

doi: 10.24963/ijcai.2022/346

2205.02959

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Research Report (1.00)
Overview (0.93)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
(2 more...)

Age of Semantics in Cooperative Communications: To Expedite Simulation Towards Real via Offline Reinforcement Learning

Chen, Xianfu, Zhao, Zhifeng, Mao, Shiwen, Wu, Celimuge, Zhang, Honggang, Bennis, Mehdi

The age of information metric fails to correctly describe the intrinsic semantics of a status update. In an intelligent reflecting surface-aided cooperative relay communication system, we propose the age of semantics (AoS) for measuring the semantics freshness of status updates. Specifically, we focus on the status updating from a source node (SN) to the destination, which is formulated as a Markov decision process (MDP). The objective of the SN is to maximize the expected satisfaction of AoS and energy consumption under the maximum transmit power constraint. To seek the optimal control policy, we first derive an online deep actor-critic (DAC) learning scheme under the on-policy temporal difference learning framework. However, implementing the online DAC in practice poses a key challenge: it requires infinitely repeated interactions between the SN and the system, which can be dangerous, particularly during exploration. We then put forward a novel offline DAC scheme, which estimates the optimal control policy from a previously collected dataset without any further interactions with the system. Numerical experiments verify the theoretical results and show that our offline DAC scheme significantly outperforms the online DAC scheme and the most representative baselines in terms of mean utility, demonstrating strong robustness to dataset quality.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2209.08947

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
(16 more...)

Genre: Research Report (0.40)

Industry:

Information Technology (0.46)
Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

de Carvalho, Gonçalo Hora

C-Causal Blindness: An experimental computational framework on the isomorphic relationship between biological computation, artificial computation, and logic using weighted hidden Markov models

This text is concerned with a hypothetical flavour of cognitive blindness, referred to in this paper as C-Causal Blindness or C-CB: a cognitive blindness where the policy to obtain the objective leads to the state to be avoided. A literal example of C-CB would be Kurt Gödel's decision to starve for "fear of being poisoned"; take this to be premise A. The objective was "to avoid being poisoned (so as to not die)": C; the plan or policy was "don't eat": B; and the actual outcome was "dying": ¬C, the state that Gödel wanted to avoid to begin with. Like many, Gödel pursued a strategy that caused the result he wanted to avoid. An experimental computational framework is proposed to show the isomorphic relationship between C-CB in brain computations, logic, and computer computations using hidden Markov models.

artificial intelligence, c-cb, machine learning, (16 more...)

2208.07143

Country:

Europe > Portugal (0.05)
North America > United States > New York (0.04)
North America > United States > Maryland > Prince George's County > Hyattsville (0.04)

Genre: Research Report (0.64)

Industry:

Law (0.93)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)