Learning Logic Specifications for Soft Policy Guidance in POMCP

arXiv.org Artificial Intelligence

Partially Observable Monte Carlo Planning (POMCP) is an efficient solver for Partially Observable Markov Decision Processes (POMDPs). It scales to large state spaces by computing an approximation of the optimal policy locally and online, using a strategy based on Monte Carlo Tree Search. However, POMCP suffers from sparse reward functions, in which rewards are obtained only when the final goal is reached, particularly in environments with large state spaces and long horizons. Recently, logic specifications have been integrated into POMCP to guide exploration and to satisfy safety requirements. However, such policy-related rules must be defined manually by domain experts, especially in real-world scenarios. In this paper, we use inductive logic programming to learn logic specifications from traces of POMCP executions, i.e., sets of belief-action pairs generated by the planner. Specifically, we learn rules expressed in the paradigm of answer set programming. We then integrate them into POMCP to provide a soft policy bias toward promising actions. In two benchmark scenarios, rocksample and battery, we show that integrating rules learned from small task instances can improve performance with fewer Monte Carlo simulations and in larger task instances. We make our modified version of POMCP publicly available at https://github.com/GiuMaz/pomcp_clingo.git.
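As a rough illustration of what a soft policy bias can look like inside a tree search, the sketch below adds a small, visit-decaying bonus to actions that a rule suggests, on top of a UCB-style score. This is not the mechanism from the paper or its repository: the function name `select_action`, the parameters `c` and `bias`, and the decay schedule are all assumptions made for the example.

```python
import math


def select_action(q_values, visit_counts, suggested, c=1.0, bias=0.5):
    """Return the action with the best UCB-style score plus a soft rule bonus.

    q_values:     dict action -> current value estimate
    visit_counts: dict action -> number of times the action was tried
    suggested:    set of actions recommended by (hypothetical) learned rules
    c, bias:      assumed exploration and bias weights, not taken from the paper
    """
    total = sum(visit_counts.values())
    best_action, best_score = None, -math.inf
    for action in sorted(q_values):  # sorted for deterministic tie-breaking
        n = visit_counts[action]
        # Exploration bonus shrinks as the action is visited more often.
        explore = c * math.sqrt(math.log(total + 1) / (n + 1))
        # Soft bias: a bonus that decays with visits, so rules guide early
        # exploration but cannot override strong value estimates.
        bonus = bias / (n + 1) if action in suggested else 0.0
        score = q_values[action] + explore + bonus
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```

Because the bonus is additive and decays, a suggested action wins ties and near-ties, while an action with a clearly higher value estimate is still selected; that is the sense in which the bias is soft rather than a hard constraint.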


Bayesian Generalization Error in Linear Neural Networks with Concept Bottleneck Structure and Multitask Formulation

arXiv.org Artificial Intelligence

The concept bottleneck model (CBM) is a ubiquitous method for interpreting neural networks using concepts. In CBM, concepts are inserted between the output layer and the last intermediate layer as observable values. This helps in understanding the reason behind the outputs generated by the neural network: the weights from the last hidden layer to the output layer correspond to the concepts. However, it has not yet been possible to understand the behavior of the generalization error in CBM, since a neural network is in general a singular statistical model. When the model is singular, a one-to-one map from the parameters to probability distributions cannot be created. This non-identifiability makes it difficult to analyze the generalization performance. In this study, we mathematically clarify the Bayesian generalization error and free energy of CBM when its architecture is a three-layered linear neural network. We also consider a multitask problem where the neural network outputs not only the original output but also the concepts. The results show that CBM drastically changes the behavior of the parameter region and the Bayesian generalization error in three-layered linear neural networks compared with the standard version, whereas the multitask formulation does not.


Reinforcement Learning for Omega-Regular Specifications on Continuous-Time MDP

arXiv.org Artificial Intelligence

Continuous-time Markov decision processes (CTMDPs) are canonical models for expressing sequential decision-making in dense-time, stochastic environments. When the stochastic evolution of the environment is only available via sampling, model-free reinforcement learning (RL) is the approach of choice for computing optimal decision sequences. RL, however, requires the learning objective to be encoded as scalar reward signals. Since doing such translations manually is both tedious and error-prone, a number of techniques have been proposed to translate high-level objectives (expressed in logic or automata formalisms) to scalar rewards for discrete-time Markov decision processes (MDPs). Unfortunately, no automatic translation exists for CTMDPs. We consider CTMDP environments with learning objectives expressed as omega-regular languages. Omega-regular languages generalize regular languages to infinite-horizon specifications and can express properties given in the popular linear-time logic LTL. To accommodate the dense-time nature of CTMDPs, we consider two different semantics of omega-regular objectives: 1) satisfaction semantics, where the goal of the learner is to maximize the probability of spending positive time in the good states, and 2) expectation semantics, where the goal of the learner is to optimize the long-run expected average time spent in the "good states" of the automaton. We present an approach enabling correct translation to scalar reward signals that can be readily used by off-the-shelf RL algorithms for CTMDPs. We demonstrate the effectiveness of the proposed algorithms by evaluating them on some popular CTMDP benchmarks with omega-regular objectives.


Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) is the study of the learning dynamics of strategic agents that coexist in a shared environment, and is one of the important frontiers of machine learning and control. In this paper, we study MARL in stochastic games, also known as Markov games, a multi-agent generalization of Markov decision problems (MDPs) in which the cost-relevant history of the system is summarized by a state variable (Shapley [1953]). Due to its ability to model both dynamic inter-temporal choice and strategic interaction, the stochastic games model has long been a popular framework for studying multi-agent learning (Littman [1994]). In comparison to single-agent reinforcement learning, analysis of MARL is difficult due to several challenges inherent to multi-agent systems, including non-stationarity, conflicting interests, and decentralized information. As a result, fundamental understanding of multi-agent reinforcement learning theory has lagged behind its single-agent counterpart (Zhang et al. [2021]).


Recommender Systems and Deep Learning in Python - Udemy Free Coupons Discount - Couse Sites

#artificialintelligence

Free Coupon Discount - The most in-depth course on recommendation systems with deep learning, machine learning, data science, and AI techniques. Created by Lazy Programmer Inc.

Description: Believe it or not, almost all online businesses today make use of recommender systems in some way or another. What do I mean by "recommender systems", and why are they useful? Let's look at the top 3 websites on the Internet, according to Alexa: Google, YouTube, and Facebook. Recommender systems form the very foundation of these technologies. Google: Search results. They are why Google is the most successful technology company today.


Generative Logic with Time: Beyond Logical Consistency and Statistical Possibility

arXiv.org Artificial Intelligence

This paper gives a simple theory of inference for reasoning logically about symbolic knowledge derived entirely from data over time. We take a Bayesian approach to model how data causes symbolic knowledge. Probabilistic reasoning with symbolic knowledge is modelled as a process of following the causality forwards and backwards. The forward and backward processes correspond to an interpretation and inverse interpretation of formal logic, respectively. The theory is applied to a localisation problem to show that a robot with broken or noisy sensors can efficiently solve the problem in a fully data-driven fashion.


Who's in Charge? Roles and Responsibilities of Decision-Making Components in Conversational Robots

arXiv.org Artificial Intelligence

Software architectures for conversational robots typically consist of multiple modules, each designed for a particular processing task or functionality. Some of these modules are developed for the purpose of making decisions about the next action that the robot ought to perform in the current context. Those actions may relate to physical movements, such as driving forward or grasping an object, but may also correspond to communicative acts, such as asking a question to the human user. In this position paper, we reflect on the organization of those decision modules in human-robot interaction platforms. We discuss the relative benefits and limitations of modular vs. end-to-end architectures, and argue that, despite the increasing popularity of end-to-end approaches, modular architectures remain preferable when developing conversational robots designed to execute complex tasks in collaboration with human users. We also show that most practical HRI architectures tend to be either robot-centric or dialogue-centric, depending on where developers wish to place the "command center" of their system. While those design choices may be justified in some application domains, they also limit the robot's ability to flexibly interleave physical movements and conversational behaviours. We contend that architectures placing "action managers" and "interaction managers" on an equal footing may provide the best path forward for future human-robot interaction systems.


Incorporating Human Path Preferences in Robot Navigation with Minimal Interventions

arXiv.org Artificial Intelligence

Robots that can effectively understand human intentions from actions are crucial for successful human-robot collaboration. In this work, we address the challenge of a robot navigating towards an unknown goal while also accounting for a human's preference for a particular path in the presence of obstacles. This problem is particularly challenging when both the goal and path preference are unknown a priori. To overcome this challenge, we propose a method for encoding and inferring path preference online using a partitioning of the space into polytopes. Our approach enables joint inference over the goal and path preference using a stochastic observation model for the human. We evaluate our method on an unknown-goal navigation problem with sparse human interventions, and find that it outperforms baseline approaches as the human's inputs become increasingly sparse. We find that the time required to update the robot's belief does not increase with the complexity of the environment, which makes our method suitable for online applications.
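To make the idea of online inference with a stochastic human observation model concrete, here is a deliberately simplified sketch: a discrete Bayesian update over candidate goals only, where a human input direction is more likely the better it points at a goal. It ignores the path preference and the polytope partitioning described above, and the function name `update_belief`, the Boltzmann-style likelihood, and the rationality parameter `beta` are assumptions for illustration, not the paper's model.

```python
import math


def update_belief(belief, position, human_direction, beta=2.0):
    """Bayesian update of a belief over candidate goals.

    belief:          dict goal (x, y) -> prior probability
    position:        current robot position (x, y)
    human_direction: direction (dx, dy) indicated by the human
    beta:            assumed rationality weight; higher means less noisy human
    """
    h_norm = math.hypot(*human_direction) or 1.0
    posterior = {}
    for goal, prior in belief.items():
        gx, gy = goal[0] - position[0], goal[1] - position[1]
        g_norm = math.hypot(gx, gy) or 1.0
        # Cosine of the angle between the human input and the goal direction.
        cos = (gx * human_direction[0] + gy * human_direction[1]) / (g_norm * h_norm)
        # Assumed likelihood: inputs aligned with the goal are more probable.
        posterior[goal] = prior * math.exp(beta * cos)
    z = sum(posterior.values())
    return {goal: p / z for goal, p in posterior.items()}
```

For example, starting from a uniform belief over goals at (10, 0) and (0, 10), a single input pointing along (1, 0) shifts most of the probability mass to the first goal while leaving the second possible, which is how sparse inputs can still be informative.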


Simultaneous Action Recognition and Human Whole-Body Motion and Dynamics Prediction from Wearable Sensors

arXiv.org Artificial Intelligence

This paper presents a novel approach to simultaneously solve the problems of human activity recognition and whole-body motion and dynamics prediction for real-time applications. Starting from the dynamics of human motion and motor system theory, the notion of mixture of experts from deep learning has been extended to address this problem. In the proposed approach, experts are modelled as a sequence-to-sequence recurrent neural network (RNN) architecture. Experiments show the results of 66-DoF real-world human motion prediction and action recognition during different tasks like walking and rotating. The code associated with this paper is available at: github.com/ami-iit/paper_darvish_2022_humanoids_action-kindyn-predicition


Act-Then-Measure: Reinforcement Learning for Partially Observable Environments with Active Measuring

arXiv.org Artificial Intelligence

We study Markov decision processes (MDPs), where agents have direct control over when and how they gather information, as formalized by action-contingent noiselessly observable MDPs (ACNO-MDPs). In these models, actions consist of two components: a control action that affects the environment, and a measurement action that affects what the agent can observe. To solve ACNO-MDPs, we introduce the act-then-measure (ATM) heuristic, which assumes that we can ignore future state uncertainty when choosing control actions. We show how following this heuristic may lead to shorter policy computation times and prove a bound on the performance loss incurred by the heuristic. To decide whether or not to take a measurement action, we introduce the concept of measuring value. We develop a reinforcement learning algorithm based on the ATM heuristic, using a Dyna-Q variant adapted for partially observable domains, and showcase its superior performance compared to prior methods on a number of partially observable environments.
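One way to read the two decisions described above is sketched below, under stated assumptions. The control action is chosen by weighting state-action values with the current belief, which ignores future state uncertainty; the measuring value is taken here to be the expected gain from knowing the state before acting, minus a measurement cost. The exact definitions in the paper may differ; `choose_control`, `measuring_value`, the tabular `q` layout, and `cost` are hypothetical names for this example.

```python
def expected_q(belief, q, action):
    """Belief-weighted value of an action; q maps state -> {action: value}."""
    return sum(p * q[state][action] for state, p in belief.items())


def choose_control(belief, q):
    """Pick the control action with the best belief-weighted value."""
    actions = sorted({a for state in q for a in q[state]})
    return max(actions, key=lambda a: expected_q(belief, q, a))


def measuring_value(belief, q, cost):
    """Assumed measuring value: gain from observing the state, minus cost."""
    actions = sorted({a for state in q for a in q[state]})
    # Value if the state is revealed first, then the best action is taken.
    informed = sum(p * max(q[state].values()) for state, p in belief.items())
    # Value if one action must be committed to under the current belief.
    blind = max(expected_q(belief, q, a) for a in actions)
    return informed - blind - cost
```

Under this reading the agent would measure only when `measuring_value` is positive: with a certain belief the gain is zero and the cost makes measuring unattractive, while with a uniform belief over states that demand different actions the gain can outweigh the cost.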