Undirected Networks
A Framework of Explanation Generation toward Reliable Autonomous Robots
Sakai, Tatsuya, Miyazawa, Kazuki, Horii, Takato, Nagai, Takayuki
To realize autonomous collaborative robots, it is important to increase the trust that users have in them. Toward this goal, this paper proposes an algorithm which endows an autonomous agent with the ability to explain the transition from the current state to the target state in a Markov decision process (MDP). According to cognitive science, to generate an explanation that is acceptable to humans, it is important to present the minimum information necessary to sufficiently understand an event. To meet this requirement, this study proposes a framework for identifying important elements in the decision-making process using a prediction model for the world and generating explanations based on these elements. To verify the ability of the proposed method to generate explanations, we conducted an experiment using a grid environment. It was inferred from the result of a simulation experiment that the explanation generated using the proposed method was composed of the minimum elements important for understanding the transition from the current state to the target state. Furthermore, subject experiments showed that the generated explanation was a good summary of the process of state transition, and that a high evaluation was obtained for the explanation of the reason for an action.
Utilizing Skipped Frames in Action Repeats via Pseudo-Actions
Hashimoto, Taisei, Tsuruoka, Yoshimasa
In many deep reinforcement learning settings, when an agent takes an action, it repeats the same action a predefined number of times without observing the states until the next action-decision point. This technique of action repetition has several merits in training the agent, but the data between action-decision points (i.e., intermediate frames) are, in effect, discarded. Since the amount of training data is inversely proportional to the interval of action repeats, they can have a negative impact on the sample efficiency of training. In this paper, we propose a simple but effective approach to alleviate to this problem by introducing the concept of pseudo-actions. The key idea of our method is making the transition between action-decision points usable as training data by considering pseudo-actions. Pseudo-actions for continuous control tasks are obtained as the average of the action sequence straddling an action-decision point. For discrete control tasks, pseudo-actions are computed from learned action embeddings. This method can be combined with any model-free reinforcement learning algorithm that involves the learning of Q-functions. We demonstrate the effectiveness of our approach on both continuous and discrete control tasks in OpenAI Gym.
Explainable Autonomous Robots: A Survey and Perspective
Sakai, Tatsuya, Nagai, Takayuki
It is commonly claimed that AI will replace most manual labor in the future; however, is this really the case? AI technologies do have higher image recognition accuracy compared to humans in some limited contexts, and have consistently outperformed humans in classical games such as Go and chess. Nonetheless, we believe that even advanced future developments based on current technology will not lead to robots replacing humans. AI systems' fundamental lack of ability to communicate naturally and effectively with humans is among the most significant reasons that they cannot replace human labor. Here, one may believe that such communication could be achieved via the development of natural language processing (NLP) technology [4]; however, NLP technologies are systems for estimating the content of human statements and their meanings; they do not constitute communication. That is, humans do not feel that robots using such systems truly understand and respond to them appropriately. Therefore, if effective communication is not achieved, robots will continue to function only as tools to assist humans. Advancements improving the accuracy or effectiveness of various specific tasks do not indicate that robots are equivalent to human beings. Under this scenario, how can we enable robots to communicate with humans?
On the Cluster Admission Problem for Cloud Computing
Dierks, Ludwig, Kash, Ian, Seuken, Sven
Cloud computing providers face the problem of matching heterogeneous customer workloads to resources that will serve them. This is particularly challenging if customers, who are already running a job on a cluster, scale their resource usage up and down over time. The provider therefore has to continuously decide whether she can add additional workloads to a given cluster or if doing so would impact existing workloadsโ ability to scale. Currently, this is often done using simple threshold policies to reserve large parts of each cluster, which leads to low efficiency (i.e., low average utilization of the cluster). We propose more sophisticated policies for controlling admission to a cluster and demonstrate that they significantly increase cluster utilization. We first introduce the cluster admission problem and formalize it as a constrained Partially Observable Markov Decision Process (POMDP). As it is infeasible to solve the POMDP optimally, we then systematically design admission policies that estimate moments of each workloadโs distribution of future resource usage. Via extensive simulations grounded in a trace from Microsoft Azure, we show that our admission policies lead to a substantial improvement over the simple threshold policy. We then show that substantial further gains are possible if high-quality information is available about arriving workloads. Based on this, we propose an information elicitation approach to incentivize users to provide this information and simulate its effects.
Human Activity Recognition Models in Ontology Networks
Buoncompagni, Luca, Kareem, Syed Yusha, Mastrogiovanni, Fulvio
We present Arianna+, a framework to design networks of ontologies for representing knowledge enabling smart homes to perform human activity recognition online. In the network, nodes are ontologies allowing for various data contextualisation, while edges are general-purpose computational procedures elaborating data. Arianna+ provides a flexible interface between the inputs and outputs of procedures and statements, which are atomic representations of ontological knowledge. Arianna+ schedules procedures on the basis of events by employing logic-based reasoning, i.e., by checking the classification of certain statements in the ontologies. Each procedure involves input and output statements that are differently contextualised in the ontologies based on specific prior knowledge. Arianna+ allows to design networks that encode data within multiple contexts and, as a reference scenario, we present a modular network based on a spatial context shared among all activities and a temporal context specialised for each activity to be recognised. In the paper, we argue that a network of small ontologies is more intelligible and has a reduced computational load than a single ontology encoding the same knowledge. Arianna+ integrates in the same architecture heterogeneous data processing techniques, which may be better suited to different contexts. Thus, we do not propose a new algorithmic approach to activity recognition, instead, we focus on the architectural aspects for accommodating logic-based and data-driven activity models in a context-oriented way. Also, we discuss how to leverage data contextualisation and reasoning for activity recognition, and to support an iterative development process driven by domain experts.
RDMSim: An Exemplar for Evaluation and Comparison of Decision-Making Techniques for Self-Adaptation
Samin, Huma, Paucar, Luis H. Garcia, Bencomo, Nelly, Hurtado, Cesar M. Carranza, Fredericks, Erik M.
Decision-making for self-adaptation approaches need to address different challenges, including the quantification of the uncertainty of events that cannot be foreseen in advance and their effects, and dealing with conflicting objectives that inherently involve multi-objective decision making (e.g., avoiding costs vs. providing reliable service). To enable researchers to evaluate and compare decision-making techniques for self-adaptation, we present the RDMSim exemplar. RDMSim enables researchers to evaluate and compare techniques for decision-making under environmental uncertainty that support self-adaptation. The focus of the exemplar is on the domain problem related to Remote Data Mirroring, which gives opportunity to face the challenges described above. RDMSim provides probe and effector components for easy integration with external adaptation managers, which are associated with decision-making techniques and based on the MAPE-K loop. Specifically, the paper presents (i) RDMSim, a simulator for real-world experimentation, (ii) a set of realistic simulation scenarios that can be used for experimentation and comparison purposes, (iii) data for the sake of comparison.
The Flipped Classroom model for teaching Conditional Random Fields in an NLP course
In this article, we show and discuss our experience in applying the flipped classroom method for teaching Conditional Random Fields in a Natural Language Processing course. We present the activities that we developed together with their relationship to a cognitive complexity model (Bloom's taxonomy). After this, we provide our own reflections and expectations of the model itself. Based on the evaluation got from students, it seems that students learn about the topic and also that the method is rewarding for some students. Additionally, we discuss some shortcomings and we propose possible solutions to them. We conclude the paper with some possible future work.
A learning gap between neuroscience and reinforcement learning
Wauthier, Samuel T., Mazzaglia, Pietro, รatal, Ozan, De Boom, Cedric, Verbelen, Tim, Dhoedt, Bart
Historically, artificial intelligence has drawn much inspiration from neuroscience to fuel advances in the field. However, current progress in reinforcement learning is largely focused on benchmark problems that fail to capture many of the aspects that are of interest in neuroscience today. We illustrate this point by extending a T-maze task from neuroscience for use with reinforcement learning algorithms, and show that state-of-the-art algorithms are not capable of solving this problem. Finally, we point out where insights from neuroscience could help explain some of the issues encountered.
Learning Good State and Action Representations via Tensor Decomposition
Ni, Chengzhuo, Zhang, Anru, Duan, Yaqi, Wang, Mengdi
The transition kernel of a continuous-state-action Markov decision process (MDP) admits a natural tensor structure. This paper proposes a tensor-inspired unsupervised learning method to identify meaningful low-dimensional state and action representations from empirical trajectories. The method exploits the MDP's tensor structure by kernelization, importance sampling and low-Tucker-rank approximation. This method can be further used to cluster states and actions respectively and find the best discrete MDP abstraction. We provide sharp statistical error bounds for tensor concentration and the preservation of diffusion distance after embedding.
Mixing Time Guarantees for Unadjusted Hamiltonian Monte Carlo
Bou-Rabee, Nawaf, Eberle, Andreas
We provide quantitative upper bounds on the total variation mixing time of the Markov chain corresponding to the unadjusted Hamiltonian Monte Carlo (uHMC) algorithm. For two general classes of models and fixed time discretization step size $h$, the mixing time is shown to depend only logarithmically on the dimension. Moreover, we provide quantitative upper bounds on the total variation distance between the invariant measure of the uHMC chain and the true target measure. As a consequence, we show that an $\varepsilon$-accurate approximation of the target distribution $\mu$ in total variation distance can be achieved by uHMC for a broad class of models with $O\left(d^{3/4}\varepsilon^{-1/2}\log (d/\varepsilon )\right)$ gradient evaluations, and for mean field models with weak interactions with $O\left(d^{1/2}\varepsilon^{-1/2}\log (d/\varepsilon )\right)$ gradient evaluations. The proofs are based on the construction of successful couplings for uHMC that realize the upper bounds.