observation trace
Learning to Act and Observe in Partially Observable Domains
Bolander, Thomas, Gierasimczuk, Nina, Liberman, Andrés Occhipinti
We consider a learning agent in a partially observable environment, with which the agent has never interacted before, and about which it learns both what it can observe and how its actions affect the environment. The agent can learn about this domain from experience gathered by taking actions in the domain and observing their results. We present learning algorithms capable of learning as much as possible (in a well-defined sense) both about what is directly observable and about what actions do in the domain, given the learner's observational constraints. We differentiate the level of domain knowledge attained by each algorithm, and characterize the type of observations required to reach it. The algorithms use dynamic epistemic logic (DEL) to represent the learned domain information symbolically. Our work continues that of Bolander and Gierasimczuk (2015), which developed DEL-based learning algorithms based to learn domain information in fully observable domains.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (6 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)
Induction of Subgoal Automata for Reinforcement Learning
Furelos-Blanco, Daniel, Law, Mark, Russo, Alessandra, Broda, Krysia, Jonsson, Anders
Our method relies on inducing an automaton whose transitions are subgoals expressed as propositional formulas over a set of observable events. A state-of-the-art inductive logic programming system is used to learn the automaton from observation traces perceived by the RL agent. The reinforcement learning and automaton learning processes are interleaved: a new refined automaton is learned whenever the RL agent generates a trace not recognized by the current automaton. We evaluate ISA in several gridworld problems and show that it performs similarly to a method for which automata are given in advance. We also show that the learned automata can be exploited to speed up convergence through reward shaping and transfer learning across multiple tasks. Finally, we analyze the running time and the number of traces that ISA needs to learn an automata, and the impact that the number of observable events has on the learner's performance.
- North America > United States (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)