Undirected Networks
PHASE: PHysically-grounded Abstract Social Events for Machine Social Perception
Netanyahu, Aviv, Shu, Tianmin, Katz, Boris, Barbu, Andrei, Tenenbaum, Joshua B.
The ability to perceive and reason about social interactions in the context of physical environments is core to human social intelligence and human-machine cooperation. However, no prior dataset or benchmark has systematically evaluated physically grounded perception of complex social interactions that go beyond short actions, such as high-fiving, or simple group activities, such as gathering. In this work, we create a dataset of physically-grounded abstract social events, PHASE, that resemble a wide range of real-life social interactions by including social concepts such as helping another agent. PHASE consists of 2D animations of pairs of agents moving in a continuous space generated procedurally using a physics engine and a hierarchical planner. Agents have a limited field of view, and can interact with multiple objects, in an environment that has multiple landmarks and obstacles. Using PHASE, we design a social recognition task and a social prediction task. PHASE is validated with human experiments demonstrating that humans perceive rich interactions in the social events, and that the simulated agents behave similarly to humans. As a baseline model, we introduce a Bayesian inverse planning approach, SIMPLE (SIMulation, Planning and Local Estimation), which outperforms state-of-the-art feed-forward neural networks. We hope that PHASE can serve as a difficult new challenge for developing new models that can recognize complex social interactions.
Knowledge-Based Hierarchical POMDPs for Task Planning
Serrano, Sergio A., Santiago, Elizabeth, Martinez-Carranza, Jose, Morales, Eduardo, Sucar, L. Enrique
The main goal in task planning is to build a sequence of actions that takes an agent from an initial state to a goal state. In robotics, this is particularly difficult because actions usually have several possible results, and sensors are prone to produce measurements with error. Partially observable Markov decision processes (POMDPs) are commonly employed, thanks to their capacity to model the uncertainty of actions that modify and monitor the state of a system. However, since solving a POMDP is computationally expensive, their usage becomes prohibitive for most robotic applications. In this paper, we propose a task planning architecture for service robotics. In the context of service robot design, we present a scheme to encode knowledge about the robot and its environment, that promotes the modularity and reuse of information. Also, we introduce a new recursive definition of a POMDP that enables our architecture to autonomously build a hierarchy of POMDPs, so that it can be used to generate and execute plans that solve the task at hand. Experimental results show that, in comparison to baseline methods, by following a recursive hierarchical approach the architecture is able to significantly reduce the planning time, while maintaining (or even improving) the robustness under several scenarios that vary in uncertainty and size.
Lossless compression with state space models using bits back coding
We generalize the'bits back with ANS' method to time-series models with a latent Markov structure. This family of models includes hidden Markov models (HMMs), linear Gaussian state space models (LGSSMs) and many more. We provide experimental evidence that our method is effective for small scale models, and discuss its applicability to larger scale settings such as video compression. Recent work by Townsend et al. (2019) shows the existence of a practical method, called'bits back with ANS' (BB-ANS), for doing lossless compression with a latent variable model, at rates close to the negative variational free energy of the model (this quantity bounds the model's marginal log-likelihood and is often referred to as the'evidence lower bound', or ELBO). BB-ANS depends on a last-in-first-out (LIFO) source coding algorithm called Asymmetric Numeral Systems (ANS; Duda, 2009), and also uses an idea called bits back coding (Wallace, 1990; Hinton & van Camp, 1993).
Filter-Based Abstractions with Correctness Guarantees for Planning under Uncertainty
Badings, Thom S., Jansen, Nils, Poonawala, Hasan A., Stoelinga, Marielle
We study planning problems for continuous control systems with uncertainty caused by measurement and process noise. The goal is to find an optimal plan that guarantees that the system reaches a desired goal state within finite time. Measurement noise causes limited observability of system states, and process noise causes uncertainty in the outcome of a given plan. These factors render the problem undecidable in general. Our key contribution is a novel abstraction scheme that employs Kalman filtering as a state estimator to obtain a finite-state model, which we formalize as a Markov decision process (MDP). For this MDP, we employ state-of-the-art model checking techniques to efficiently compute plans that maximize the probability of reaching goal states. Moreover, we account for numerical imprecision in computing the abstraction by extending the MDP with intervals of probabilities as a more robust model. We show the correctness of the abstraction and provide several optimizations that aim to balance the quality of the plan and the scalability of the approach. We demonstrate that our method can handle systems that result in MDPs with thousands of states and millions of transitions.
A Practical Guide to Multi-Objective Reinforcement Learning and Planning
Hayes, Conor F., Rădulescu, Roxana, Bargiacchi, Eugenio, Källström, Johan, Macfarlane, Matthew, Reymond, Mathieu, Verstraeten, Timothy, Zintgraf, Luisa M., Dazeley, Richard, Heintz, Fredrik, Howley, Enda, Irissappane, Athirai A., Mannion, Patrick, Nowé, Ann, Ramos, Gabriel, Restelli, Marcello, Vamplew, Peter, Roijers, Diederik M.
Real-world decision-making tasks are generally complex, requiring trade-offs between multiple, often conflicting, objectives. Despite this, the majority of research in reinforcement learning and decision-theoretic planning either assumes only a single objective, or that multiple objectives can be adequately handled via a simple linear combination. Such approaches may oversimplify the underlying problem and hence produce suboptimal results. This paper serves as a guide to the application of multi-objective methods to difficult problems, and is aimed at researchers who are already familiar with single-objective reinforcement learning and planning methods who wish to adopt a multi-objective perspective on their research, as well as practitioners who encounter multi-objective decision problems in practice. It identifies the factors that may influence the nature of the desired solution, and illustrates by example how these influence the design of multi-objective decision-making systems for complex problems.
Dynamic programming for machine learning: Hidden Markov Models
Machine learning requires many sophisticated algorithms to learn from existing data, then apply the learnings to new data. Dynamic programming turns up in many of these algorithms. This may be because dynamic programming excels at solving problems involving "non-local" information, making greedy or divide-and-conquer algorithms ineffective. In this article, I'll explore one technique used in machine learning, Hidden Markov Models (HMMs), and how dynamic programming is used when applying this technique. After discussing HMMs, I'll show a few real-world examples where HMMs are used.
Building Safer Autonomous Agents by Leveraging Risky Driving Behavior Knowledge
Simulation environments are good for learning different driving tasks like lane changing, parking or handling intersections etc. in an abstract manner. However, these simulation environments often restrict themselves to operate under conservative interactions behavior amongst different vehicles. But, as we know that the real driving tasks often involves very high risk scenarios where other drivers often don't behave in the expected sense. There can be many reasons for this behavior like being tired or inexperienced. The simulation environments doesn't take this information into account while training the navigation agent. Therefore, in this study we especially focus on systematically creating these risk prone scenarios with heavy traffic and unexpected random behavior for creating better model-free learning agents. We generate multiple autonomous driving scenarios by creating new custom Markov Decision Process (MDP) environment iterations in highway-env simulation package. The behavior policy is learnt by agents trained with the help from deep reinforcement learning models. Our behavior policy is deliberated to handle collisions and risky randomized driver behavior. We train model free learning agents with supplement information of risk prone driving scenarios and compare their performance with baseline agents. Finally, we casually measure the impact of adding these perturbations in the training process to precisely account for the performance improvement attained from utilizing the learnings from these scenarios.
Regenerativity of Viterbi process for pairwise Markov models
For hidden Markov models one of the most popular estimates of the hidden chain is the Viterbi path -- the path maximising the posterior probability. We consider a more general setting, called the pairwise Markov model (PMM), where the joint process consisting of finite-state hidden process and observation process is assumed to be a Markov chain. It has been recently proven that under some conditions the Viterbi path of the PMM can almost surely be extended to infinity, thereby defining the infinite Viterbi decoding of the observation sequence, called the Viterbi process. This was done by constructing a block of observations, called a barrier, which ensures that the Viterbi path goes trough a given state whenever this block occurs in the observation sequence. In this paper we prove that the joint process consisting of Viterbi process and PMM is regenerative. The proof involves a delicate construction of regeneration times which coincide with the occurrences of barriers. As one possible application of our theory, some results on the asymptotics of the Viterbi training algorithm are derived.
Double Articulation Analyzer with Prosody for Unsupervised Word and Phoneme Discovery
Okuda, Yasuaki, Ozaki, Ryo, Taniguchi, Tadahiro
Infants acquire words and phonemes from unsegmented speech signals using segmentation cues, such as distributional, prosodic, and co-occurrence cues. Many pre-existing computational models that represent the process tend to focus on distributional or prosodic cues. This paper proposes a nonparametric Bayesian probabilistic generative model called the prosodic hierarchical Dirichlet process-hidden language model (Prosodic HDP-HLM). Prosodic HDP-HLM, an extension of HDP-HLM, considers both prosodic and distributional cues within a single integrative generative model. We conducted three experiments on different types of datasets, and demonstrate the validity of the proposed method. The results show that the Prosodic DAA successfully uses prosodic cues and outperforms a method that solely uses distributional cues. The main contributions of this study are as follows: 1) We develop a probabilistic generative model for time series data including prosody that potentially has a double articulation structure; 2) We propose the Prosodic DAA by deriving the inference procedure for Prosodic HDP-HLM and show that Prosodic DAA can discover words directly from continuous human speech signals using statistical information and prosodic information in an unsupervised manner; 3) We show that prosodic cues contribute to word segmentation more in naturally distributed case words, i.e., they follow Zipf's law.
Whole brain Probabilistic Generative Model toward Realizing Cognitive Architecture for Developmental Robots
Taniguchi, Tadahiro, Yamakawa, Hiroshi, Nagai, Takayuki, Doya, Kenji, Sakagami, Masamichi, Suzuki, Masahiro, Nakamura, Tomoaki, Taniguchi, Akira
Building a humanlike integrative artificial cognitive system, that is, an artificial general intelligence, is one of the goals in artificial intelligence and developmental robotics. Furthermore, a computational model that enables an artificial cognitive system to achieve cognitive development will be an excellent reference for brain and cognitive science. This paper describes the development of a cognitive architecture using probabilistic generative models (PGMs) to fully mirror the human cognitive system. The integrative model is called a whole-brain PGM (WB-PGM). It is both brain-inspired and PGMbased. In this paper, the process of building the WB-PGM and learning from the human brain to build cognitive architectures is described.