 Markov Models


Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

arXiv.org Artificial Intelligence

Continuous control and planning remains a major challenge in robotics and machine learning. Neuroscience offers the possibility of learning from animal brains that implement highly successful controllers, but it is unclear how to relate an animal's behavior to control principles. Animals may not always act optimally from the perspective of an external observer, but may still act rationally: we hypothesize that animals choose actions with highest expected future subjective value according to their own internal model of the world. Their actions thus result from solving a different optimal control problem from those on which they are evaluated in neuroscience experiments. With this assumption, we propose a novel framework of model-based inverse rational control that learns the agent's internal model that best explains their actions in a task described as a partially observable Markov decision process (POMDP). In this approach we first learn optimal policies generalized over the entire model space of dynamics and subjective rewards, using an extended Kalman filter to represent the belief space, a neural network in the actor-critic framework to optimize the policy, and a simplified basis for the parameter space. We then compute the model that maximizes the likelihood of the experimentally observable data comprising the agent's sensory observations and chosen actions. Our proposed method is able to recover the true model of simulated agents within theoretical error bounds given by limited data. We illustrate this method by applying it to a complex naturalistic task currently used in neuroscience experiments. This approach provides a foundation for interpreting the behavioral and neural dynamics of highly adapted controllers in animal brains.


Decision making in dynamic and interactive environments based on cognitive hierarchy theory: Formulation, solution, and application to autonomous driving

arXiv.org Artificial Intelligence

In this paper, we describe a framework for autonomous decision making in a dynamic and interactive environment based on cognitive hierarchy theory. We model the interactions between the ego agent and its operating environment as a two-player dynamic game, and integrate cognitive behavioral models, Bayesian inference, and receding-horizon optimal control to define a dynamically-evolving decision strategy for the ego agent. Simulation examples representing autonomous vehicle control in three traffic scenarios where the autonomous ego vehicle interacts with a human-driven vehicle are reported. Autonomous systems are becoming more capable, better accepted, and more commonplace. Many autonomous systems, including collaborative robots [1] and self-driving cars [2], operate in dynamic and interactive environments.


Boltzmann Machines Transformation of Unsupervised Deep Learning -- Part 1

#artificialintelligence

Unlike task-specific algorithms, Deep Learning is part of the Machine Learning family of methods based on learning data representations. With massive amounts of computational power, machines can now recognize objects and translate speech in real time, enabling smarter artificial intelligence in systems. The concept of software simulating the neocortex's large array of neurons in an artificial neural network is decades old, and it has led to as many disappointments as breakthroughs. But because of improvements in mathematical formulas and increasingly powerful computers, researchers and data scientists today can model many more layers of virtual neurons than ever before. "Recent improvements in Deep Learning has reignited some of the grand challenges in Artificial Intelligence."


Autonomous Target Search with Multiple Coordinated UAVs

Journal of Artificial Intelligence Research

Search and tracking is the problem of locating a moving target and following it to its destination. In this work, we consider a scenario in which the target moves across a large geographical area by following a road network and the search is performed by a team of unmanned aerial vehicles (UAVs). We formulate search and tracking as a combinatorial optimization problem and prove that the objective function is submodular. We exploit this property to devise a greedy algorithm. Although this algorithm does not offer strong theoretical guarantees because of the presence of temporal constraints that limit the feasibility of the solutions, it presents remarkably good performance, especially when several UAVs are available for the mission. As the greedy algorithm suffers when resources are scarce, we investigate two alternative optimization techniques: Constraint Programming (CP) and AI planning. Both approaches struggle to cope with large problems, and so we strengthen them by leveraging the greedy algorithm. We use the greedy solution to warm start the CP model and to devise a domain-dependent heuristic for planning. Our extensive experimental evaluation studies the scalability of the different techniques and identifies the conditions under which one approach becomes preferable to the others.
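The greedy strategy mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: the paper's objective involves search patterns and temporal constraints, whereas this example assumes a plain set-coverage objective (a standard submodular function) and a simple budget on the number of picks. The names `coverage`, `greedy_max_coverage`, and the toy `regions` data are hypothetical.

```python
from typing import Dict, List, Set


def coverage(chosen: List[str], regions: Dict[str, Set[int]]) -> int:
    """Submodular objective (assumed): number of distinct cells covered."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= regions[name]
    return len(covered)


def greedy_max_coverage(regions: Dict[str, Set[int]], budget: int) -> List[str]:
    """Repeatedly pick the candidate with the largest marginal gain."""
    chosen: List[str] = []
    remaining = set(regions)
    for _ in range(budget):
        base = coverage(chosen, regions)
        best, best_gain = None, 0
        # Sorted iteration keeps tie-breaking deterministic.
        for name in sorted(remaining):
            gain = coverage(chosen + [name], regions) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no candidate adds anything new
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy data: each hypothetical search pattern covers some road cells.
regions = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1}}
plan = greedy_max_coverage(regions, budget=2)
```

With only a cardinality budget, greedy selection on a monotone submodular objective carries the classic (1 - 1/e) approximation guarantee; the abstract notes that temporal feasibility constraints remove that guarantee in the paper's setting.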


Vision-based Navigation Using Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Jonáš Kulhánek, Erik Derner, Tim de Bruin, and Robert Babuška. Abstract: Deep reinforcement learning (RL) has been successfully applied to a variety of game-like environments. However, the application of deep RL to visual navigation with realistic environments is a challenging task. We propose a novel learning architecture capable of navigating an agent, e.g. a mobile robot, to a target given by an image. To achieve this, we have extended the batched A2C algorithm with auxiliary tasks designed to improve visual navigation performance. We propose three additional auxiliary tasks: predicting the segmentation of the observation image and of the target image and predicting the depth-map. These tasks enable the use of supervised learning to pre-train a large part of the network and to reduce the number of training steps substantially. The training performance has been further improved by increasing the environment complexity gradually over time. An efficient neural network structure is proposed, which is capable of learning for multiple targets in multiple environments. Our method navigates in continuous state spaces and on the AI2-THOR environment simulator outperforms state-of-the-art goal-oriented visual navigation methods from the literature. Introduction: Visual navigation is the problem of navigating an agent, e.g. a mobile robot, in an environment using camera input only. The agent is given a target image (an image it will see from the target position), and its goal is to move from its current position to the target by applying a sequence of actions, based on the camera observations only. We focus on the case when the environment is initially unknown, i.e., no explicit map is available.


Unifying System Health Management and Automated Decision Making

Journal of Artificial Intelligence Research

Health management of complex dynamic systems has evolved from simple automated alarms into a subfield of artificial intelligence with techniques for analyzing off-nominal conditions and generating responses. This evolution took place largely apart from the development of automated system control, planning, and scheduling (generally referred to in this work as decision making). While there have been efforts to establish an information exchange between system health management and decision making, successful practical implementations of integrated architectures remain limited. This article proposes that rather than being treated as connected yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and algorithms, we believe that a unified approach will increase systems' resilience to faults and improve their effectiveness. We overview the prevalent system health management methodology, illustrate its limitations through numerical examples, and describe a proposed unified approach. We then show how typical system health management concepts are accommodated in the proposed approach without loss of functionality or generality. A computational complexity analysis of the unified approach is also provided.


Transferring knowledge from monitored to unmonitored areas for forecasting parking spaces

arXiv.org Artificial Intelligence

Smart cities around the world have begun monitoring parking areas in order to estimate available parking spots and help drivers looking for parking. Current results are promising. However, existing approaches are limited by the high cost of sensors that need to be installed throughout the city in order to achieve an accurate estimation. This work investigates the extension of estimating parking information from areas equipped with sensors to areas where they are missing. To this end, the similarity between city neighborhoods is determined based on background data, i.e., from geographic information systems. Using the derived similarity values, we analyze the adaptation of occupancy rates from monitored to unmonitored parking areas.


Strengthening the Case for a Bayesian Approach to Car-following Model Calibration and Validation using Probabilistic Programming

arXiv.org Machine Learning

Compute and memory constraints have historically prevented traffic simulation software users from fully utilizing the predictive models underlying them. When calibrating car-following models, particularly, accommodations have included 1) using sensitivity analysis to limit the number of parameters to be calibrated, and 2) identifying only one set of parameter values using data collected from multiple car-following instances across multiple drivers. Shortcuts are further motivated by insufficient data set sizes, for which a driver may have too few instances to fully account for the variation in their driving behavior. In this paper, we demonstrate that recent technological advances can enable transportation researchers and engineers to overcome these constraints and produce calibration results that 1) outperform industry standard approaches, and 2) allow for a unique set of parameters to be estimated for each driver in a data set, even given a small amount of data. We propose a novel calibration procedure for car-following models based on Bayesian machine learning and probabilistic programming, and apply it to real-world data from a naturalistic driving study. We also discuss how this combination of mathematical and software tools can offer additional benefits such as more informative model validation and the incorporation of true-to-data uncertainty into simulation traces. Traffic simulation software packages are widely used in transportation engineering to estimate the impacts of potential changes to a roadway network and forecast system performance under future scenarios. These packages are underpinned by math- and physics-based models, which are designed to describe behavior at an aggregate (macroscopic) level or at the level of individual drivers (microscopic).


Viterbi Extraction tutorial with Hidden Markov Toolkit

arXiv.org Artificial Intelligence

An algorithm used to extract HMM parameters is revisited. Most parts of the extraction process are taken from the Hidden Markov Toolkit (HTK) program named HInit. The algorithm itself shows a few variations compared to implementations in other domains. The HMM is introduced briefly based on the theory of discrete-time Markov chains. We schematically outline the Viterbi method implemented in HTK. An iterative definition of the method, ready to be implemented in computer programs, is reviewed. We also illustrate the method step by step using manual calculation and extensive graphical illustration. The observation probabilities are modeled simply as independent Gaussian random variables. The purpose of the content is not to justify the performance or accuracy of the method applied in a specific area. This writing merely describes how the algorithm is performed. The whole content should give the audience insight into the Viterbi Extraction method used by HTK.
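As a rough illustration of the Viterbi step described in the abstract above, the following sketch decodes the most likely state sequence of an HMM in log space. It is a generic textbook version, not HTK's HInit code: HInit additionally uses the resulting alignment to re-estimate parameters, which is omitted here. It assumes scalar observations with one Gaussian per state; the function names and the toy parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def log_gauss(x: float, mean: float, var: float) -> float:
    """Log density of a scalar Gaussian observation."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(
    obs: Sequence[float],
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    means: Sequence[float],
    variances: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the most likely state path and its log probability."""
    if not obs:
        raise ValueError("obs must contain at least one observation")
    n = len(log_init)
    # Initialisation: prior times emission of the first observation.
    score = [log_init[s] + log_gauss(obs[0], means[s], variances[s]) for s in range(n)]
    back: List[List[int]] = []
    # Recursion: best predecessor for every state at every time step.
    for x in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_gauss(x, means[s], variances[s]))
        back.append(pointers)
    # Termination and backtracking along the stored pointers.
    state = max(range(n), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Toy two-state model: state 0 emits near 0.0, state 1 emits near 5.0.
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
path, log_prob = viterbi([0.1, -0.2, 5.1, 4.9], log_init, log_trans, [0.0, 5.0], [1.0, 1.0])
```

Working in log space avoids numerical underflow on long observation sequences, since products of probabilities become sums.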


Online Planning for Decentralized Stochastic Control with Partial History Sharing

arXiv.org Artificial Intelligence

Computational challenges are further compounded if agents do not possess complete model knowledge. In this paper, we take advantage of the fact that in many problems agents share some common information, or history, termed partial history sharing. Under this information structure the policy search space is greatly reduced. We propose a provably convergent, online tree-search based algorithm that does not require a closed-form model or explicit communication among agents. Interestingly, our algorithm can be viewed as a generalization of several existing heuristic solvers for decentralized partially observable Markov decision processes. To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network. Numerical results demonstrate the performance of our algorithm.