Markov Models
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near-optimal policies for POMDPs. It is inefficient because it must account for the entire belief space, which requires solving a large number of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, with properly chosen belief subsets, restricted value iteration yields near-optimal policies, and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
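To illustrate what a point in "belief space" is, the sketch below performs the standard Bayes belief update for a POMDP. It is not the restricted value iteration algorithm from the paper. The function name `belief_update`, the table layouts `T[a][s][s2]` and `O[a][s2][o]`, and the two-state numbers are all assumptions chosen for illustration.

```python
def belief_update(belief, action, observation, T, O):
    """One Bayes-filter step on a POMDP belief state.

    Assumed (hypothetical) table layout:
      T[a][s][s2] = P(s2 | s, a)   transition model
      O[a][s2][o] = P(o | s2, a)   observation model
    """
    n = len(belief)
    # Predict the next-state distribution, then weight by the observation likelihood.
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / total for p in unnormalized]


# Hypothetical two-state, one-action, two-observation model.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.7, 0.3], [0.4, 0.6]]]
b = belief_update([0.5, 0.5], action=0, observation=0, T=T, O=O)
```

Every belief reachable through updates like this one is a point in the belief space. A restricted method, as described in the abstract, would work over a chosen subset of such points rather than the whole space.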
Use of Markov Chains to Design an Agent Bidding Strategy for Continuous Double Auctions
Birmingham, W. P., Durfee, E. H., Park, S.
As computational agents are developed for increasingly complicated e-commerce applications, the complexity of the decisions they face demands advances in artificial intelligence techniques. For example, an agent representing a seller in an auction should try to maximize the seller's profit by reasoning about a variety of possibly uncertain pieces of information, such as the maximum prices various buyers might be willing to pay, the possible prices being offered by competing sellers, the rules by which the auction operates, the dynamic arrival and matching of offers to buy and sell, and so on. A naïve application of multiagent reasoning techniques would require the seller's agent to explicitly model all of the other agents through an extended time horizon, rendering the problem intractable for many realistically-sized problems. We have instead devised a new strategy that an agent can use to determine its bid price based on a more tractable Markov chain model of the auction process. We have experimentally identified the conditions under which our new strategy works well, as well as how well it works in comparison to the optimal performance the agent could have achieved had it known the future. Our results show that our new strategy in general performs well, outperforming other tractable heuristic strategies in a majority of experiments, and is particularly effective in a "seller's market," where many buy offers are available.
Implementing Human-like Intuition Mechanism in Artificial Intelligence
Human intuition has been simulated by several research projects using artificial intelligence techniques. Most of these algorithms or models lack the ability to handle complications or diversions. Moreover, they do not explain the factors influencing intuition or the accuracy of the results from this process. In this paper, we present a simple series-based model for implementing human-like intuition using the principles of connectivity and unknown entities. Using Poker hand datasets and Car evaluation datasets, we compare the performance of some well-known models with our intuition model. The aim of the experiment was to maximize the number of accurate answers predicted by intuition-based models. We found that the presence of unknown entities, diversion from the current problem scenario, and identifying weakness without the normal logic-based execution greatly affect the reliability of the answers. Generally, intuition-based models cannot substitute for logic-based mechanisms in handling such problems. Intuition can only act as a support for an ongoing logic-based model that processes all the steps sequentially. However, when time and computational cost are very strict constraints, this intuition-based model becomes extremely important and useful, because it can give reasonably good performance. Factors affecting intuition are analyzed and interpreted through our model.
Proceedings of the 2011 New York Workshop on Computer, Earth and Space Science
Way, Michael J., Naud, Catherine
The purpose of the New York Workshop on Computer, Earth and Space Sciences is to bring together the New York area's finest Astronomers, Statisticians, Computer Scientists, Space and Earth Scientists to explore potential synergies between their respective fields. The 2011 edition (CESS2011) was a great success, and we would like to thank all of the presenters and participants for attending. This year was also special as it included authors from the upcoming book titled "Advances in Machine Learning and Data Mining for Astronomy". Over two days, the latest advanced techniques used to analyze the vast amounts of information now available for the understanding of our universe and our planet were presented. These proceedings attempt to provide a small window into what the current state of research is in this vast interdisciplinary field and we'd like to thank the speakers who spent the time to contribute to this volume.
On Polynomial Sized MDP Succinct Policies
Policies of Markov Decision Processes (MDPs) determine the next action to execute from the current state and, possibly, the history (the past states). When the number of states is large, succinct representations are often used to compactly represent both the MDPs and the policies in a reduced amount of space. In this paper, some problems related to the size of succinctly represented policies are analyzed. Namely, it is shown that some MDPs have policies that can only be represented in space super-polynomial in the size of the MDP, unless the polynomial hierarchy collapses. This fact motivates the study of the problem of deciding whether a given MDP has a policy of a given size and reward. Since some algorithms for MDPs work by finding a succinct representation of the value function, the problem of deciding the existence of a succinct representation of a value function of a given size and reward is also considered.
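For contrast with the succinct representations discussed above, the sketch below computes an explicit, tabular policy for a tiny MDP by standard value iteration. It is only an illustration and is not taken from the paper. The function name `value_iteration`, the layout `P[s][a]` and `R[s][a]`, and the two-state example are assumptions.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Standard value iteration on a small, explicitly enumerated MDP.

    Assumed (hypothetical) layout:
      P[s][a] = list of (probability, next_state) pairs
      R[s][a] = immediate reward for taking action a in state s
    Returns the value table and a tabular policy (one action per state).
    """
    n = len(P)
    V = [0.0] * n

    def q_value(s, a):
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        delta = 0.0
        for s in range(n):
            best = max(q_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break

    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a)) for s in range(n)]
    return V, policy


# Hypothetical two-state MDP: action 0 stays put, action 1 moves to the other state.
# Only staying in state 1 pays a reward.
P = [[[(1.0, 0)], [(1.0, 1)]],
     [[(1.0, 1)], [(1.0, 0)]]]
R = [[0.0, 0.0],
     [1.0, 0.0]]
V, policy = value_iteration(P, R)
```

The returned `policy` and `V` each store one entry per state, so their size grows with the number of states. That growth is what makes succinct representations of policies and value functions attractive when the state space is large.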
The Communicative Multiagent Team Decision Problem: Analyzing Teamwork Theories and Models
Despite the significant progress in multiagent teamwork, existing research does not address the optimality of its prescriptions nor the complexity of the teamwork problem. Without a characterization of the optimality-complexity tradeoffs, it is impossible to determine whether the assumptions and approximations made by a particular theory gain enough efficiency to justify the losses in overall performance. To provide a tool for use by multiagent researchers in evaluating this tradeoff, we present a unified framework, the COMmunicative Multiagent Team Decision Problem (COM-MTDP). The COM-MTDP model combines and extends existing multiagent theories, such as decentralized partially observable Markov decision processes and economic team theory. In addition to their generality of representation, COM-MTDPs also support the analysis of both the optimality of team performance and the computational complexity of the agents' decision problem. In analyzing complexity, we present a breakdown of the computational complexity of constructing optimal teams under various classes of problem domains, along the dimensions of observability and communication cost. In analyzing optimality, we exploit the COM-MTDP's ability to encode existing teamwork theories and models to encode two instantiations of joint intentions theory taken from the literature. Furthermore, the COM-MTDP model provides a basis for the development of novel team coordination algorithms. We derive a domain-independent criterion for optimal communication and provide a comparative analysis of the two joint intentions instantiations with respect to this optimal policy. We have implemented a reusable, domain-independent software package based on COM-MTDPs to analyze teamwork coordination strategies, and we demonstrate its use by encoding and evaluating the two joint intentions strategies within an example domain.
Parameter Learning of Logic Programs for Symbolic-Statistical Modeling
We propose a logical/mathematical framework for statistical parameter learning of parameterized logic programs, i.e. definite clause programs containing probabilistic facts with a parameterized distribution. It extends the traditional least Herbrand model semantics in logic programming to distribution semantics, possible world semantics with a probability distribution which is unconditionally applicable to arbitrary logic programs including ones for HMMs, PCFGs and Bayesian networks. We also propose a new EM algorithm, the graphical EM algorithm, that runs for a class of parameterized logic programs representing sequential decision processes where each decision is exclusive and independent. It runs on a new data structure called support graphs describing the logical relationship between observations and their explanations, and learns parameters by computing inside and outside probability generalized for logic programs. The complexity analysis shows that when combined with OLDT search for all explanations for observations, the graphical EM algorithm, despite its generality, has the same time complexity as existing EM algorithms, i.e. the Baum-Welch algorithm for HMMs, the Inside-Outside algorithm for PCFGs, and the one for singly connected Bayesian networks that have been developed independently in each research field. Learning experiments with PCFGs using two corpora of moderate size indicate that the graphical EM algorithm can significantly outperform the Inside-Outside algorithm.
A Probabilistic Perspective on Gaussian Filtering and Smoothing
Deisenroth, Marc Peter, Ohlsson, Henrik
We present a general probabilistic perspective on Gaussian filtering and smoothing. This allows us to show that common approaches to Gaussian filtering/smoothing can be distinguished solely by their methods of computing/approximating the means and covariances of joint probabilities. This implies that novel filters and smoothers can be derived straightforwardly by providing methods for computing these moments. Based on this insight, we derive the cubature Kalman smoother and propose a novel robust filtering and smoothing algorithm based on Gibbs sampling. Inference in latent variable models is about extracting information about a quantity that is not directly observable, the latent variable, from noisy observations. Both recursive and batch methods are of interest and are referred to as filtering and smoothing, respectively. Filtering and smoothing in latent variable time series models, including hidden Markov models and dynamic systems, have played an important role in signal processing, control, and machine learning for decades [12, 15, 3].
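The role of the joint moments can be seen in the simplest case, a scalar linear-Gaussian model, where they are available in closed form and the filter reduces to the classical Kalman filter. The sketch below is an illustration only, not the paper's cubature or Gibbs-based method. The function name `kalman_step` and the model parameters `a`, `q`, `c`, `r` are assumptions.

```python
def kalman_step(mean, var, y, a=1.0, q=0.1, c=1.0, r=0.5):
    """One predict/update step of a scalar Kalman filter.

    Assumed (hypothetical) model:
      x_t = a * x_{t-1} + w,  w ~ N(0, q)
      y_t = c * x_t     + v,  v ~ N(0, r)
    """
    # Time update: moments of the predicted state.
    m_pred = a * mean
    p_pred = a * a * var + q

    # Moments of the joint distribution of (x_t, y_t).
    y_mean = c * m_pred          # predicted measurement mean
    y_var = c * c * p_pred + r   # predicted measurement variance
    cross = p_pred * c           # cross-covariance cov(x_t, y_t)

    # Measurement update: Gaussian conditioning on the observed y.
    gain = cross / y_var
    m_new = m_pred + gain * (y - y_mean)
    p_new = p_pred - gain * cross
    return m_new, p_new


m, p = kalman_step(mean=0.0, var=1.0, y=1.0)
```

Only the three lines that compute `y_mean`, `y_var`, and `cross` depend on the model. For a nonlinear model these moments would have to be approximated, and the choice of approximation is what the abstract says distinguishes one Gaussian filter from another.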
Reconstruction of Epsilon-Machines in Predictive Frameworks and Decisional States
This article introduces both a new algorithm for reconstructing epsilon-machines from data and the concept of decisional states. These are defined as the internal states of a system that lead to the same decision, based on a user-provided utility or pay-off function. The utility function encodes some a priori knowledge external to the system: it quantifies how bad it is to make mistakes. The intrinsic underlying structure of the system is modeled by an epsilon-machine and its causal states. The decisional states form a partition of the lower-level causal states that is defined according to the higher-level user's knowledge. From a complex systems perspective, the decisional states are thus the "emerging" patterns corresponding to the utility function. The transitions between these decisional states correspond to events that lead to a change of decision. The new REMAPF algorithm estimates both the epsilon-machine and the decisional states from data. Application examples are given for hidden model reconstruction, cellular automata filtering, and edge detection in images.
Accelerating Reinforcement Learning through Implicit Imitation
Imitation can be viewed as a means of enhancing learning in multiagent environments. It augments an agent's ability to learn useful behaviors by making intelligent use of the knowledge implicit in behaviors demonstrated by cooperative teachers or other more experienced agents. We propose and study a formal model of implicit imitation that can accelerate reinforcement learning dramatically in certain cases. Roughly, by observing a mentor, a reinforcement-learning agent can extract information about its own capabilities in, and the relative value of, unvisited parts of the state space. We study two specific instantiations of this model, one in which the learning agent and the mentor have identical abilities, and one designed to deal with agents and mentors with different action sets. We illustrate the benefits of implicit imitation by integrating it with prioritized sweeping, and demonstrating improved performance and convergence through observation of single and multiple mentors. Though we make some stringent assumptions regarding observability and possible interactions, we briefly comment on extensions of the model that relax these restrictions.