Genre
ABC Reinforcement Learning
Dimitrakakis, Christos, Tziortziotis, Nikolaos
This paper introduces a simple, general framework for likelihood-free Bayesian reinforcement learning, through Approximate Bayesian Computation (ABC). The main advantage is that we only require a prior distribution on a class of simulators (generative models). This is useful in domains where an analytical probabilistic model of the underlying process is too complex to formulate, but where detailed simulation models are available. ABC-RL allows the use of any Bayesian reinforcement learning technique, even in this case. In addition, it can be seen as an extension of rollout algorithms to the case where we do not know what the correct model to draw rollouts from is. We experimentally demonstrate the potential of this approach in a comparison with LSPI. Finally, we introduce a theorem showing that ABC is a sound methodology in principle, even when non-sufficient statistics are used.
Investigation of "Enhancing flexibility and robustness in multi-agent task scheduling"
Wilson et al. propose a measure of flexibility in project scheduling problems and propose several ways of distributing flexibility over tasks without overrunning the deadline. These schedules prove quite robust: delays of some tasks do not necessarily lead to delays of subsequent tasks. The number of tasks that finish late depends, among others, on the way of distributing flexibility. In this paper I study the different flexibility distributions proposed by Wilson et al. and the differences in number of violations (tasks that finish too late). I show one factor in the instances that causes differences in the number of violations, as well as two properties of the flexibility distribution that cause them to behave differently. Based on these findings, I propose three new flexibility distributions. Depending on the nature of the delays, these new flexibility distributions perform as good as or better than the distributions by Wilson et al.
Axiomatic properties of inconsistency indices for pairwise comparisons
Brunelli, Matteo, Fedrizzi, Michele
Pairwise comparisons are a well-known method for the representation of the subjective preferences of a decision maker. Evaluating their inconsistency has been a widely studied and discussed topic and several indices have been proposed in the literature to perform this task. Since an acceptable level of consistency is closely related with the reliability of preferences, a suitable choice of an inconsistency index is a crucial phase in decision making processes. The use of different methods for measuring consistency must be carefully evaluated, as it can affect the decision outcome in practical applications. In this paper, we present five axioms aimed at characterizing inconsistency indices. In addition, we prove that some of the indices proposed in the literature satisfy these axioms, while others do not, and therefore, in our view, they may fail to correctly evaluate inconsistency.
Traffic data reconstruction based on Markov random field modeling
Kataoka, Shun, Yasuda, Muneki, Furtlehner, Cyril, Tanaka, Kazuyuki
We consider the traffic data reconstruction problem. Suppose we have the traffic data of an entire city that are incomplete because some road data are unobserved. The problem is to reconstruct the unobserved parts of the data. In this paper, we propose a new method to reconstruct incomplete traffic data collected from various traffic sensors. Our approach is based on Markov random field modeling of road traffic. The reconstruction is achieved by using mean-field method and a machine learning method. We numerically verify the performance of our method using realistic simulated traffic data for the real road network of Sendai, Japan.
Optimal Feature Selection in High-Dimensional Discriminant Analysis
We consider the high-dimensional discriminant analysis problem. For this problem, different methods have been proposed and justified by establishing exact convergence rates for the classification risk, as well as the l2 convergence results to the discriminative rule. However, sharp theoretical analysis for the variable selection performance of these procedures have not been established, even though model interpretation is of fundamental importance in scientific data analysis. This paper bridges the gap by providing sharp sufficient conditions for consistent variable selection using the sparse discriminant analysis (Mai et al., 2012). Through careful analysis, we establish rates of convergence that are significantly faster than the best known results and admit an optimal scaling of the sample size n, dimensionality p, and sparsity level s in the high-dimensional setting. Sufficient conditions are complemented by the necessary information theoretic limits on the variable selection problem in the context of high-dimensional discriminant analysis. Exploiting a numerical equivalence result, our method also establish the optimal results for the ROAD estimator (Fan et al., 2012) and the sparse optimal scaling estimator (Clemmensen et al., 2011). Furthermore, we analyze an exhaustive search procedure, whose performance serves as a benchmark, and show that it is variable selection consistent under weaker conditions. Extensive simulations demonstrating the sharpness of the bounds are also provided.
Measurements of collective machine intelligence
Independent from the still ongoing research in measuring individual intelligence, we anticipate and provide a framework for measuring collective intelligence. Collective intelligence refers to the idea that several individuals can collaborate in order to achieve high levels of intelligence. We present thus some ideas on how the intelligence of a group can be measured and simulate such tests. We will however focus here on groups of artificial intelligence agents (i.e., machines). We will explore how a group of agents is able to choose the appropriate problem and to specialize for a variety of tasks. This is a feature which is an important contributor to the increase of intelligence in a group (apart from the addition of more agents and the improvement due to common decision making). Our results reveal some interesting results about how (collective) intelligence can be modeled, about how collective intelligence tests can be designed and about the underlying dynamics of collective intelligence. As it will be useful for our simulations, we provide also some improvements of the threshold allocation model originally used in the area of swarm intelligence but further generalized here.
A Fuzzy Topsis Multiple-Attribute Decision Making for Scholarship Selection
As the education fees are becoming more expe nsive, more students apply for scholarships. Consequently, hundreds and even thousands of applicati ons need to be handled by the sponsor. To solve the problems, some alternatives based on several attri butes (criteria) need to be selected. In order to make a decision on such fuzzy problems, Fuzzy Multiple Attribute Decision Making (FMDAM) can be applied. In this study, Unified Modeling Language (UML) in FMADM with TOPSIS and Weighted Product (WP) methods is applied to select the candidates for ac ademic and non-academic scholarships at Universitas Islam Negeri Sunan Kalijaga. Data used were a crisp and fuzzy data. The result s show that TOPSIS and Weighted Product FMADM methods can be used to se lect the most suitable candidates to receive the scholarships since the preference values applied in this method can show applicants with the highest eligibility.
Solving Relational MDPs with Exogenous Events and Additive Rewards
Joshi, S., Khardon, R., Tadepalli, P., Raghavan, A., Fern, A.
We formalize a simple but natural subclass of service domains for relational planning problems with object-centered, independent exogenous events and additive rewards capturing, for example, problems in inventory control. Focusing on this subclass, we present a new symbolic planning algorithm which is the first algorithm that has explicit performance guarantees for relational MDPs with exogenous events. In particular, under some technical conditions, our planning algorithm provides a monotonic lower bound on the optimal value function. To support this algorithm we present novel evaluation and reduction techniques for generalized first order decision diagrams, a knowledge representation for real-valued functions over relational world states. Our planning algorithm uses a set of focus states, which serves as a training set, to simplify and approximate the symbolic solution, and can thus be seen to perform learning for planning. A preliminary experimental evaluation demonstrates the validity of our approach.
Scaling Up Robust MDPs by Reinforcement Learning
Tamar, Aviv, Xu, Huan, Mannor, Shie
We consider large-scale Markov decision processes (MDPs) with parameter uncertainty, under the robust MDP paradigm. Previous studies showed that robust MDPs, based on a minimax approach to handle uncertainty, can be solved using dynamic programming for small to medium sized problems. However, due to the "curse of dimensionality", MDPs that model real-life problems are typically prohibitively large for such approaches. In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. To the best of our knowledge, this is the first attempt to scale up the robust MDPs paradigm.
A Data Mining Approach to Solve the Goal Scoring Problem
Oliveira, Renato, Adeodato, Paulo, Carvalho, Arthur, Viegas, Icamaan, Diego, Christian, Ing-Ren, Tsang
In soccer, scoring goals is a fundamental objective which depends on many conditions and constraints. Considering the RoboCup soccer 2D-simulator, this paper presents a data mining-based decision system to identify the best time and direction to kick the ball towards the goal to maximize the overall chances of scoring during a simulated soccer match. Following the CRISP-DM methodology, data for modeling were extracted from matches of major international tournaments (10691 kicks), knowledge about soccer was embedded via transformation of variables and a Multilayer Perceptron was used to estimate the scoring chance. Experimental performance assessment to compare this approach against previous LDA-based approach was conducted from 100 matches. Several statistical metrics were used to analyze the performance of the system and the results showed an increase of 7.7% in the number of kicks, producing an overall increase of 78% in the number of goals scored.