Government
Efficient Solution Algorithms for Factored MDPs
Guestrin, C., Koller, D., Parr, R., Venkataraman, S.
This paper addresses the problem of planning under uncertainty in large Markov Decision Processes (MDPs). Factored MDPs represent a complex state space using state variables and the transition model using a dynamic Bayesian network. This representation often allows an exponential reduction in the representation size of structured MDPs, but the complexity of exact solution algorithms for such MDPs can grow exponentially in the representation size. In this paper, we present two approximate solution algorithms that exploit structure in factored MDPs. Both use an approximate value function represented as a linear combination of basis functions, where each basis function involves only a small subset of the domain variables. A key contribution of this paper is that it shows how the basic operations of both algorithms can be performed efficiently in closed form, by exploiting both additive and context-specific structure in a factored MDP. A central element of our algorithms is a novel linear program decomposition technique, analogous to variable elimination in Bayesian networks, which reduces an exponentially large LP to a provably equivalent, polynomial-sized one. One algorithm uses approximate linear programming, and the second approximate dynamic programming. Our dynamic programming algorithm is novel in that it uses an approximation based on max-norm, a technique that more directly minimizes the terms that appear in error bounds for approximate MDP algorithms. We provide experimental results on problems with over 10^40 states, demonstrating a promising indication of the scalability of our approach, and compare our algorithm to an existing state-of-the-art approach, showing, in some problems, exponential gains in computation time.
Inferring Strategies for Sentence Ordering in Multidocument News Summarization
The problem of organizing information for multidocument summarization so that the generated summary is coherent has received relatively little attention. While sentence ordering for single document summarization can be determined from the ordering of sentences in the input article, this is not the case for multidocument summarization where summary sentences may be drawn from different input articles. In this paper, we propose a methodology for studying the properties of ordering information in the news genre and describe experiments done on a corpus of multiple acceptable orderings we developed for the task. Based on these experiments, we implemented a strategy for ordering information that combines constraints from chronological order of events and topical relatedness. Evaluation of our augmented algorithm shows a significant improvement of the ordering over two baseline strategies.
SMOTE: Synthetic Minority Over-sampling Technique
Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P.
An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominately composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
Moment based estimation of stochastic Kronecker graph parameters
Gleich, David F., Owen, Art B.
Stochastic Kronecker graphs supply a parsimonious model for large sparse real world graphs. They can specify the distribution of a large random graph using only three or four parameters. Those parameters have however proved difficult to choose in specific applications. This article looks at method of moments estimators that are computationally much simpler than maximum likelihood. The estimators are fast and in our examples, they typically yield Kronecker parameters with expected feature counts closer to a given graph than we get from KronFit. The improvement was especially prominent for the number of triangles in the graph.
ATTac-2000: An Adaptive Autonomous Bidding Agent
Kearns, M., Littman, M. L., Singh, S., Stone, P.
TAC was designed to create a benchmark problem in the complex domain of e-marketplaces and to motivate researchers to apply unique approaches to a common task. Their goals included providing a benchmark problem in the complex and rapidly advancing domain of e-marketplaces (Eisenberg, 2000) and motivating researchers to apply unique approaches to a common task. Another key feature of TAC was that participating agents competed against each other in a preliminary round and many practice games leading up to the nals. Thus, developers changed strategies in response to each others' agents in a sort of escalating arms race. Leading into the competition day, a wide variety of scenarios were possible. A successful agent needed to be able to perform well in any of these possible circumstances.
Asimovian Adaptive Agents
The goal of this research is to develop agents that are adaptive and predictable and timely. At first blush, these three requirements seem contradictory. For example, adaptation risks introducing undesirable side effects, thereby making agents' behavior less predictable. Furthermore, although formal verification can assist in ensuring behavioral predictability, it is known to be time-consuming. Our solution to the challenge of satisfying all three requirements is the following. Agents have finite-state automaton plans, which are adapted online via evolutionary learning (perturbation) operators. To ensure that critical behavioral constraints are always satisfied, agents' plans are first formally verified. They are then reverified after every adaptation. If reverification concludes that constraints are violated, the plans are repaired. The main objective of this paper is to improve the efficiency of reverification after learning, so that agents have a sufficiently rapid response time. We present two solutions: positive results that certain learning operators are a priori guaranteed to preserve useful classes of behavioral assurance constraints (which implies that no reverification is needed for these operators), and efficient incremental reverification algorithms for those learning operators that have negative a priori results.
OBDD-based Universal Planning for Synchronized Agents in Non-Deterministic Domains
Recently model checking representation and search techniques were shown to be efficiently applicable to planning, in particular to non-deterministic planning. Such planning approaches use Ordered Binary Decision Diagrams (OBDDs) to encode a planning domain as a non-deterministic finite automaton and then apply fast algorithms from model checking to search for a solution. OBDDs can effectively scale and can provide universal plans for complex planning domains. We are particularly interested in addressing the complexities arising in non-deterministic, multi-agent domains. In this article, we present UMOP, a new universal OBDD-based planning framework for non-deterministic, multi-agent domains. We introduce a new planning domain description language, NADL, to specify non-deterministic, multi-agent domains. The language contributes the explicit definition of controllable agents and uncontrollable environment agents. We describe the syntax and semantics of NADL and show how to build an efficient OBDD-based representation of an NADL description. The UMOP planning system uses NADL and different OBDD-based universal planning algorithms. It includes the previously developed strong and strong cyclic planning algorithms. In addition, we introduce our new optimistic planning algorithm that relaxes optimality guarantees and generates plausible universal plans in some domains where no strong nor strong cyclic solution exists. We present empirical results applying UMOP to domains ranging from deterministic and single-agent with no environment actions to non-deterministic and multi-agent with complex environment actions. UMOP is shown to be a rich and efficient planning system.
Evolutionary Algorithms for Reinforcement Learning
Grefenstette, J. J., Moriarty, D. E., Schultz, A. C.
There are two distinct approaches to solving reinforcement learning problems, namely, searching in value function space and searching in policy space. Temporal difference methods and evolutionary algorithms are well-known examples of these approaches. Kaelbling, Littman and Moore recently provided an informative survey of temporal difference methods. This article focuses on the application of evolutionary algorithms to the reinforcement learning problem, emphasizing alternative policy representations, credit assignment methods, and problem-specific genetic operators. Strengths and weaknesses of the evolutionary approach to reinforcement learning are presented, along with a survey of representative applications.
Identifying Mislabeled Training Data
The goal of this approach is to improve classication accuracies produced by learning algorithms by improving the quality of the training data. Our approach uses a set of learning algorithms to create classiers that serve as noise lters for the training data. We evaluate single algorithm, majority vote and consensus lters on ve datasets that are prone to labeling errors. Our experiments illustrate that ltering signicantly improves classication accuracy for noise levels up to 30%. An analytical and empirical evaluation of the precision of our approach shows that consensus lters are conservative at throwing away good data at the expense of retaining bad data and that majority lters are better at detecting bad data at the expense of throwing away good data. This suggests that for situations in which there is a paucity of data, consensus lters are preferable, whereas majority vote lters are preferable for situations with an abundance of data. 1. Introducti The maximum accuracy achievable depends on the quality of the data and on the appropriateness of the chosen learning algorithm for the data. The work described here focuses on improving the quality of training data by identifying and eliminating mislabeled instances prior to applying the chosen learning algorithm, thereby increasing classication accuracy. Labeling error can occur for several reasons including subjectivity, data-entry error, or inadequacy of the information used to label each object. Subjectivity may arise when observations need to be ranked in some way such as disease severity or when the information used to label an object is dierent from the information to which the learning algorithm will have access. For example, when labeling pixels in image data, the analyst typically uses visual input rather than the numeric values of the feature vector corresponding to the observation. Domains in which experts disagree are natural places for subjective labeling errors (Smyth, 1996). A third cause of labeling error arises when the information used to label each observation is inadequate. For example, in the medical domain it may not be possible to perform the tests necessary to guarantee that a diagnosis is 100% accurate. For domains in which labeling errors occur, an automated method of eliminating or correcting mislabeled observations will improve the predictive accuracy of the classier formed from the training data. In this article we address the problem of identifying training instances that are mislabeled.
Complexity of and Algorithms for Borda Manipulation
Davies, Jessica, Katsirelos, George, Narodytska, Nina, Walsh, Toby
We prove that it is NP-hard for a coalition of two manipulators to compute how to manipulate the Borda voting rule. This resolves one of the last open problems in the computational complexity of manipulating common voting rules. Because of this NP-hardness, we treat computing a manipulation as an approximation problem where we try to minimize the number of manipulators. Based on ideas from bin packing and multiprocessor scheduling, we propose two new approximation methods to compute manipulations of the Borda rule. Experiments show that these methods significantly outperform the previous best known %existing approximation method. We are able to find optimal manipulations in almost all the randomly generated elections tested. Our results suggest that, whilst computing a manipulation of the Borda rule by a coalition is NP-hard, computational complexity may provide only a weak barrier against manipulation in practice.