Technology
Rollout Sampling Approximate Policy Iteration
Dimitrakakis, Christos, Lagoudakis, Michail G.
Supervised and reinforcement learning are two well-known learning paradigms, which have been researched mostly independently. Recent studies have investigated the use of supervised learning methods for reinforcement learning, either for value function Lagoudakis and Parr(2003a); Riedmiller(2005) or policy representation Lagoudakis and Parr(2003b); Fern et al.(2004); Langford and Zadrozny (2005). Initial results have shown that policies can be approximately represented using either multi-class classifiers or combinations of binary classifiers Rexakis and Lagoudakis (2008) and, therefore, it is possible to incorporate classification algorithms within the inner loops of several reinforcement learning algorithms Lagoudakis and Parr (2003b); Fern et al. (2004). This viewpoint allows the quantification of the performance of reinforcement learning algorithms in terms of the performance of classification algorithms Langford and Zadrozny (2005). While a variety of promising combinations become possible through this synergy, heretofore there have been limited practical and widely-applicable algorithms. Our work builds on the work of Lagoudakis and Parr Lagoudakis and Parr (2003b) who suggested an approximate policy iteration algorithm for learning a good policy represented as a classifier, avoiding representations of any kind of value function. At each iteration, a new policy/classifier is produced using 1 training data obtained through extensive simulation (rollouts) of the previous policy on a generative model of the process. These rollouts aim at identifying better action choices over a subset of states in order to form a set of data for training the classifier representing the improved policy. A similar algorithm was proposed by Fern et al.
A New Approach to Automated Epileptic Diagnosis Using EEG and Probabilistic Neural Network
Bao, Forrest Sheng, Lie, Donald Yu-Chun, Zhang, Yuanlin
Epilepsy is one of the most common neurological disorders that greatly impair patient' daily lives. Traditional epileptic diagnosis relies on tedious visual screening by neurologists from lengthy EEG recording that requires the presence of seizure (ictal) activities. Nowadays, there are many systems helping the neurologists to quickly find interesting segments of the lengthy signal by automatic seizure detection. However, we notice that it is very difficult, if not impossible, to obtain long-term EEG data with seizure activities for epilepsy patients in areas lack of medical resources and trained neurologists. Therefore, we propose to study automated epileptic diagnosis using interictal EEG data that is much easier to collect than ictal data. The authors are not aware of any report on automated EEG diagnostic system that can accurately distinguish patients' interictal EEG from the EEG of normal people. The research presented in this paper, therefore, aims to develop an automated diagnostic system that can use interictal EEG data to diagnose whether the person is epileptic. Such a system should also detect seizure activities for further investigation by doctors and potential patient monitoring. To develop such a system, we extract four classes of features from the EEG data and build a Probabilistic Neural Network (PNN) fed with these features. Leave-one-out cross-validation (LOO-CV) on a widely used epileptic-normal data set reflects an impressive 99.5% accuracy of our system on distinguishing normal people's EEG from patient's interictal EEG. We also find our system can be used in patient monitoring (seizure detection) and seizure focus localization, with 96.7% and 77.5% accuracy respectively on the data set.
Belief decision support and reject for textured images characterization
The textured images' classification assumes to consider the images in terms of area with the same texture. In uncertain environment, it could be better to take an imprecise decision or to reject the area corresponding to an unlearning class. Moreover, on the areas that are the classification units, we can have more than one texture. These considerations allows us to develop a belief decision model permitting to reject an area as unlearning and to decide on unions and intersections of learning classes. The proposed approach finds all its justification in an application of seabed characterization from sonar images, which contributes to an illustration.
Modeling belief systems with scale-free networks
Evolution of belief systems has always been in focus of cognitive research. In this paper we delineate a new model describing belief systems as a network of statements considered true. Testing the model a small number of parameters enabled us to reproduce a variety of well-known mechanisms ranging from opinion changes to development of psychological problems. The self-organizing opinion structure showed a scale-free degree distribution. The novelty of our work lies in applying a convenient set of definitions allowing us to depict opinion network dynamics in a highly favorable way, which resulted in a scale-free belief network. As an additional benefit, we listed several conjectural consequences in a number of areas related to thinking and reasoning.
Sparse Online Learning via Truncated Gradient
Langford, John, Li, Lihong, Zhang, Tong
We propose a general method called truncated gradient to induce sparsity in the weights of online learning algorithms with convex loss functions. This method has several essential properties: The degree of sparsity is continuous -- a parameter controls the rate of sparsification from no sparsification to total sparsification. The approach is theoretically motivated, and an instance of it can be regarded as an online counterpart of the popular $L_1$-regularization method in the batch setting. We prove that small rates of sparsification result in only small additional regret with respect to typical online learning guarantees. The approach works well empirically. We apply the approach to several datasets and find that for datasets with large numbers of features, substantial sparsity is discoverable.
Unveiling the mystery of visual information processing in human brain
It is generally accepted that human vision is an extremely powerful information processing system that facilitates our interaction with the surrounding world. However, despite extended and extensive research efforts, which encompass many exploration fields, the underlying fundamentals and operational principles of visual information processing in human brain remain unknown. We still are unable to figure out where and how along the path from eyes to the cortex the sensory input perceived by the retina is converted into a meaningful object representation, which can be consciously manipulated by the brain. Studying the vast literature considering the various aspects of brain information processing, I was surprised to learn that the respected scholarly discussion is totally indifferent to the basic keynote question: "What is information?" in general or "What is visual information?" in particular. In the old days, it was assumed that any scientific research approach has first to define its basic departure points. Why was it overlooked in brain information processing research remains a conundrum. In this paper, I am trying to find a remedy for this bizarre situation. I propose an uncommon definition of "information", which can be derived from Kolmogorov's Complexity Theory and Chaitin's notion of Algorithmic Information. Embracing this new definition leads to an inevitable revision of traditional dogmas that shape the state of the art of brain information processing research. I hope this revision would better serve the challenging goal of human visual information processing modeling.
A General Theory of Additive State Space Abstractions
Yang, F., Culberson, J., Holte, R., Zahavi, U., Felner, A.
Informally, a set of abstractions of a state space S is additive if the distance between any two states in S is always greater than or equal to the sum of the corresponding distances in the abstract spaces. The first known additive abstractions, called disjoint pattern databases, were experimentally demonstrated to produce state of the art performance on certain state spaces. However, previous applications were restricted to state spaces with special properties, which precludes disjoint pattern databases from being defined for several commonly used testbeds, such as Rubik's Cube, TopSpin and the Pancake puzzle. In this paper we give a general definition of additive abstractions that can be applied to any state space and prove that heuristics based on additive abstractions are consistent as well as admissible. We use this new definition to create additive abstractions for these testbeds and show experimentally that well chosen additive abstractions can reduce search time substantially for the (18,4)-TopSpin puzzle and by three orders of magnitude over state of the art methods for the 17-Pancake puzzle. We also derive a way of testing if the heuristic value returned by additive abstractions is provably too low and show that the use of this test can reduce search time for the 15-puzzle and TopSpin by roughly a factor of two.
A Unifying Framework for Structural Properties of CSPs: Definitions, Complexity, Tractability
Bordeaux, L., Cadoli, M., Mancini, T.
Literature on Constraint Satisfaction exhibits the definition of several ``structural'' properties that can be possessed by CSPs, like (in)consistency, substitutability or interchangeability. Current tools for constraint solving typically detect such properties efficiently by means of incomplete yet effective algorithms, and use them to reduce the search space and boost search. In this paper, we provide a unifying framework encompassing most of the properties known so far, both in CSP and other fields' literature, and shed light on the semantical relationships among them. This gives a unified and comprehensive view of the topic, allows new, unknown, properties to emerge, and clarifies the computational complexity of the various detection problems. In particular, among the others, two new concepts, fixability and removability emerge, that come out to be the ideal characterisations of values that may be safely assigned or removed from a variable's domain, while preserving problem satisfiability. These two notions subsume a large number of known properties, including inconsistency, substitutability and others. Because of the computational intractability of all the property-detection problems, by following the CSP approach we then determine a number of relaxations which provide sufficient conditions for their tractability. In particular, we exploit forms of language restrictions and local reasoning.
SATzilla: Portfolio-based Algorithm Selection for SAT
Xu, L., Hutter, F., Hoos, H. H., Leyton-Brown, K.
It has been widely observed that there is no single "dominant" SAT solver; instead, different solvers perform best on different instances. Rather than following the traditional approach of choosing the best solver for a given class of instances, we advocate making this decision online on a per-instance basis. Building on previous work, we describe SATzilla, an automated approach for constructing per-instance algorithm portfolios for SAT that use so-called empirical hardness models to choose among their constituent solvers. This approach takes as input a distribution of problem instances and a set of component solvers, and constructs a portfolio optimizing a given objective function (such as mean runtime, percent of instances solved, or score in a competition). The excellent performance of SATzilla was independently verified in the 2007 SAT Competition, where our SATzilla07 solvers won three gold, one silver and one bronze medal. In this article, we go well beyond SATzilla07 by making the portfolio construction scalable and completely automated, and improving it by integrating local search solvers as candidate solvers, by predicting performance score instead of runtime, and by using hierarchical hardness models that take into account different types of SAT instances. We demonstrate the effectiveness of these new techniques in extensive experimental results on data sets including instances from the most recent SAT competition.
A new Hedging algorithm and its application to inferring latent random variables
We present a new online learning algorithm for cumulative discounted gain. This learning algorithm does not use exponential weights on the experts. Instead, it uses a weighting scheme that depends on the regret of the master algorithm relative to the experts. In particular, experts whose discounted cumulative gain is smaller (worse) than that of the master algorithm receive zero weight. We also sketch how a regret-based algorithm can be used as an alternative to Bayesian averaging in the context of inferring latent random variables.