Europe
Q-Error as a Selection Mechanism in Modular Reinforcement-Learning Systems
Ring, Mark B. (IDSIA) | Schaul, Tom (IDSIA)
This paper introduces a novel multi-modular method for reinforcement learning.A multi-modular system is one that partitions the learning task among a set of experts (modules), where each expert is incapable of solving the entire task by itself.There are many advantages to splitting up large tasks in this way, but existing methods face difficulties when choosing which module(s) should contribute to the agent's actions at any particular moment.We introduce a novel selection mechanism where every module, besides calculating a set of action values, also estimates its own error for the current input.The selection mechanism combines each module's estimate of long-term reward and self-error to produce a score by which the next module is chosen.As a result, the modules can use their resources effectively and efficiently divide up the task.The system is shown to learn complex tasks even when the individual modules use only linear function approximators.
Strategy Learning for Autonomous Agents in Smart Grid Markets
Reddy, Prashant P. (Carnegie Mellon University) | Veloso, Manuela M. (Carnegie Mellon University)
Distributed electricity producers, such as small wind farms and solar installations, pose several technical and economic challenges in Smart Grid design. One approach to addressing these challenges is through Broker Agents who buy electricity from distributed producers, and also sell electricity to consumers, via a Tariff Market--a new market mechanism where Broker Agents publish concurrent bid and ask prices. We investigate the learning of pricing strategies for an autonomous Broker Agent to profitably participate in a Tariff Market. We employ Markov Decision Processes (MDPs) and reinforcement learning. An important concern with this method is that even simple representations of the problem domain result in very large numbers of states in the MDP formulation because market prices can take nearly arbitrary real values. In this paper, we present the use of derived state space features, computed using statistics on Tariff Market prices and Broker Agent customer portfolios, to obtain a scalable state representation. We also contribute a set of pricing tactics that form building blocks in the learned Broker Agent strategy. We further present a Tariff Market simulation model based on real-world data and anticipated market dynamics. We use this model to obtain experimental results that show the learned strategy performing vastly better than a random strategy and significantly better than two other non-learning strategies.
Biclustering-Driven Ensemble of Bayesian Belief Network Classifiers for Underdetermined Problems
Pansombut, Tatdow (North Carolina State University, Oak Ridge National Laboratory) | Hendrix, William (North Carolina State University, Oak Ridge National Laboratory) | Gao, Zekai J. (Zhejiang University) | Harrison, Brent E. (North Carolina State University, Oak Ridge National Laboratory) | Samatova, Nagiza F. (North Carolina State University, Oak Ridge National Laboratory)
In this paper, we present BENCH (BiclusteringdrivenENsemble of Classifiers), an algorithm toconstruct an ensemble of classifiers through concurrentfeature and data point selection guided byunsupervised knowledge obtained from biclustering.BENCH is designed for underdeterminedproblems. In our experiments, we use Bayesian BeliefNetwork (BBN) classifiers as base classifiers inthe ensemble; however, BENCH can be applied toother classification models as well. We show thatBENCH is able to increase prediction accuracy ofa single classifier and traditional ensemble of classifiersby up to 15% on three microarray datasetsusing various weighting schemes for combining individualpredictions in the ensemble.
Robust Principal Component Analysis with Non-Greedy ℓ1-Norm Maximization
Nie, Feiping (University of Texas at Arlington) | Huang, Heng (University of Texas at Arlington) | Ding, Chris (University of Texas at Arlington) | Luo, Dijun (University of Texas at Arlington) | Wang, Hua (University of Texas at Arlington)
Principal Component Analysis (PCA) is one of the most important methods to handle high-dimensional data. However, the high computa-tional complexity makes it hard to apply to the large scale data with high dimensionality, and the used ℓ2-norm makes it sensitive to outliers. A recent work proposed principal component analysis based on ℓ1-norm maximization, which is efficient and robust to outliers. In that work, a greedy strategy was applied due to the difficulty of directly solving the ℓ1-norm maximization problem, which is easy to get stuck in local solution. In this paper, we first propose an efficient optimization algorithm to solve a general ℓ1-norm maximization problem, and then propose a robust principal component analysis with non-greedy ℓ1-norm maximization. Experimental results on real world datasets show that the non-greedy method always obtains much better solution than that of the greedy method.
Distribution-Aware Online Classifiers
Nguyen, Tam T. (Nanyang Technological University) | Chang, Kuiyu (Nanyang Technological University) | Hui, Cheung Siu (Nanyang Technological University)
We propose a family of Passive-Aggressive Mahalanobis (PAM) algorithms, which are incremental (online) binary classifiers that consider the distribution of data. PAM is in fact a generalization of the Passive-Aggressive (PA) algorithms to handle data distributions that can be represented by a covariance matrix. The update equations for PAM are derived and theoretical error loss bounds computed. We benchmarked PAM against the original PA-I, PA-II, and Confidence Weighted (CW) learning. Although PAM somewhat resembles CW in its update equations, PA minimizes differences in the weights while CW minimizes differences in weight distributions. Results on 8 classification datasets, which include a real-life micro-blog sentiment classification task, show that PAM consistently outperformed its competitors, most notably CW. This shows that a simple approach like PAM is more practical in real-life classification tasks, compared to more elegant and sophisticated approaches like CW.
Imitation Learning in Relational Domains: A Functional-Gradient Boosting Approach
Natarajan, Sriraam (Wake Forest University School of Medicine) | Joshi, Saket (Oregon State University) | Tadepalli, Prasad (Oregon State University) | Kersting, Kristian (Fraunhofer IAIS) | Shavlik, Jude (University of Wisconsin-Madiso)
Imitation learning refers to the problem of learning how to behave by observinga teacher in action. We consider imitation learning in relational domains, in which there is a varying number of objects and relations among them. In prior work, simple relational policies are learned by viewing imitation learning as supervised learning of a function from states to actions. For propositional worlds, functional gradient methods have been proved to be beneficial. They are simpler to implement than most existing methods, more efficient, more naturally satisfy common constraints on the cost function, and better represent our prior beliefs about the form of the function. Building on recent generalizations of functional gradient boosting to relational representations, we implement a functional gradient boosting approach to imitation learning in relational domains. In particular, given a set of traces from the human teacher, our system learns a policy in the form of a set of relational regression trees that additively approximate the functional gradients. The use of multiple additive trees combined with relational representation allows for learning more expressive policies than what has been done before. We demonstrate the usefulness of our approach in several different domains.
Agent-Oriented Incremental Team and Activity Recognition
Masato, Daniele (University of Aberdeen) | Norman, Timothy J. (University of Aberdeen) | Vasconcelos, Wamberto W. (University of Aberdeen) | Sycara, Katia (Carnegie Mellon University)
Monitoring team activity is beneficial when human teams cooperate in the enactment of a joint plan. Monitoring allows teams to maintain awareness of each other's progress within the plan and it enables anticipation of information needs. Humans find this difficult, particularly in time-stressed and uncertain environments. In this paper we introduce a probabilistic model, based on Conditional Random Fields, to automatically recognise the composition of teams and the team activities in relation to a plan. The team composition and activities are recognised incrementally by interpreting a stream of spatio-temporal observations.
Probit Classifiers with a Generalized Gaussian Scale Mixture Prior
Liu, Guoqing (Nanyang Technological University) | Wu, Jianxin (Nanyang Technological University) | Zhou, Suiping (Teesside University)
Most of the existing probit classifiers are based on sparsity-oriented modeling. However, we show that sparsity is not always desirable in practice, and only an appropriate degree of sparsity is profitable. In this work, we propose a flexible probabilistic model using a generalized Gaussian scale mixture prior that can promote an appropriate degree of sparsity for its model parameters, and yield either sparse or non-sparse estimates according to the intrinsic sparsity of features in a dataset. Model learning is carried out by an efficient modified maximum a posteriori (MAP) estimate. We also show relationships of the proposed model to existing probit classifiers as well as iteratively re-weighted l1 and l2 minimizations. Experiments demonstrate that the proposed method has better or comparable performances in feature selection for linear classifiers as well as in kernel-based classification.
Modular Community Detection in Networks
Li, Wenye (Macao Polytechnic Institute) | Schuurmans, Dale (University of Alberta)
Network community detection — the problem of dividing a network of interest into clusters for intelligent analysis — has recently attracted significant attention in diverse fields of research. To discover intrinsic community structure a quantitative measure called modularity has been widely adopted as an optimization objective. Unfortunately, modularity is inherently NP-hard to optimize and approximate solutions must be sought if tractability is to be ensured. In practice, a spectral relaxation method is most often adopted, after which a community partition is recovered from relaxed fractional values by a rounding process. In this paper, we propose an iterative rounding strategy for identifying the partition decisions that is coupled with a fast constrained power method that sequentially achieves tighter spectral relaxations. Extensive evaluation with this coupled relaxation-rounding method demonstrates consistent and sometimes dramatic improvements in the modularity of the communities discovered.
Learning Hash Functions for Cross-View Similarity Search
Kumar, Shaishav (Microsoft Research India) | Udupa, Raghavendra (Microsoft Research India)
Many applications in Multilingual and Multimodal Information Access involve searching large databases of high dimensional data objects with multiple (conditionally independent) views. In this work we consider the problem of learning hash functions for similarity search across the views for such applications. We propose a principled method for learning a hash function for each view given a set of multiview training data objects. The hash functions map similar objects to similar codes across the views thus enabling cross-view similarity search. We present results from an extensive empirical study of the proposed approach which demonstrate its effectiveness on Japanese language People Search and Multilingual People Search problems.