Model-based Utility Functions
Orseau and Ring, as well as Dewey, have recently described problems, including self-delusion, with the behavior of agents using various definitions of utility functions, where an agent's utility function is defined in terms of the agent's history of interactions with its environment. This paper argues, via two examples, that these behavior problems can be avoided by formulating the utility function in two steps: 1) inferring a model of the environment from interactions, and 2) computing utility as a function of the environment model. Basing a utility function on a model that the agent must learn implies that the utility function must initially be expressed in terms of specifications to be matched to structures in the learned model. These specifications constitute prior assumptions about the environment, so this approach will not work with arbitrary environments. But the approach should work for agents designed by humans to act in the physical world. The paper also addresses the issue of self-modifying agents and shows that, under some usual assumptions, agents given the possibility of modifying their utility functions will not choose to do so.
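As a toy illustration of the two-step formulation described above (all names and the trivial "environment model" here are hypothetical, not the paper's construction), one can separate model inference from utility evaluation:

```python
# Hypothetical sketch: (1) infer an environment model from the interaction
# history, (2) evaluate utility on the model rather than on raw observations.
from collections import Counter

def infer_model(history):
    """Step 1: a toy 'model' = empirical frequencies of observed states."""
    counts = Counter(obs for (action, obs) in history)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}

def utility(model, spec):
    """Step 2: utility as a function of the learned model. The prior
    'specification' is a target state whose estimated prevalence we value."""
    return model.get(spec, 0.0)

history = [("a", "clean"), ("b", "dirty"), ("a", "clean")]
model = infer_model(history)
print(utility(model, spec="clean"))  # 0.666...
```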
An improved approach to attribute reduction with covering rough sets
Wang, Changzhong, Sun, Baiqing, Hu, Qinhua
Attribute reduction is an important preprocessing step for pattern recognition and data mining, and most research has focused on attribute reduction using rough sets. Recently, Tsang et al. discussed attribute reduction with covering rough sets [E.C.C. Tsang, D. Chen, D.S. Yeung, Approximations and reducts with covering generalized rough sets, Computers and Mathematics with Applications 56 (2008) 279-289], where an approach based on the discernibility matrix was presented to compute all attribute reducts. In this paper, we provide an improved approach by constructing a simpler discernibility matrix with covering rough sets, and then improve some of the characterizations of attribute reduction given by Tsang et al. It is proved that the improved discernibility matrix is equivalent to the original one, but its computation is considerably cheaper.
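For readers unfamiliar with the construction, here is a minimal sketch of a classical discernibility matrix (notation ours; the paper's covering-based matrix refines this idea). Each entry lists the attributes separating a pair of objects, and a reduct is a minimal attribute set hitting every nonempty entry:

```python
from itertools import combinations

def discernibility_matrix(objects, attributes, value):
    """value(obj, attr) -> attribute value; returns {(x, y): set of attrs}."""
    matrix = {}
    for x, y in combinations(objects, 2):
        diff = {a for a in attributes if value(x, a) != value(y, a)}
        if diff:
            matrix[(x, y)] = diff
    return matrix

# Toy information table: 3 objects, 2 attributes.
table = {1: {"a": 0, "b": 1}, 2: {"a": 0, "b": 0}, 3: {"a": 1, "b": 0}}
m = discernibility_matrix([1, 2, 3], ["a", "b"], lambda o, a: table[o][a])
print(m)  # e.g. {(1, 2): {'b'}, (1, 3): {'a', 'b'}, (2, 3): {'a'}}
```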
A Discussion on Parallelization Schemes for Stochastic Vector Quantization Algorithms
Durut, Matthieu, Patra, Benoît, Rossi, Fabrice
This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better performance than the sequential algorithm. Another distributed scheme is therefore introduced which obtains the expected speed-ups. It is then improved to fit implementation on distributed architectures where communication is slow and inter-machine synchronization is too costly. The schemes are tested on simulated distributed architectures and, for the last one, on the Microsoft Windows Azure platform, obtaining speed-ups with up to 32 Virtual Machines.
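The following single-process simulation sketches the flavor of such a scheme: each simulated worker runs stochastic VQ updates on its own data shard, and prototypes are averaged only periodically to limit communication. Step sizes and the synchronization period are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def local_vq_step(protos, x, lr):
    """One stochastic VQ (online k-means) update on a single point."""
    j = np.argmin(((protos - x) ** 2).sum(axis=1))  # winning prototype
    protos[j] += lr * (x - protos[j])

def parallel_vq(shards, k, rounds=50, sync_every=10, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    dim = shards[0].shape[1]
    init = rng.normal(size=(k, dim))
    protos = [init.copy() for _ in shards]      # one replica per worker
    for t in range(rounds):
        for w, shard in enumerate(shards):      # simulate workers in turn
            x = shard[rng.integers(len(shard))]
            local_vq_step(protos[w], x, lr)
        if (t + 1) % sync_every == 0:           # periodic model averaging
            avg = np.mean(protos, axis=0)
            protos = [avg.copy() for _ in shards]
    return protos[0]

shards = [np.random.default_rng(s).normal(size=(100, 2)) for s in range(4)]
print(parallel_vq(shards, k=3))
```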
Using the Gene Ontology Hierarchy when Predicting Gene Function
Mostafavi, Sara, Morris, Quaid
The problem of multilabel classification when the labels are related through a hierarchical categorization scheme occurs in many application domains, such as computational biology. For example, this problem arises naturally when trying to automatically assign gene function using a controlled vocabulary like the Gene Ontology. However, most existing approaches to predicting gene function solve an independent classification problem for each function category, ignoring the rest of the hierarchy. Here, we propose two simple methods for incorporating information about the hierarchical nature of the categorization scheme. In the first method, we use information about a gene's previous annotations to set an initial prior on its label. In the second, we extend a graph-based semi-supervised learning algorithm to predict gene function in a hierarchy, and show that this problem can be solved efficiently via a linear system of equations. We compare these approaches with a previous label reconciliation-based approach. Results show that using the hierarchy information directly, rather than through reconciliation methods, improves gene function prediction.
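A minimal sketch of the linear-system view (our simplified notation, not the authors' exact formulation): label scores f satisfy (I + mu*L) f = y for a graph Laplacian L of the gene network and a label vector y (which is where a hierarchy-derived prior could enter), so propagation reduces to a single solve:

```python
import numpy as np

def propagate(W, y, mu=1.0):
    """W: symmetric affinity matrix; y: +1/-1/0 prior label vector."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    return np.linalg.solve(np.eye(len(y)) + mu * L, y)

# Toy network: nodes 0-1 connected, 2-3 connected, weak bridge 1-2.
W = np.array([[0, 1, 0, 0],
              [1, 0, .1, 0],
              [0, .1, 0, 1],
              [0, 0, 1, 0.]])
y = np.array([1., 0., 0., -1.])             # one positive, one negative gene
print(propagate(W, y))                      # scores decay across the bridge
```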
Improving Compressed Counting
Compressed Counting (CC) [22] was recently proposed for estimating the a-th frequency moments of data streams, where 0 < a <= 2. CC can be used for estimating Shannon entropy, which can be approximated by certain functions of the a-th frequency moments as a -> 1. Monitoring Shannon entropy for anomaly detection (e.g., DDoS attacks) in large networks is an important task. This paper presents a new algorithm that improves CC. The improvement is most substantial as a -> 1-. For example, when a = 0.99, the new algorithm reduces the estimation variance roughly 100-fold. This makes CC considerably more practical for estimating Shannon entropy. Furthermore, the new algorithm is statistically optimal when a = 0.5.
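The connection between moments and entropy rests on the standard Renyi identity H_a = log(F_a / F_1^a) / (1 - a), which tends to the Shannon entropy as a -> 1; the toy computation below (exact frequencies, no sketching) illustrates the convergence that a CC-based estimator exploits:

```python
import numpy as np

f = np.array([50, 30, 15, 5], dtype=float)    # toy frequency counts
p = f / f.sum()
shannon = -(p * np.log(p)).sum()

for a in [0.9, 0.99, 0.999]:
    F_a = (f ** a).sum()                      # a-th frequency moment
    renyi = np.log(F_a / f.sum() ** a) / (1 - a)
    print(a, renyi, shannon)                  # Renyi -> Shannon as a -> 1
```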
Bayesian Discovery of Linear Acyclic Causal Models
Hoyer, Patrik O., Hyttinen, Antti
Methods for automated discovery of causal relationships from non-interventional data have received much attention recently. A widely used and well understood model family is given by linear acyclic causal models (recursive structural equation models). For Gaussian data, both constraint-based methods (Spirtes et al., 1993; Pearl, 2000), which output a single equivalence class, and Bayesian score-based methods (Geiger and Heckerman, 1994), which assign relative scores to the equivalence classes, are available. In contrast, all current methods able to utilize non-Gaussianity in the data (Shimizu et al., 2006; Hoyer et al., 2008) return only a single graph or a single equivalence class, and so are fundamentally unable to express the degree of certainty attached to that output. In this paper we develop a Bayesian score-based approach able to take advantage of non-Gaussianity when estimating linear acyclic causal models, and we empirically demonstrate that, at least on networks of very modest size, its accuracy is as good as or better than that of existing methods. We provide a complete code package (in R) which implements all algorithms and performs all of the analysis reported in the paper, and we hope that this will further the application of these methods to causal inference problems.
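To see why non-Gaussianity helps at all, consider this toy likelihood-ratio comparison (illustrative only, not the paper's Bayesian scoring): with Laplace noise, the true causal direction of a linear model attains a higher non-Gaussian log-likelihood than the reverse direction, an asymmetry that vanishes for Gaussian data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.laplace(size=5000)                 # non-Gaussian cause
y = 0.8 * x + rng.laplace(size=5000)       # linear mechanism

def laplace_loglik(cause, effect):
    """Log-likelihood of 'cause -> effect' under Laplace marginals/noise."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause, ddof=1)  # OLS slope
    resid = effect - b * cause
    ll = 0.0
    for v in (cause, resid):                       # cause density + noise
        s = np.mean(np.abs(v - np.median(v)))      # Laplace scale MLE
        ll += -len(v) * np.log(2 * s) - np.abs(v - np.median(v)).sum() / s
    return ll

print("x -> y:", laplace_loglik(x, y))     # should score higher
print("y -> x:", laplace_loglik(y, x))
```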
Alternating Projections for Learning with Expectation Constraints
Bellare, Kedar, Druck, Gregory, McCallum, Andrew
We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann & McCallum, 2008). Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate accuracy comparable to generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over state-of-the-art constraint-driven learning.
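A minimal sketch of the information-projection step on a single instance (our notation and a simple bisection solver, not the authors' implementation): projecting the current distribution onto an expectation constraint yields an exponential tilt of that distribution:

```python
import numpy as np

def i_project(p, phi, b, lo=-50.0, hi=50.0, iters=100):
    """Bisection on lam so that E_q[phi] = b under q ∝ p * exp(lam * phi)."""
    def moment(lam):
        q = p * np.exp(lam * phi)
        q /= q.sum()
        return (q * phi).sum()
    for _ in range(iters):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if moment(mid) < b else (lo, mid)
    q = p * np.exp(lo * phi)
    return q / q.sum()

p = np.array([0.7, 0.2, 0.1])      # current model posterior over 3 labels
phi = np.array([1.0, 0.0, 0.0])    # feature: indicator of label 0
q = i_project(p, phi, b=0.4)       # constrain E[label == 0] to 0.4
print(q)                           # q[0] is approximately 0.4
```

Alternating this projection with a fit-the-model moment projection yields a procedure of the kind described above.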
Products of Hidden Markov Models: It Takes N>1 to Tango
Taylor, Graham W., Hinton, Geoffrey E.
Products of Hidden Markov Models (PoHMMs) are an interesting class of generative models which have received little attention since their introduction. This may be due in part to their more computationally expensive gradient-based learning algorithm, and to the intractability of computing the log likelihood of sequences under the model. In this paper, we demonstrate how the partition function can be estimated reliably via Annealed Importance Sampling. We perform experiments using contrastive divergence learning on rainfall data and data captured from pairs of people dancing. Our results suggest that advances in learning and evaluation for undirected graphical models, together with recent increases in available computing power, make PoHMMs worth considering for complex time-series modeling tasks.
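A compact AIS sketch on a toy one-dimensional model (our notation; for a PoHMM the base and target densities would be the product model's unnormalized sequence probabilities): anneal from a base density with known normalizer to the unnormalized target, accumulating importance weights whose mean estimates the partition-function ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
log_f0 = lambda z: -0.5 * z**2                  # base: standard normal, Z0 known
log_f1 = lambda z: -0.5 * ((z - 3.0) / 0.5)**2  # target, unnormalized

def ais(n_chains=2000, n_temps=200, step=0.5):
    betas = np.linspace(0.0, 1.0, n_temps)
    x = rng.normal(size=n_chains)               # exact samples from f0
    logw = np.zeros(n_chains)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        logw += (b1 - b0) * (log_f1(x) - log_f0(x))  # importance increments
        # One Metropolis move targeting the intermediate density at b1:
        log_p = lambda z: (1 - b1) * log_f0(z) + b1 * log_f1(z)
        prop = x + step * rng.normal(size=n_chains)
        accept = np.log(rng.uniform(size=n_chains)) < log_p(prop) - log_p(x)
        x = np.where(accept, prop, x)
    # E[w] = Z1/Z0, with Z0 = sqrt(2*pi) for the standard normal base.
    return np.sqrt(2 * np.pi) * np.exp(logw).mean()

print(ais())   # true Z1 = sqrt(2*pi) * 0.5, roughly 1.2533
```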
Multiple Source Adaptation and the Renyi Divergence
Mansour, Yishay, Mohri, Mehryar, Rostamizadeh, Afshin
This paper presents a novel theoretical study of the general problem of multiple source adaptation using the notion of Renyi divergence. Our results build on our previous work [12], but significantly broaden the scope of that work in several directions. We extend previous multiple source loss guarantees based on distribution weighted combinations to arbitrary target distributions P, not necessarily mixtures of the source distributions, analyze both known and unknown target distribution cases, and prove a lower bound. We further extend our bounds to deal with the case where the learner receives an approximate distribution for each source instead of the exact one, and show that similar loss guarantees can be achieved depending on the divergence between the approximate and true distributions. We also analyze the case where the labeling functions of the source domains are somewhat different. Finally, we report the results of experiments with both an artificial data set and a sentiment analysis task, showing the performance benefits of the distribution weighted combinations and the quality of our bounds based on the Renyi divergence.
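The distribution weighted combination itself is simple to state; the sketch below (our notation, a deliberately tiny example) predicts with per-source hypotheses weighted pointwise by how well each source distribution covers the input:

```python
import numpy as np

def weighted_combination(x, densities, hypotheses, lam):
    """h_lam(x) = sum_i lam_i D_i(x) h_i(x) / sum_i lam_i D_i(x)."""
    d = np.array([lam_i * D(x) for lam_i, D in zip(lam, densities)])
    h = np.array([h(x) for h in hypotheses])
    return (d * h).sum() / d.sum()

# Two toy sources: Gaussian bumps at -2 and +2 with different predictors.
gauss = lambda m: (lambda x: np.exp(-0.5 * (x - m)**2))
densities = [gauss(-2.0), gauss(2.0)]
hypotheses = [lambda x: -1.0, lambda x: 1.0]
for x in (-2.0, 0.0, 2.0):
    print(x, weighted_combination(x, densities, hypotheses, lam=[0.5, 0.5]))
```

Near each source's high-density region the combination defers to that source's hypothesis, which is the mechanism behind the loss guarantees discussed above.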
Which Spatial Partition Trees are Adaptive to Intrinsic Dimension?
Verma, Nakul, Kpotufe, Samory, Dasgupta, Sanjoy
Recent theory work has found that a special type of spatial partition tree - called a random projection tree - is adaptive to the intrinsic dimension of the data from which it is built. Here we examine this same question, with a combination of theory and experiments, for a broader class of trees that includes k-d trees, dyadic trees, and PCA trees. Our motivation is to get a feel for (i) the kind of intrinsic low dimensional structure that can be empirically verified, (ii) the extent to which a spatial partition can exploit such structure, and (iii) the implications for standard statistical tasks such as regression, vector quantization, and nearest neighbor search.
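For concreteness, here is an illustrative contrast (simplified from the paper's setting) between a k-d split and a random-projection split on data with low intrinsic dimension:

```python
import numpy as np

def kd_split(X):
    """Split along the coordinate of largest spread, at the median."""
    dim = np.argmax(X.max(axis=0) - X.min(axis=0))
    mask = X[:, dim] <= np.median(X[:, dim])
    return X[mask], X[~mask]

def rp_split(X, rng):
    """Project onto a random direction, split at the median projection."""
    u = rng.normal(size=X.shape[1])
    u /= np.linalg.norm(u)
    proj = X @ u
    mask = proj <= np.median(proj)
    return X[mask], X[~mask]

rng = np.random.default_rng(0)
# Data with low intrinsic dimension: a 1-d curve embedded in 10-d space.
t = rng.uniform(size=(200, 1))
X = np.hstack([np.sin(3 * t), np.cos(3 * t)] + [t] * 8)
left, right = rp_split(X, rng)
print(left.shape, right.shape)   # roughly balanced cells
```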