From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field.
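For the term-document case, here is a minimal Python sketch. Documents become columns of a raw-count matrix and are compared by the cosine of the angle between them. The toy corpus, the whitespace tokenizer, the helper names, and the absence of any weighting such as tf-idf are assumptions made for brevity.

```python
import math
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, matrix) where matrix[i][j] counts term i in document j."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

def column(matrix, j):
    """Extract document j as a vector over the vocabulary."""
    return [row[j] for row in matrix]

def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

docs = ["the cat sat on the mat", "the cat ate the fish", "stocks fell sharply today"]
vocab, m = term_document_matrix(docs)
sim_01 = cosine(column(m, 0), column(m, 1))  # documents sharing "the" and "cat"
sim_02 = cosine(column(m, 0), column(m, 2))  # documents sharing no terms
print(round(sim_01, 3), round(sim_02, 3))
```

The rows of the same matrix serve as term vectors. A word-context matrix would instead index its columns by contexts, and a pair-pattern matrix would have rows for word pairs and columns for patterns.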
Interactive Cost Configuration Over Decision Diagrams
Andersen, H. R., Hadzic, T., Pisinger, D.
In many AI domains such as product configuration, a user should interactively specify a solution that must satisfy a set of constraints. In such scenarios, offline compilation of feasible solutions into a tractable representation is an important approach to delivering efficient backtrack-free user interaction online. In particular, binary decision diagrams (BDDs) have been successfully used as a compilation target for product and service configuration. In this paper we discuss how to extend BDD-based configuration to scenarios involving cost functions which express user preferences. We first show that an efficient, robust and easy-to-implement extension is possible if the cost function is additive and feasible solutions are represented using multi-valued decision diagrams (MDDs). We also discuss the effect on MDD size if the cost function is non-additive or if it is encoded explicitly into the MDD. We then discuss interactive configuration in the presence of multiple cost functions. We prove that even in its simplest form, multiple-cost configuration is NP-hard in the input MDD. However, for solving two-cost configuration we develop a pseudo-polynomial scheme and a fully polynomial approximation scheme. The applicability of our approach is demonstrated through experiments over real-world configuration models and product-catalogue datasets. Response times are generally within a fraction of a second, even for very large instances.
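To make the additive case concrete, here is a hedged Python sketch over a hypothetical two-variable MDD. It labels every node with the cheapest cost from the root ("up") and to the terminal ("down"); summing these around an edge gives the cheapest feasible solution that uses each value, which is the kind of information an interactive configurator can display. This up/down labelling is standard shortest-path dynamic programming on a layered DAG, assumed here for illustration rather than taken from the paper. The toy diagram, costs, and function names are hypothetical.

```python
import math

# Hypothetical MDD over x0, x1 with domain {0, 1, 2}. Each node lists its
# outgoing (value, child) edges; "T" is the accepting terminal. Edges leaving
# a node on layer i assign a value to variable x_i.
MDD = {
    "r": [(0, "a"), (1, "a"), (2, "b")],
    "a": [(0, "T"), (2, "T")],
    "b": [(1, "T")],
    "T": [],
}
LAYER = {"r": 0, "a": 1, "b": 1, "T": 2}
# Additive cost: COST[i][v] is the cost of assigning value v to x_i.
COST = [{0: 5, 1: 3, 2: 1}, {0: 4, 1: 9, 2: 2}]


def down_costs(mdd, layer, cost, terminal="T"):
    """Cheapest cost from each node to the terminal, processing deepest layers first."""
    down = {terminal: 0}
    for node in sorted(mdd, key=lambda n: layer[n], reverse=True):
        if node == terminal:
            continue
        i = layer[node]
        down[node] = min(cost[i][v] + down[child] for v, child in mdd[node])
    return down


def up_costs(mdd, layer, cost, root="r"):
    """Cheapest cost from the root to each node, processing shallowest layers first."""
    up = {node: math.inf for node in mdd}
    up[root] = 0
    for node in sorted(mdd, key=lambda n: layer[n]):
        for v, child in mdd[node]:
            up[child] = min(up[child], up[node] + cost[layer[node]][v])
    return up


def cheapest_per_value(mdd, layer, cost, root="r"):
    """Map (i, v) to the cost of the cheapest feasible solution with x_i = v."""
    up, down = up_costs(mdd, layer, cost, root), down_costs(mdd, layer, cost)
    best = {}
    for node, edges in mdd.items():
        i = layer[node]
        for v, child in edges:
            total = up[node] + cost[i][v] + down[child]
            best[(i, v)] = min(best.get((i, v), math.inf), total)
    return best


print(cheapest_per_value(MDD, LAYER, COST))
```

After the user fixes a value, the edges inconsistent with it can be removed and both passes rerun; each pass visits every edge once.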
Security Analysis of Online Centroid Anomaly Detection
Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection). In such cases, learning algorithms can be expected to deal with manipulated data aimed at hampering decision making. Although some previous work has addressed the handling of malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution we analyze the performance of a particular method, online centroid anomaly detection, in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of the learning and attack processes, derivation of an optimal attack, and analysis of its efficiency and constraints. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: a bounded and an unbounded percentage of attacker-controlled traffic, and a bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real HTTP and exploit traces, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.
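The Python sketch below is a hypothetical illustration of the unconstrained setting. A detector flags points farther than a fixed radius from a running centroid, and only accepted points update the centroid. A greedy attacker repeatedly injects a point just inside the boundary, in the direction of a target, until the target is no longer flagged. The running-mean update, the greedy attack, the radius, and all names are assumptions; they are not the paper's exact learning rule or its derived optimal attack.

```python
import math


class OnlineCentroid:
    """Hypothetical online centroid detector: flag points farther than `radius`
    from the running mean of all points accepted so far."""

    def __init__(self, initial_points, radius):
        self.count = len(initial_points)
        dim = len(initial_points[0])
        self.center = [sum(p[k] for p in initial_points) / self.count for k in range(dim)]
        self.radius = radius

    def is_anomaly(self, x):
        return math.dist(self.center, x) > self.radius

    def process(self, x):
        """Return True if x is flagged; otherwise fold x into the running mean."""
        if self.is_anomaly(x):
            return True
        self.count += 1
        self.center = [c + (xi - c) / self.count for c, xi in zip(self.center, x)]
        return False


def greedy_poisoning(detector, target, max_points=1000):
    """Inject points just inside the boundary, toward `target`, until it is accepted."""
    injected = 0
    while detector.is_anomaly(target) and injected < max_points:
        d = math.dist(detector.center, target)
        step = 0.99 * detector.radius / d  # stay strictly inside the acceptance radius
        poison = [c + step * (t - c) for c, t in zip(detector.center, target)]
        detector.process(poison)
        injected += 1
    return injected


det = OnlineCentroid([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0)], radius=2.0)
target = (6.0, 0.0)
print(det.is_anomaly(target))  # True: the target starts well outside the radius
n = greedy_poisoning(det, target)
print(n, det.is_anomaly(target))
```

Because each accepted point moves the mean by a shrinking amount, the number of points the attacker needs grows as the detector accumulates history; bounding the attacker's share of traffic, as the abstract describes, limits this drift further.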
Feature Importance in Bayesian Assessment of Newborn Brain Maturity from EEG
Jakaite, L., Schetinin, V., Maple, C.
The methodology of Bayesian Model Averaging (BMA) is applied to the assessment of newborn brain maturity from sleep EEG. In theory, this methodology provides the most accurate assessments of uncertainty in decisions. However, existing BMA techniques have been shown to provide biased assessments in the absence of prior information that would enable the model parameter space to be explored in detail within a reasonable time. This lack of detail leads to disproportional sampling from the posterior distribution. In the case of EEG assessment of brain maturity, BMA results can be biased because of the absence of information about EEG feature importance. In this paper we explore how posterior information about EEG features can be used to reduce the negative impact of disproportional sampling on BMA performance. We use EEG data recorded from sleeping newborns to test the efficiency of the proposed BMA technique.
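As a generic illustration of the two ingredients, not the authors' technique: with equally weighted posterior samples, BMA averages the samples' predictions, and one simple posterior summary of feature importance is the fraction of sampled models that use each feature. The sample format and the feature names below are hypothetical.

```python
from collections import Counter


def bma_predict(samples):
    """Average the predictions of equally weighted posterior samples."""
    return sum(p for _, p in samples) / len(samples)


def feature_importance(samples):
    """Fraction of posterior samples whose model uses each feature."""
    counts = Counter(f for features, _ in samples for f in set(features))
    return {f: c / len(samples) for f, c in counts.items()}


# Hypothetical posterior samples: (features used by the model, predicted class probability).
samples = [({"f1", "f3"}, 0.8), ({"f1"}, 0.7), ({"f1", "f2"}, 0.6), ({"f3"}, 0.9)]
print(bma_predict(samples))
print(feature_importance(samples))
```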
Syntactic Topic Models
Boyd-Graber, Jordan, Blei, David M.
The syntactic topic model (STM) is a Bayesian nonparametric model of language that discovers latent distributions of words (topics) that are both semantically and syntactically coherent. The STM models dependency-parsed corpora where sentences are grouped into documents. It assumes that each word is drawn from a latent topic chosen by combining document-level features and the local syntactic context. Each document has a distribution over latent topics, as in topic models, which provides the semantic consistency. Each element in the dependency parse tree also has a distribution over the topics of its children, as in latent-state syntax models, which provides the syntactic consistency. These distributions are convolved so that the topic of each word is likely under both its document and syntactic context. We derive a fast posterior inference algorithm based on variational methods. We report qualitative and quantitative studies on both synthetic data and hand-parsed documents. We show that the STM is a more predictive model of language than current models based only on syntax or only on topics.
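A minimal sketch of how two topic distributions can be combined so that a topic must be plausible under both the document and the syntactic context. It assumes the combination is an elementwise product followed by renormalization; the numbers and the function name are illustrative only.

```python
def combine(doc_topics, syntax_topics):
    """Elementwise product of two topic distributions, renormalized to sum to 1."""
    weights = [d * s for d, s in zip(doc_topics, syntax_topics)]
    total = sum(weights)
    if total == 0:
        raise ValueError("the two distributions share no support")
    return [w / total for w in weights]


doc = [0.45, 0.45, 0.10]     # the document favors topics 0 and 1
syntax = [0.10, 0.45, 0.45]  # the syntactic context favors topics 1 and 2
mixed = combine(doc, syntax)
print([round(p, 3) for p in mixed])
```

Neither distribution alone singles out topic 1, but the combined distribution places most of its mass there because it is the only topic that both favor.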
Predicting the Performance of IDA* using Conditional Distributions
Zahavi, U., Felner, A., Burch, N., Holte, R. C.
Korf, Reid, and Edelkamp introduced a formula to predict the number of nodes IDA* will expand on a single iteration for a given consistent heuristic, and experimentally demonstrated that it could make very accurate predictions. In this paper we show that, in addition to requiring the heuristic to be consistent, their formula's predictions are accurate only at levels of the brute-force search tree where the heuristic values obey the unconditional distribution that they defined and then used in their formula. We then propose a new formula that works well without these requirements, i.e., it can make accurate predictions of IDA*'s performance for inconsistent heuristics and when the heuristic values at a given level do not obey the unconditional distribution. To achieve this, we introduce the conditional distribution of heuristic values, which is a generalization of their unconditional heuristic distribution. We also provide extensions of our formula that handle individual start states and the augmentation of IDA* with bidirectional pathmax (BPMX), a technique for propagating heuristic values when inconsistent heuristics are used. Experimental results demonstrate the accuracy of our new method and all its variations.
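For reference, the KRE formula that the paper takes as its starting point estimates the number of nodes expanded with cost threshold d as E(d) = sum over i = 0..d of N_i * P(d - i), where N_i is the number of nodes at depth i of the brute-force tree and P(v) is the probability that a node's heuristic value is at most v under the unconditional distribution. The Python sketch assumes unit edge costs and a toy tree with branching factor 2 and a heuristic uniform on {0, 1, 2, 3}. It does not implement the paper's conditional formula.

```python
def kre_prediction(level_sizes, cdf, threshold):
    """KRE estimate: sum over depths i <= threshold of N_i * P(threshold - i).

    level_sizes[i] is N_i, the number of nodes at depth i of the brute-force tree;
    cdf(v) is the probability that a node's heuristic value is at most v.
    Assumes unit edge costs, so depth equals path cost g.
    """
    return sum(n_i * cdf(threshold - i) for i, n_i in enumerate(level_sizes[: threshold + 1]))


def uniform_cdf(v, max_h=3):
    """Heuristic values uniform on {0, ..., max_h} (toy assumption)."""
    if v < 0:
        return 0.0
    return min(1.0, (v + 1) / (max_h + 1))


sizes = [2 ** i for i in range(10)]  # branching factor 2
print(kre_prediction(sizes, uniform_cdf, threshold=4))  # 14.0
```

The paper's point is that this estimate is reliable only at levels where the heuristic values actually follow P; its new formula replaces P with a conditional distribution.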
Convergence of Bayesian Control Rule
Ortega, Pedro A., Braun, Daniel A.
Recently, new approaches to adaptive control have sought to reformulate the problem as a minimization of a relative entropy criterion to obtain tractable solutions. In particular, it has been shown that minimizing the expected deviation from the causal input-output dependencies of the true plant leads to a new promising stochastic control rule called the Bayesian control rule. This work proves the convergence of the Bayesian control rule under two sufficient assumptions: boundedness, which is an ergodicity condition; and consistency, which is an instantiation of the sure-thing principle.
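In the special case of a Bernoulli multi-armed bandit, the Bayesian control rule can be read as follows: sample a hypothesis about the environment from the posterior, act as the optimal controller for that hypothesis would, and condition the posterior only on the resulting observation (actions are treated as interventions, so they carry no evidence themselves). In this setting the rule is essentially Thompson sampling. The Python sketch assumes Beta(1, 1) priors and fixed success probabilities; the function name and parameters are hypothetical.

```python
import random


def bayesian_control_rule_bandit(true_probs, steps, seed=0):
    """Bernoulli-bandit sketch: sample a hypothesis, act on it, update on the outcome."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha, beta = [1] * k, [1] * k  # Beta(1, 1) prior per arm
    pulls = [0] * k
    for _ in range(steps):
        # Draw one hypothesis about every arm's success probability from the posterior.
        theta = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        # Act as the optimal controller for the sampled hypothesis would.
        arm = max(range(k), key=lambda a: theta[a])
        # Observe the response; only the observation updates the posterior.
        reward = 1 if rng.random() < true_probs[arm] else 0
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls, alpha, beta


pulls, alpha, beta = bayesian_control_rule_bandit([0.3, 0.7], steps=2000)
print(pulls)
```

As the posterior concentrates on the true plant, sampled hypotheses agree with it more often and the controller increasingly takes the optimal action, which is the intuition behind the convergence result.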
Using ATL to define advanced and flexible constraint model transformations
Chenouard, Raphael, Granvilliers, Laurent, Soto, Ricardo
Transforming constraint models is an important task in recent constraint programming systems. User-understandable models are defined during the modeling phase, but rewriting or tuning them is mandatory to obtain models that can be solved efficiently. We propose a new architecture that allows bridges to be defined between any (modeling or solver) languages and model optimizations to be implemented. This architecture follows a model-driven approach in which the constraint modeling process is seen as a set of model transformations. Among other things, an interesting feature is the definition of transformations as concept-oriented rules, i.e., rules based on the types of model elements, where the types are organized into a hierarchy called a metamodel.
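The idea of concept-oriented rules can be sketched in Python (this is not ATL, whose syntax is not reproduced here). Each rule is attached to a metamodel type, and an element is rewritten by the rule of its most specific type found along the class hierarchy, so subtypes inherit rules they do not override. The metamodel classes, rule outputs, and target syntax below are hypothetical.

```python
class Element:
    """Root of a hypothetical constraint metamodel."""


class Constraint(Element):
    def __init__(self, text):
        self.text = text


class LinearConstraint(Constraint):
    """Inherits the generic Constraint rule unless a more specific one exists."""


class AllDifferent(Constraint):
    def __init__(self, variables):
        super().__init__("alldifferent")
        self.variables = variables


# Concept-oriented rules: one rewriting function per metamodel type.
RULES = {
    Constraint: lambda c: f"post({c.text})",
    AllDifferent: lambda c: "alldifferent([" + ", ".join(c.variables) + "])",
}


def transform(element, rules=RULES):
    """Apply the rule of the most specific type in the element's class hierarchy."""
    for cls in type(element).__mro__:
        if cls in rules:
            return rules[cls](element)
    raise TypeError(f"no rule for {type(element).__name__}")


print(transform(LinearConstraint("x + y <= 10")))  # post(x + y <= 10)
print(transform(AllDifferent(["x", "y", "z"])))    # alldifferent([x, y, z])
```

Supporting another target language then amounts to supplying a different rule table over the same type hierarchy.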
Rewriting Constraint Models with Metamodels
Chenouard, Raphael, Granvilliers, Laurent, Soto, Ricardo
An important challenge in constraint programming is to rewrite constraint models into executable programs that calculate the solutions. This phase of constraint processing may require translations between constraint programming languages, transformations of constraint representations, model optimizations, and tuning of solving strategies. In this paper, we introduce a pivot metamodel describing the common features of constraint models, including different kinds of constraints, statements such as conditionals and loops, and other first-class elements such as object classes and predicates. This metamodel is general enough to cope with the constructs of many languages, from object-oriented modeling languages to logic languages, but it is independent of them. The rewriting operations manipulate metamodel instances independently of any language. As a consequence, the rewriting operations apply regardless of which languages are selected, and they are able to manage semantic information about the model. A bridge between the metamodel space and the languages is created using parsing techniques. Tools from the software engineering world can be useful for implementing this framework.
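As a hedged illustration of a language-independent rewriting operation, the Python sketch below unrolls a loop statement of a hypothetical, minimal metamodel into individual constraint instances. The class names, the textual index substitution, and the inclusive bounds are assumptions; a real pivot metamodel would represent expressions structurally rather than as strings.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Compare:
    """A binary constraint such as x[1] != y[1] (hypothetical metamodel element)."""
    left: str
    op: str
    right: str


@dataclass(frozen=True)
class ForAll:
    """A loop statement: post `body` for every index value in [lower, upper]."""
    index: str
    lower: int
    upper: int
    body: Compare


def substitute(term, index, value):
    """Replace whole-word occurrences of the loop index with a concrete value."""
    return re.sub(rf"\b{re.escape(index)}\b", str(value), term)


def unroll(statement):
    """Rewrite a ForAll into one Compare per index value; leave other elements unchanged."""
    if not isinstance(statement, ForAll):
        return [statement]
    b = statement.body
    return [
        Compare(substitute(b.left, statement.index, v), b.op, substitute(b.right, statement.index, v))
        for v in range(statement.lower, statement.upper + 1)
    ]


loop = ForAll("i", 1, 3, Compare("x[i]", "!=", "y[i]"))
for c in unroll(loop):
    print(c)
```

Because the operation only sees metamodel instances, the same unrolling applies whichever source language was parsed into the metamodel and whichever target language the result is printed to.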
Operator norm convergence of spectral clustering on level sets
Pelletier, Bruno, Pudlo, Pierre
The aim of data clustering, or unsupervised classification, is to partition a data set into several homogeneous groups that are relatively well separated from one another with respect to a certain distance or notion of similarity. There is an extensive literature on clustering methods, and we refer the reader to Anderberg [1973], Hartigan [1975], McLachlan and Peel [2000], Chapter 10 in Duda et al. [2000], and Chapter 14 in Hastie et al. [2001] for general material on the subject. In particular, popular clustering algorithms such as Gaussian mixture models or k-means have proved useful in a number of applications, yet they suffer from some intrinsic and computational limitations. Indeed, the parametric assumption at the core of mixture models may be too stringent, while the standard k-means algorithm fails to identify complex-shaped, possibly non-convex, clusters. The class of spectral clustering algorithms is emerging as a promising alternative, showing improved performance over classical clustering algorithms on several benchmark problems and applications; see, e.g., Ng et al. [2002] and von Luxburg [2007]. An overview of spectral clustering algorithms may be found in von Luxburg [2007], and connections with kernel methods are exposed in Filippone et al. [2008]. The spectral clustering algorithm amounts to embedding the data into a feature space using the eigenvectors of the similarity matrix, in such a way that the clusters may be separated using simple rules, e.g., a separation by hyperplanes. The core component of the spectral clustering algorithm is therefore the similarity matrix, or certain normalizations of it, generally called graph Laplacian matrices; see Chung [1997]. Graph Laplacian matrices may be viewed as discrete versions of bounded operators between functional spaces.
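A minimal Python sketch of the central object: a Gaussian similarity matrix W and the unnormalized graph Laplacian L = D - W, where D is the diagonal degree matrix. For two well-separated groups, the indicator vector of one group lies almost in the null space of L, which is why eigenvectors associated with the smallest eigenvalues reveal the clusters. The kernel, the bandwidth sigma, and the choice of the unnormalized Laplacian (rather than a normalized one) are assumptions for illustration.

```python
import math


def gaussian_similarity(points, sigma):
    """W[i][j] = exp(-||x_i - x_j||^2 / (2 sigma^2)) for i != j, zero on the diagonal."""
    n = len(points)
    return [
        [0.0 if i == j else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
         for j in range(n)]
        for i in range(n)
    ]


def unnormalized_laplacian(w):
    """L = D - W, with D the diagonal matrix of row sums (degrees) of W."""
    n = len(w)
    degrees = [sum(row) for row in w]
    return [[(degrees[i] if i == j else 0.0) - w[i][j] for j in range(n)] for i in range(n)]


def mat_vec(m, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


points = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0)]
lap = unnormalized_laplacian(gaussian_similarity(points, sigma=1.0))
indicator = [1.0, 1.0, 0.0, 0.0]  # membership in the first group
residual = mat_vec(lap, indicator)
print(max(abs(r) for r in residual))  # near zero: the indicator is almost an eigenvector for eigenvalue 0
```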