Country
Achieving compositionality of the stable model semantics for Smodels programs
Oikarinen, Emilia, Janhunen, Tomi
In this paper, a Gaifman-Shapiro-style module architecture is tailored to the case of Smodels programs under the stable model semantics. The composition of Smodels program modules is suitably limited by module conditions which ensure the compatibility of the module system with stable models. Hence the semantics of an entire Smodels program depends directly on stable models assigned to its modules. This result is formalized as a module theorem which truly strengthens Lifschitz and Turner's splitting-set theorem for the class of Smodels programs. To streamline generalizations in the future, the module theorem is first proved for normal programs and then extended to cover Smodels programs using a translation from the latter class of programs to the former class. Moreover, the respective notion of module-level equivalence, namely modular equivalence, is shown to be a proper congruence relation: it is preserved under substitutions of modules that are modularly equivalent. Principles for program decomposition are also addressed. The strongly connected components of the respective dependency graph can be exploited in order to extract a module structure when there is no explicit a priori knowledge about the modules of a program. The paper includes a practical demonstration of tools that have been developed for automated (de)composition of Smodels programs. To appear in Theory and Practice of Logic Programming.
On the Use of Automatically Acquired Examples for All-Nouns Word Sense Disambiguation
Martinez, D., Lopez de Lacalle, O., Agirre, E.
This article focuses on Word Sense Disambiguation (WSD), which is a Natural Language Processing task that is thought to be important for many Language Technology applications, such as Information Retrieval, Information Extraction, or Machine Translation. One of the main issues preventing the deployment of WSD technology is the lack of training examples for Machine Learning systems, also known as the Knowledge Acquisition Bottleneck. A method which has been shown to work for small samples of words is the automatic acquisition of examples. We have previously shown that one of the most promising example acquisition methods scales up and produces a freely available database of 150 million examples from Web snippets for all polysemous nouns in WordNet. This paper focuses on the issues that arise when using those examples, all alone or in addition to manually tagged examples, to train a supervised WSD system for all nouns. The extensive evaluation on both lexical-sample and all-words Senseval benchmarks shows that we are able to improve over commonly used baselines and to achieve top-rank performance. The good use of the prior distributions from the senses proved to be a crucial factor.
Clustering of discretely observed diffusion processes
De Gregorio, Alessandro, Iacus, Stefano Maria
In this paper a new dissimilarity measure to identify groups of assets dynamics is proposed. The underlying generating process is assumed to be a diffusion process solution of stochastic differential equations and observed at discrete time. The mesh of observations is not required to shrink to zero. As distance between two observed paths, the quadratic distance of the corresponding estimated Markov operators is considered. Analysis of both synthetic data and real financial data from NYSE/NASDAQ stocks, give evidence that this distance seems capable to catch differences in both the drift and diffusion coefficients contrary to other commonly used metrics.
ICE: An Expressive Iterative Combinatorial Exchange
Lubin, B., Juda, A. I., Cavallo, R., Lahaie, S., Shneidman, J., Parkes, D. C.
We present the design and analysis of the first fully expressive, iterative combinatorial exchange (ICE). The exchange incorporates a tree-based bidding language (TBBL) that is concise and expressive for CEs. Bidders specify lower and upper bounds in TBBL on their value for different trades and refine these bounds across rounds. These bounds allow price discovery and useful preference elicitation in early rounds, and allow termination with an efficient trade despite partial information on bidder valuations. All computation in the exchange is carefully optimized to exploit the structure of the bid-trees and to avoid enumerating trades. A proxied interpretation of a revealed-preference activity rule, coupled with simple linear prices, ensures progress across rounds. The exchange is fully implemented, and we give results demonstrating several aspects of its scalability and economic properties with simulated bidding strategies.
Finding rare objects and building pure samples: Probabilistic quasar classification from low resolution Gaia spectra
Bailer-Jones, C. A. L., Smith, K. W., Tiede, C., Sordo, R., Vallenari, A.
We develop and demonstrate a probabilistic method for classifying rare objects in surveys with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects. We demonstrate our method using the Discrete Source Classifier, a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, because identification of a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC we can trade off sample completeness and contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously 99% with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. We further show that not accounting for class priors in the target population leads to serious misclassifications and poor predictions for sample completeness and contamination. (Truncated)
Anytime Induction of Low-cost, Low-error Classifiers: a Sampling-based Approach
Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with nonuniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with a minimal cost. As a stochastic algorithm, ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare ACT to the state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.
Finding links and initiators: a graph reconstruction problem
Mannila, Heikki, Terzi, Evimaria
Analyzing 0-1 matrices is one of the main themes in data mining. Techniques such as clustering or mixture modelling, matrix decomposition techniques such as PCA, ICA, and NMR, and Bayesian all aim to give an answer to the informal question: "Where does the matrix come from?" These approaches aim at describing a probabilistic generative model that describes the observed matrix well. In this paper we consider yet another way of answering the question "Where does a 0-1 matrix M come from?" In our model, the matrix M of size n m is considered to arise from initiators, certain few entries that are initially 1. The initiators propagate their 1's by following the links of a directed influence graph G (represented by an n n adjacency matrix). We denote the initiator matrix of size n m by N and we use G (of size n n) to refer both to the directed graph between the rows of M and as well as its adjacency matrix. Then, we believe that the structure of N and G can tell how a matrix M has been created.
Collective Classification in Network Data
Sen, Prithviraj (University of Maryland) | Namata, Galileo (University of Maryland) | Bilgic, Mustafa (University of Maryland) | Getoor, Lise (University of Maryland) | Galligher, Brian (University of Maryland) | Eliassi-Rad, Tina (University of Maryland)
Many real-world applications produce networked data such as the world-wide web (hypertext documents connected via hyperlinks), social networks (for example, people connected by friendship links), communication networks (computers connected via communication links) and biological networks (for example, protein interaction networks). A recent focus in machine learning research has been to extend traditional machine learning classification techniques to classify nodes in such networks. In this article, we provide a brief introduction to this area of research and how it has progressed during the past decade. We introduce four of the most widely used inference algorithms for classifying networked data and empirically compare them on both synthetic and real-world data.
Calendar of Events
RuleML-2008 will be held October 30-31, 2008 in Orlando, Florida, USA. ICAART 2009 will be held January 19-21, 2009, in Porto, Portugal. This page includes all the AAAI sponsored conferences, conferences presented by AAAI Affiliates, and conferences held in cooperation with AAAI. IUI 2009 will be 11-15, 2010, in Atlanta, Georgia USA. held February 8-11, 2009, on Sanibel ICINCO 2009 will be Twenty-First International Joint be held May 19-21, 2009, at the Sundial held July 1-5, 2009, in Milan, Italy. IJCAI-09 will be held July 11-Island, FL. 17, 2009, in Pasadena, California.