Country
The CIFF Proof Procedure for Abductive Logic Programming with Constraints: Theory, Implementation and Experiments
Mancarella, P., Terreni, G., Sadri, F., Toni, F., Endriss, U.
We present the CIFF proof procedure for abductive logic programming with constraints, and we prove its correctness. CIFF is an extension of the IFF proof procedure for abductive logic programming, relaxing the original restrictions over variable quantification (allowedness conditions) and incorporating a constraint solver to deal with numerical constraints as in constraint logic programming. Finally, we describe the CIFF system, comparing it with state of the art abductive systems and answer set solvers and showing how to use it to program some applications. (To appear in Theory and Practice of Logic Programming - TPLP).
Exponential Family Graph Matching and Ranking
Petterson, James, Caetano, Tiberio, McAuley, Julian, Yu, Jin
We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application - web page ranking - exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning web page ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high.
Mining Compressed Repetitive Gapped Sequential Patterns Efficiently
Tong, Yongxin, Zhao, Li, Yu, Dan, Ma, Shilong, Xu, Ke
Mining frequent sequential patterns from sequence databases has been a central research topic in data mining and various efficient mining sequential patterns algorithms have been proposed and studied. Recently, in many problem domains (e.g, program execution traces), a novel sequential pattern mining research, called mining repetitive gapped sequential patterns, has attracted the attention of many researchers, considering not only the repetition of sequential pattern in different sequences but also the repetition within a sequence is more meaningful than the general sequential pattern mining which only captures occurrences in different sequences. However, the number of repetitive gapped sequential patterns generated by even these closed mining algorithms may be too large to understand for users, especially when support threshold is low. In this paper, we propose and study the problem of compressing repetitive gapped sequential patterns. Inspired by the ideas of summarizing frequent itemsets, RPglobal, we develop an algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), including an efficient pruning strategy, SyncScan, and an efficient representative pattern checking scheme, -dominate sequential pattern checking. The CRGSgrow is a two-step approach: in the first step, we obtain all closed repetitive sequential patterns as the candidate set of representative repetitive sequential patterns, and at the same time get the most of representative repetitive sequential patterns; in the second step, we only spend a little time in finding the remaining the representative patterns from the candidate set. An empirical study with both real and synthetic data sets clearly shows that the CRGSgrow has good performance.
Learning Nonlinear Dynamic Models
Langford, John, Salakhutdinov, Ruslan, Zhang, Tong
We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives.
Optimal Tableau Decision Procedures for PDL
Nguyen, Linh Anh, Szaลas, Andrzej
We reformulate Pratt's tableau decision procedure of checking satisfiability of a set of formulas in PDL. Our formulation is simpler and more direct for implementation. Extending the method we give the first EXPTIME (optimal) tableau decision procedure not based on transformation for checking consistency of an ABox w.r.t. a TBox in PDL (here, PDL is treated as a description logic). We also prove the new result that the data complexity of the instance checking problem in PDL is coNP-complete.
Solar radiation forecasting using ad-hoc time series preprocessing and neural networks
Paoli, Christophe, Voyant, Cyril, Muselli, Marc, Nivet, Marie-Laure
In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use an ad-hoc time series preprocessing and a Multi-Layer Perceptron (MLP) in order to predict solar radiation at daily horizon. First results are promising with nRMSE < 21% and RMSE < 998 Wh/m2. Our optimized MLP presents prediction similar to or even better than conventional methods such as ARIMA techniques, Bayesian inference, Markov chains and k-Nearest-Neighbors approximators. Moreover we found that our data preprocessing approach can reduce significantly forecasting errors.
Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy
Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. "Structure" has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Beginning with the role of number theory in expressing data, we show how we can naturally proceed to hierarchical structures. We show how this both encapsulates traditional paradigms in data analysis, and also opens up new perspectives towards issues that are on the order of the day, including data mining of massive, high dimensional, heterogeneous data sets. Linkages with other fields are also discussed including computational logic and symbolic dynamics. The structures in data surveyed here are based on hierarchy, represented as p-adic numbers or an ultrametric topology.
Divide and Conquer: Partitioning Online Social Networks
Pujol, Josep M., Erramilli, Vijay, Rodriguez, Pablo
Online Social Networks (OSNs) have exploded in terms of scale and scope over the last few years. The unprecedented growth of these networks present challenges in terms of system design and maintenance. One way to cope with this is by partitioning such large networks and assigning these partitions to different machines. However, social networks possess unique properties that make the partitioning problem non-trivial. The main contribution of this paper is to understand different properties of social networks and how these properties can guide the choice of a partitioning algorithm. Using large scale measurements representing real OSNs, we first characterize different properties of social networks, and then we evaluate qualitatively different partitioning methods that cover the design space. We expose different trade-offs involved and understand them in light of properties of social networks. We show that a judicious choice of a partitioning scheme can help improve performance.
A Minimum Description Length Approach to Multitask Feature Selection
Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on ell_0 regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.
I, Quantum Robot: Quantum Mind control on a Quantum Computer
The logic which describes quantum robots is not orthodox quantum logic, but a deductive calculus which reproduces the quantum tasks (computational processes, and actions) taking into account quantum superposition and quantum entanglement. A way toward the realization of intelligent quantum robots is to adopt a quantum metalanguage to control quantum robots. A physical implementation of a quantum metalanguage might be the use of coherent states in brain signals.