Asia
Characterization of Logic Program Revision as an Extension of Propositional Revision
Schwind, Nicolas, Inoue, Katsumi
We address the problem of belief revision of logic programs, i.e., how to incorporate a new logic program Q into a logic program P. Based on the structure of SE interpretations, Delgrande et al. adapted the well-known AGM framework to logic program (LP) revision. They identified the rational behavior of LP revision and introduced some specific operators. In this paper, we give a constructive characterization of all rational LP revision operators in terms of orderings over propositional interpretations with some further conditions specific to SE interpretations. This provides an intuitive, complete procedure for constructing all rational LP revision operators and eases the comprehension of their semantic and computational properties. We give particular consideration to logic programs of a very general form, namely generalized logic programs (GLPs). We show that every rational GLP revision operator is derived from a propositional revision operator satisfying the original AGM postulates. Interestingly, the further conditions specific to GLP revision are independent of the propositional revision operator on which a GLP revision operator is based. Taking advantage of our characterization result, we embed the GLP revision operators into structures of Boolean lattices, which allows us to bring to light some potential weaknesses in the adapted AGM postulates. To illustrate our claim, we introduce and axiomatically characterize two specific classes of (rational) GLP revision operators which arguably have a drastic behavior. We additionally consider two more restricted forms of logic programs, namely disjunctive logic programs (DLPs) and normal logic programs (NLPs), and adapt our characterization result to DLP and NLP revision operators.
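The characterization via orderings over interpretations can be illustrated at the purely propositional level by the classical distance-based (Dalal-style) construction, in which revision keeps the models of the new information closest to some model of the old belief. This is an illustrative toy, not the paper's SE-interpretation construction; all function names are ours.

```python
from itertools import product

def models(formula, atoms):
    """Enumerate the truth assignments over `atoms` satisfying `formula`,
    where `formula` is a predicate over an assignment dict."""
    return [dict(zip(atoms, vals))
            for vals in product([False, True], repeat=len(atoms))
            if formula(dict(zip(atoms, vals)))]

def hamming(m1, m2):
    """Number of atoms on which two assignments disagree."""
    return sum(m1[a] != m2[a] for a in m1)

def revise(old, new, atoms):
    """Dalal revision: keep the models of `new` at minimal Hamming
    distance from some model of `old`."""
    old_models = models(old, atoms)
    new_models = models(new, atoms)
    dist = [min(hamming(m, o) for o in old_models) for m in new_models]
    best = min(dist)
    return [m for m, d in zip(new_models, dist) if d == best]

atoms = ["p", "q"]
# Old belief: p and q.  New information: not p.
result = revise(lambda m: m["p"] and m["q"], lambda m: not m["p"], atoms)
# The closest model of "not p" keeps q true.
```

Each total preorder over interpretations induces one such operator; Dalal's Hamming ordering is just one instance.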
Factorized Asymptotic Bayesian Inference for Factorial Hidden Markov Models
Li, Shaohua, Fujimaki, Ryohei, Miao, Chunyan
Factorial hidden Markov models (FHMMs) are powerful tools for modeling sequential data. Learning FHMMs raises a challenging simultaneous model selection issue, i.e., selecting both the number of Markov chains and the dimensionality of each chain. Our main contribution is to address this model selection issue by extending Factorized Asymptotic Bayesian (FAB) inference to FHMMs. First, we offer a better approximation of the marginal log-likelihood than previous FAB inference: our key idea is to integrate out the transition probabilities, yet still apply the Laplace approximation to the emission probabilities. Second, we prove that if there are two very similar hidden states in an FHMM, i.e., one is redundant, then FAB will almost surely shrink and eliminate one of them, making the model parsimonious. Experimental results show that FAB for FHMMs significantly outperforms the state-of-the-art nonparametric Bayesian iFHMM and Variational FHMM in model selection accuracy, with competitive held-out perplexity.
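The "integrate out transition probabilities" step can be illustrated in the simplest setting: for a single discrete chain with a known state sequence, a Dirichlet prior on each transition row gives a closed-form marginal (a Pólya-urn product). This toy sketch, with names of our own choosing, shows the kind of quantity that would otherwise have to be handled by a Laplace approximation:

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_transitions(z, K, alpha=1.0):
    """Log marginal probability of the transitions in state sequence `z`
    over K states, with the transition matrix integrated out under
    independent Dirichlet(alpha) priors on each row."""
    N = np.zeros((K, K))
    for a, b in zip(z[:-1], z[1:]):
        N[a, b] += 1                       # transition counts
    out = 0.0
    for i in range(K):
        # Dirichlet-multinomial normalizer for row i
        out += gammaln(K * alpha) - gammaln(K * alpha + N[i].sum())
        out += np.sum(gammaln(alpha + N[i]) - gammaln(alpha))
    return out
```

For example, the sequence [0, 1, 0, 1] over two states has marginal transition probability 1/2 · 1/2 · 2/3 = 1/6 under uniform priors, which the closed form reproduces.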
Safe Feature Pruning for Sparse High-Order Interaction Models
Nakagawa, Kazuya, Suzumura, Shinya, Karasuyama, Masayuki, Tsuda, Koji, Takeuchi, Ichiro
Taking into account high-order interactions among covariates is valuable in many practical regression problems. This is, however, a computationally challenging task because the number of high-order interaction features to be considered would be extremely large unless the number of covariates is sufficiently small. In this paper, we propose a novel efficient algorithm for LASSO-based sparse learning of such high-order interaction models. Our basic strategy for reducing the number of features is to employ the idea of the recently proposed safe feature screening (SFS) rule. An SFS rule has the property that, if a feature satisfies the rule, then the feature is guaranteed to be non-active in the LASSO solution, meaning that it can be safely screened out prior to the LASSO training process. If a large number of features can be screened out before training the LASSO, the computational cost and the memory requirement can be dramatically reduced. However, applying such an SFS rule to each of the extremely large number of high-order interaction features would be computationally infeasible. Our key idea for resolving this computational issue is to exploit the underlying tree structure among the high-order interaction features. Specifically, we introduce a pruning condition called the safe feature pruning (SFP) rule, which has the property that, if the rule is satisfied at a certain node of the tree, then all the high-order interaction features corresponding to its descendant nodes are guaranteed to be non-active at the optimal solution. Our algorithm is extremely efficient, making it possible to work, e.g., with 3rd-order interactions of 10,000 original covariates, where the number of possible high-order interaction features is greater than 10^{12}.
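The tree-pruning idea can be sketched for binary covariates, where the support of a product feature only shrinks as the interaction order grows, so a simple bound on the feature/residual inner product over all descendants justifies skipping whole subtrees. The following is an illustrative re-implementation of that general idea, not the paper's exact SFP rule (which is stated for the LASSO dual):

```python
import numpy as np
from itertools import combinations

def prune_bound(supp, r):
    """Upper bound on |x_T . r| over every interaction feature T whose
    support is contained in `supp` (for 0/1 features, descendants of a
    node can only have smaller supports)."""
    rs = r[supp]
    return max(rs[rs > 0].sum(), -rs[rs < 0].sum(), 0.0)

def active_interactions(X, r, threshold, max_order=3):
    """Depth-first enumeration of interaction features, skipping a whole
    subtree whenever the pruning bound falls below `threshold`."""
    n, d = X.shape
    kept = []
    stack = [((j,), X[:, j].astype(bool)) for j in range(d)]
    while stack:
        feats, supp = stack.pop()
        if prune_bound(supp, r) < threshold:
            continue            # no descendant can pass: prune subtree
        if abs(r[supp].sum()) >= threshold:
            kept.append(feats)  # this interaction survives screening
        if len(feats) < max_order:
            for j in range(feats[-1] + 1, d):
                stack.append((feats + (j,), supp & X[:, j].astype(bool)))
    return kept

rng = np.random.default_rng(0)
X = (rng.random((20, 5)) < 0.5).astype(float)
r = rng.standard_normal(20)
thr = 1.5
kept = active_interactions(X, r, thr)
```

Because the bound dominates the score of every descendant, the pruned enumeration returns exactly the same features as brute-force screening, while visiting far fewer nodes.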
Fairness-Aware Learning with Restriction of Universal Dependency using f-Divergences
Fairness-aware learning is a novel framework for classification tasks. Like regular empirical risk minimization (ERM), it aims to learn a classifier with a low error rate, while at the same time keeping the predictions of the classifier independent of sensitive features such as gender, religion, race, and ethnicity. Existing methods can achieve low dependency on the given samples, but this is not guaranteed on unseen samples. Moreover, the existing fairness-aware learning algorithms employ different dependency measures, and each algorithm is specifically designed for a particular one; such diversity makes it difficult to analyze and compare them theoretically. In this paper, we propose a general framework for fairness-aware learning that uses f-divergences and that covers most of the dependency measures employed in the existing methods. We introduce a way to estimate the f-divergences that allows us to give a unified analysis of the upper bound of the estimation error; this bound is tighter than that of the existing convergence rate analysis of divergence estimation. With our divergence estimate, we propose a fairness-aware learning algorithm and perform a theoretical analysis of its generalization error. Our analysis reveals that, under mild assumptions and even with enforcement of fairness, the generalization error of our method is $O(\sqrt{1/n})$, which is the same as that of regular ERM. In addition, and more importantly, we show that, for any f-divergence, the upper bound of the estimation error of the divergence is $O(\sqrt{1/n})$. This indicates that our fairness-aware learning algorithm guarantees low dependency on unseen samples for any dependency measure represented by an f-divergence.
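One concrete dependency measure covered by such a framework is the KL divergence (an f-divergence) between the distributions of binary predictions in the two sensitive groups. A minimal empirical sketch, with names of our own choosing, not the paper's estimator:

```python
import numpy as np

def kl_dependency(y_pred, s, eps=1e-9):
    """Empirical KL divergence between the binary-prediction
    distributions of sensitive groups s=0 and s=1; zero iff the
    empirical prediction rates coincide across groups."""
    p = np.array([np.mean(y_pred[s == 0]), 1 - np.mean(y_pred[s == 0])])
    q = np.array([np.mean(y_pred[s == 1]), 1 - np.mean(y_pred[s == 1])])
    p, q = p + eps, q + eps            # avoid log(0)
    return float(np.sum(p * np.log(p / q)))

y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
s      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
dep = kl_dependency(y_pred, s)         # large: predictions depend on s
y_fair = np.array([1, 0, 1, 0, 1, 0, 1, 0])
dep_fair = kl_dependency(y_fair, s)    # zero: equal rates in both groups
```

The paper's point is that this sample-level quantity being small does not by itself guarantee low dependency on unseen samples, which is what the $O(\sqrt{1/n})$ estimation-error bound addresses.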
Joint community and anomaly tracking in dynamic networks
Baingana, Brian, Giannakis, Georgios B.
Most real-world networks exhibit community structure, a phenomenon characterized by the existence of node clusters whose intra-cluster edge connectivity is stronger than the connectivity between nodes belonging to different clusters. In addition to facilitating a better understanding of network behavior, community detection finds many practical applications in diverse settings. Communities in online social networks are indicative of shared functional roles, or affiliation with a common socio-economic status, the knowledge of which is vital for targeted advertisement. In buyer-seller networks, community detection facilitates better product recommendations. Unfortunately, the reliability of community assignments is hindered by anomalous user behavior, often observed as unfair self-promotion or "fake" highly-connected accounts created to promote fraud. The present paper advocates a novel approach for jointly tracking communities while detecting such anomalous nodes in time-varying networks. By postulating edge creation as the result of mutual community participation by node pairs, a dynamic factor model with anomalous memberships captured through a sparse outlier matrix is put forth. Efficient tracking algorithms suitable for both online and decentralized operation are developed. Experiments conducted on both synthetic and real network time series successfully unveil underlying communities and anomalous nodes.
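The static core of such a model is a low-rank-plus-sparse decomposition of the adjacency matrix: a factor part for community memberships and a soft-thresholded residual for anomalies. The alternating scheme below is a generic illustration (our own construction, not the paper's tracking algorithm); its only guaranteed properties are the rank bound on L and the entrywise residual bound left by soft-thresholding.

```python
import numpy as np

def soft_threshold(M, tau):
    """Entrywise shrinkage: the proximal operator of the l1 norm."""
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0.0)

def lowrank_plus_sparse(A, rank=2, tau=0.5, n_iter=50):
    """Alternate a hard rank-`rank` projection (standing in for the
    community factor model W H') with entrywise soft-thresholding
    (standing in for the sparse anomalous-membership matrix)."""
    S = np.zeros_like(A)
    L = np.zeros_like(A)
    for _ in range(n_iter):
        U, sv, Vt = np.linalg.svd(A - S, full_matrices=False)
        L = (U[:, :rank] * sv[:rank]) @ Vt[:rank]
        S = soft_threshold(A - L, tau)
    return L, S

rng = np.random.default_rng(0)
W, H = rng.random((10, 2)), rng.random((10, 2))
A = W @ H.T
A[3, 7] += 5.0      # an anomalous edge on top of the factor model
L, S = lowrank_plus_sparse(A)
```

The paper's contribution lies in making such a decomposition dynamic and trackable online; this sketch only shows the stationary structure being tracked.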
Diffusion Nets
Mishne, Gal, Shaham, Uri, Cloninger, Alexander, Cohen, Israel
Non-linear manifold learning enables high-dimensional data analysis, but requires out-of-sample extension methods to process new data points. In this paper, we propose a manifold learning algorithm based on deep learning to create an encoder, which maps a high-dimensional dataset to its low-dimensional embedding, and a decoder, which takes the embedded data back to the high-dimensional space. Stacking the encoder and decoder together constructs an autoencoder, which we term a diffusion net, that performs out-of-sample extension as well as outlier detection. We introduce new neural-net constraints for the encoder, which preserve the local geometry of the points, and we prove rates of convergence for the encoder. In addition, our approach is efficient in both computational complexity and memory requirements, as opposed to previous methods that require storing all training points in both the high-dimensional and low-dimensional spaces to compute the out-of-sample extension and the pre-image.
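The out-of-sample extension strategy, training a net to reproduce a precomputed manifold embedding, can be sketched with off-the-shelf components. Here spectral embedding stands in for diffusion maps and plain MLPs for the geometry-constrained encoder/decoder; this illustrates the idea, not the paper's architecture.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 500)
X = np.c_[np.cos(t), np.sin(t)] + 0.01 * rng.standard_normal((500, 2))

# Reference manifold embedding (a kernel eigenmap method related to
# diffusion maps), standardized so the regression targets are O(1).
emb = SpectralEmbedding(n_components=2, random_state=0).fit_transform(X)
emb = (emb - emb.mean(0)) / emb.std(0)

# Encoder: a net trained to reproduce the embedding gives a cheap
# out-of-sample extension for new points.
encoder = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(X, emb)
# Decoder: maps embedded points back to ambient space (the pre-image).
decoder = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000,
                       random_state=0).fit(emb, X)

X_rec = decoder.predict(encoder.predict(X))
```

Once trained, new points pass through `encoder.predict` directly, with no need to store the training set, which is the memory advantage the abstract highlights.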
Efficient Learning for Undirected Topic Models
The Replicated Softmax model, a well-known undirected topic model, is powerful in extracting semantic representations of documents, but traditional learning strategies such as Contrastive Divergence are very inefficient. This paper provides a novel estimator that speeds up learning based on Noise Contrastive Estimation, extended to documents of varying lengths and weighted inputs. Experiments on two benchmarks show that the new estimator achieves great learning efficiency and high accuracy in document retrieval and classification.
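The principle behind Noise Contrastive Estimation, fitting an unnormalized model (with a learned normalizer) by discriminating data from samples of a known noise distribution, can be illustrated on a toy Gaussian rather than Replicated Softmax. All choices below (model family, noise distribution, step sizes) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, 2000)       # true mean 2, unit variance
noise = rng.normal(0.0, 3.0, 2000)      # samples from a known noise dist.

def log_pn(x):                          # noise log-density N(0, 3^2)
    return -0.5 * (x / 3.0) ** 2 - np.log(3.0 * np.sqrt(2 * np.pi))

mu, c = 0.0, 0.0                        # c = learned log-normalizer
lr = 0.1
for _ in range(3000):
    # G(x) = log unnormalized model - log noise density
    Gd = -0.5 * (data - mu) ** 2 + c - log_pn(data)
    Gn = -0.5 * (noise - mu) ** 2 + c - log_pn(noise)
    sd = 1.0 / (1.0 + np.exp(-Gd))      # P(x is data) on data points
    sn = 1.0 / (1.0 + np.exp(-Gn))      # P(x is data) on noise points
    # gradient ascent on the NCE (logistic) objective
    mu += lr * (np.mean((1 - sd) * (data - mu)) - np.mean(sn * (noise - mu)))
    c += lr * (np.mean(1 - sd) - np.mean(sn))
```

NCE recovers both the model parameter (mu near 2) and the normalizing constant (c near -log sqrt(2*pi)) without ever computing a partition function, which is exactly what makes it attractive for undirected models like Replicated Softmax.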
GEFCOM 2014 - Probabilistic Electricity Price Forecasting
Barta, Gergo, Borbely, Gyula, Nagy, Gabor, Kazi, Sandor, Henk, Tamas
Energy price forecasting is a relevant yet hard task in the field of multi-step time series forecasting. In this paper we compare a well-known and established method, ARMA with exogenous variables, with a relatively new technique, Gradient Boosting Regression. The methods were tested on data from the Global Energy Forecasting Competition 2014 with a year-long rolling-window forecast. The results of the experiment reveal that the multi-model approach performs significantly better in terms of error metrics. Gradient Boosting can deal with seasonality and autocorrelation out of the box and achieves a lower normalized mean absolute error on real-world data.
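A minimal version of the gradient-boosting setup, lagged values plus a seasonal index as features with the final window held out, might look like the following. The synthetic series and sklearn's default `GradientBoostingRegressor` stand in for the competition data and the authors' configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 600
t = np.arange(n)
# toy price series: daily (24-step) seasonality + slowly drifting level
y = 10 + 3 * np.sin(2 * np.pi * t / 24) + 0.1 * rng.standard_normal(n).cumsum()

def make_features(y, t, lags=(1, 2, 24)):
    """Lagged prices and the hour-of-day index as regressors."""
    start = max(lags)
    X = np.array([[y[i - l] for l in lags] + [t[i] % 24]
                  for i in range(start, len(y))])
    return X, y[start:]

X, yy = make_features(y, t)
split = len(yy) - 100                   # hold out the last 100 steps
model = GradientBoostingRegressor(random_state=0).fit(X[:split], yy[:split])
pred = model.predict(X[split:])
nmae = np.mean(np.abs(pred - yy[split:])) / np.mean(np.abs(yy[split:]))
```

Lagged features let the trees absorb autocorrelation, and the hour index lets them absorb seasonality, without the explicit model specification ARMAX requires.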
Regularization Path of Cross-Validation Error Lower Bounds
Shibagaki, Atsushi, Suzuki, Yoshiki, Karasuyama, Masayuki, Takeuchi, Ichiro
Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, the current practice of regularization parameter tuning is more of an art than a science; e.g., it is hard to tell how many grid points would be needed in cross-validation (CV) for obtaining a solution with a sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV error as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error over the entire range of the regularization parameter. We demonstrate through numerical experiments that a theoretically guaranteed choice of the regularization parameter in the above sense is possible with reasonable computational costs.
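The grid-based practice the paper improves upon can be sketched as follows. Note that the paper's contribution is a lower bound on the CV error *between* such grid points; this illustration only evaluates the grid itself, leaving the usual open question of whether the grid was fine enough.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 20))
w = np.zeros(20)
w[:3] = [2.0, -1.5, 1.0]                 # 3 truly active features
y = X @ w + 0.5 * rng.standard_normal(200)

# evaluate 5-fold CV error at 30 grid points of the regularization path
lambdas = np.logspace(-3, 0, 30)
cv_err = [-cross_val_score(Lasso(alpha=a), X, y, cv=5,
                           scoring="neg_mean_squared_error").mean()
          for a in lambdas]
best = lambdas[int(np.argmin(cv_err))]
```

A lower bound valid over the whole interval between grid points would certify how far `cv_err`'s minimum can be from the true optimum without refining the grid.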
A general framework for IT-based clustering methods
Previously, we proposed a physically inspired rule to organize the data points into a sparse yet effective structure, called the in-tree (IT) graph, which is able to capture a wide class of underlying cluster structures in datasets, especially density-based ones. Although the IT graph contains some redundant edges between clusters that need to be removed, it has a big advantage over the k-nearest-neighbor (k-NN) or minimal spanning tree (MST) graphs: its redundant edges are much more distinguishable, and thus can be easily determined by several methods we proposed previously. In this paper, we propose a general framework for re-constructing the IT graph based on an initial neighborhood graph, such as the k-NN or MST graph, and the corresponding graph distances. Under this general framework, our previous way of constructing the IT graph turns out to be a special case. This general framework 1) makes the IT graph capture a wider class of underlying cluster structures, especially manifolds, and 2) should be more effective in clustering sparse or graph-based datasets.
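The IT graph can be sketched in a few lines for the density-based case: every point links to its nearest neighbor of higher estimated density, so the whole dataset forms a tree rooted at the densest point, and the few long inter-cluster edges are the distinguishable redundant ones. This is our own minimal variant, close in spirit to but not identical with the authors' physically inspired rule.

```python
import numpy as np

def in_tree(X, k=5):
    """Build an in-tree: each point's parent is its nearest neighbor of
    higher density, with density estimated (inversely) by the distance
    to the k-th nearest neighbor.  The densest point is the root (-1)."""
    D = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    density = -np.sort(D, axis=1)[:, k]   # smaller k-NN distance = denser
    parent = np.full(len(X), -1)
    for i in range(len(X)):
        higher = np.where(density > density[i])[0]
        if len(higher) > 0:
            parent[i] = higher[np.argmin(D[i, higher])]
    return parent

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))]
parent = in_tree(X)
# Cutting the single long edge between the two blobs yields the clusters.
```

Since every edge points to a strictly denser point, the structure is acyclic by construction, with exactly one root; the general framework of the paper replaces the Euclidean distances here with graph distances on an initial k-NN or MST graph.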