Large-Margin kNN Classification Using a Deep Encoder Network

arXiv.org Artificial Intelligence

kNN is one of the most popular classification methods, but it often performs poorly when the distance metric is badly chosen or when the data contain many class-irrelevant features. Linear feature transformation methods have been widely applied to extract class-relevant information and improve kNN classification, but linear mappings are too limited for many applications. Kernels have been used to learn powerful non-linear feature transformations, but these methods fail to scale to large datasets. In this paper, we present a scalable non-linear feature mapping method, which we call DNet-kNN, based on a deep neural network pretrained with restricted Boltzmann machines, for improving kNN classification in a large-margin framework. DNet-kNN can be used for both classification and supervised dimensionality reduction. Experimental results on two benchmark handwritten digit datasets show that DNet-kNN performs much better than large-margin kNN using a linear mapping and than kNN based on a deep autoencoder pretrained with restricted Boltzmann machines.
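The effect of class-irrelevant features on kNN can be illustrated with a minimal sketch. It is not the DNet-kNN method: the `transform` argument stands in for a learned feature mapping, and the `drop_noise` function and the toy data are hypothetical, chosen by hand for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k, transform=lambda x: x):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. `transform` is an
    optional feature mapping applied before Euclidean distance is computed.
    """
    q = transform(query)
    by_distance = sorted(train, key=lambda item: math.dist(transform(item[0]), q))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data (made up): the first feature carries the class,
# the second is large-scale, class-irrelevant noise.
train = [
    ((0.0, 50.0), "a"),
    ((0.1, -40.0), "a"),
    ((1.0, 45.0), "b"),
    ((0.9, -35.0), "b"),
]
query = (0.05, 44.0)

def drop_noise(x):
    # Hypothetical hand-picked mapping; a real system would learn it.
    return (x[0],)

raw_prediction = knn_predict(train, query, k=1)
mapped_prediction = knn_predict(train, query, k=1, transform=drop_noise)
```

On the raw features the noisy second coordinate dominates the distance and the query is assigned to class "b"; after the mapping it is assigned to class "a".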


Knowledge Management in Economic Intelligence with Reasoning on Temporal Attributes

arXiv.org Artificial Intelligence

People have to make important decisions within a time frame, so it is imperative to employ means or strategies that aid effective decision making. Economic Intelligence (EI) has consequently emerged as a field that supports strategic and timely decision making in an organization. In pursuing this goal, it is essential to provide for the conservation of the intellectual resources invested in the decision-making process. These intellectual resources are the knowledge of the actors and the knowledge of the various processes involved in decision making. Knowledge has been recognized as a strategic economic resource for enhancing productivity and as a key to innovation in any organization or community. Its adequate management, with cognizance of its temporal properties, is therefore indispensable. The temporal properties of knowledge are the date and time (the timestamp) at which the knowledge is created, as well as the duration or interval between related pieces of knowledge. This paper focuses on the need for a user-centered knowledge management approach and on the exploitation of the associated temporal properties. Our perspective on knowledge is with respect to decision-problem projects in EI. Our hypothesis is that reasoning about temporal properties when exploiting knowledge in EI projects should foster timely decision making by generating useful inferences, for a new project, from available and reusable knowledge.


Introduction to Semi-Supervised Learning

Morgan & Claypool Publishers

In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. ISBN 9781598295474, 130 pages.


The CIFF Proof Procedure for Abductive Logic Programming with Constraints: Theory, Implementation and Experiments

arXiv.org Artificial Intelligence

We present the CIFF proof procedure for abductive logic programming with constraints, and we prove its correctness. CIFF is an extension of the IFF proof procedure for abductive logic programming that relaxes the original restrictions on variable quantification (allowedness conditions) and incorporates a constraint solver to deal with numerical constraints, as in constraint logic programming. Finally, we describe the CIFF system, compare it with state-of-the-art abductive systems and answer set solvers, and show how to use it to program some applications. (To appear in Theory and Practice of Logic Programming - TPLP).


Exponential Family Graph Matching and Ranking

arXiv.org Artificial Intelligence

We present a method for learning max-weight matching predictors in bipartite graphs. The method consists of performing maximum a posteriori estimation in exponential families with sufficient statistics that encode permutations and data features. Although inference is in general hard, we show that for one very relevant application - web page ranking - exact inference is efficient. For general model instances, an appropriate sampler is readily available. Contrary to existing max-margin matching models, our approach is statistically consistent and, in addition, experiments with increasing sample sizes indicate superior improvement over such models. We apply the method to graph matching in computer vision as well as to a standard benchmark dataset for learning web page ranking, in which we obtain state-of-the-art results, in particular improving on max-margin variants. The drawback of this method with respect to max-margin alternatives is its runtime for large graphs, which is comparatively high.
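As a reminder of what a max-weight matching predictor outputs, here is a minimal sketch that finds the best matching in a tiny bipartite graph by brute force. The weight matrix is made up for illustration; in the method described above the weights would come from learned parameters and data features, and nothing here reflects the authors' estimation or inference procedure.

```python
from itertools import permutations

def max_weight_matching(weights):
    """Return (best_permutation, best_score) for a square weight matrix.

    weights[i][j] is the score of matching left node i to right node j.
    Brute force over all n! permutations, so only viable for tiny graphs.
    """
    n = len(weights)
    best_perm, best_score = None, float("-inf")
    for perm in permutations(range(n)):
        score = sum(weights[i][perm[i]] for i in range(n))
        if score > best_score:
            best_perm, best_score = perm, score
    return best_perm, best_score

# Hypothetical 3x3 weight matrix.
weights = [
    [4.0, 1.0, 3.0],
    [2.0, 0.0, 5.0],
    [3.0, 2.0, 2.0],
]
matching, score = max_weight_matching(weights)
```

Here `matching` is `(0, 2, 1)`, meaning left node 0 is matched to right node 0, node 1 to node 2, and node 2 to node 1, for a total score of 11.0.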


Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

arXiv.org Artificial Intelligence

Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and various efficient algorithms for mining sequential patterns have been proposed and studied. Recently, in many problem domains (e.g., program execution traces), a novel line of sequential pattern mining research, called mining repetitive gapped sequential patterns, has attracted the attention of many researchers. Considering the repetition of a sequential pattern not only across different sequences but also within a sequence is more meaningful than general sequential pattern mining, which only captures occurrences in different sequences. However, the number of repetitive gapped sequential patterns generated even by closed mining algorithms may be too large for users to understand, especially when the support threshold is low. In this paper, we propose and study the problem of compressing repetitive gapped sequential patterns. Inspired by the ideas behind summarizing frequent itemsets (RPglobal), we develop an algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), which includes an efficient pruning strategy, SyncScan, and an efficient representative pattern checking scheme, -dominate sequential pattern checking. CRGSgrow is a two-step approach: in the first step, we obtain all closed repetitive sequential patterns as the candidate set of representative repetitive sequential patterns and, at the same time, obtain most of the representative repetitive sequential patterns; in the second step, we spend only a little time finding the remaining representative patterns from the candidate set. An empirical study with both real and synthetic data sets clearly shows that CRGSgrow has good performance.
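The difference between counting repetitions within a sequence and counting only the sequences that contain a pattern can be sketched as follows. This uses a simplified greedy count of gapped occurrences, which is an assumption made for illustration and may differ from the support definition used by CRGSgrow; the function names and the toy database are hypothetical.

```python
def count_gapped_occurrences(sequence, pattern):
    """Greedily count non-overlapping gapped occurrences of `pattern`.

    Scans left to right. Each time the whole pattern has been matched
    (with arbitrary gaps between its events), one occurrence is counted
    and matching restarts from the first pattern event.
    """
    if not pattern:
        return 0
    count, position = 0, 0
    for event in sequence:
        if event == pattern[position]:
            position += 1
            if position == len(pattern):
                count += 1
                position = 0
    return count

def repetitive_support(database, pattern):
    """Total occurrences across the database, including repeats within a sequence."""
    return sum(count_gapped_occurrences(seq, pattern) for seq in database)

def sequence_support(database, pattern):
    """Number of sequences containing the pattern at least once."""
    return sum(1 for seq in database if count_gapped_occurrences(seq, pattern) > 0)

# Toy database of event sequences (made up).
database = ["ABCABCA", "AABB", "CBA"]
repetitive = repetitive_support(database, "AB")
per_sequence = sequence_support(database, "AB")
```

For the pattern "AB", the per-sequence count is 2, while the repetition-aware count under this greedy rule is 3, because the first sequence contains the pattern twice.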


Learning Nonlinear Dynamic Models

arXiv.org Artificial Intelligence

We present a novel approach for learning nonlinear dynamic models, which leads to a new set of tools capable of solving problems that are otherwise difficult. We provide theory showing this new approach is consistent for models with long range structure, and apply the approach to motion capture and high-dimensional video data, yielding results superior to standard alternatives.


Conditional Probability Tree Estimation Analysis and Algorithms

arXiv.org Artificial Intelligence

We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works successfully on a dataset with roughly $10^6$ labels.
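The tree reduction can be illustrated with a minimal sketch in which a label's probability is the product of binary branch probabilities along its root-to-leaf path, so a query touches only as many nodes as the tree is deep. The complete binary tree layout, the bit-based path encoding, and the fixed node estimates are assumptions for illustration; the paper's online tree construction is not reproduced here.

```python
def label_probability(label, depth, p_right):
    """Probability of `label` as a product of binary decisions on one path.

    Labels are the leaves of a complete binary tree of the given depth, and
    the bits of `label` (most significant first) spell out the path.
    `p_right(node)` returns the estimated probability of branching right at
    an internal node, identified by the tuple of bits taken so far.
    """
    prob, node = 1.0, ()
    for level in reversed(range(depth)):
        bit = (label >> level) & 1
        p = p_right(node)
        prob *= p if bit else 1.0 - p
        node += (bit,)
    return prob

# Hypothetical fixed estimates standing in for learned binary regressors.
node_estimates = {(): 0.7, (0,): 0.2, (1,): 0.5}
depth = 2  # 4 labels, 2 node evaluations per query

probs = [label_probability(y, depth, lambda node: node_estimates[node])
         for y in range(2 ** depth)]
```

With these estimates the four label probabilities are approximately 0.24, 0.06, 0.35, and 0.35, which sum to 1 because each internal node splits its probability mass between its two children.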


Optimal Tableau Decision Procedures for PDL

arXiv.org Artificial Intelligence

We reformulate Pratt's tableau decision procedure of checking satisfiability of a set of formulas in PDL. Our formulation is simpler and more direct for implementation. Extending the method we give the first EXPTIME (optimal) tableau decision procedure not based on transformation for checking consistency of an ABox w.r.t. a TBox in PDL (here, PDL is treated as a description logic). We also prove the new result that the data complexity of the instance checking problem in PDL is coNP-complete.


Solar radiation forecasting using ad-hoc time series preprocessing and neural networks

arXiv.org Artificial Intelligence

In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use an ad-hoc time series preprocessing and a Multi-Layer Perceptron (MLP) to predict solar radiation at a daily horizon. First results are promising, with nRMSE < 21% and RMSE < 998 Wh/m². Our optimized MLP gives predictions similar to or even better than conventional methods such as ARIMA techniques, Bayesian inference, Markov chains and k-Nearest-Neighbors approximators. Moreover, we found that our data preprocessing approach can significantly reduce forecasting errors.
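The two error measures quoted above can be computed as in the following minimal sketch. The abstract does not say how nRMSE is normalized; dividing by the mean of the measurements is one common convention and is assumed here. The sample values are made up and are not taken from the paper.

```python
import math

def rmse(measured, predicted):
    """Root mean square error, in the same units as the data."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)

def nrmse(measured, predicted):
    """RMSE divided by the mean measurement (assumed normalization)."""
    return rmse(measured, predicted) / (sum(measured) / len(measured))

# Illustrative daily radiation values in Wh/m² (hypothetical).
measured = [4000.0, 5000.0, 6000.0]
predicted = [4300.0, 4600.0, 6000.0]

error = rmse(measured, predicted)
relative_error = nrmse(measured, predicted)
```

For these values the RMSE is about 288.7 Wh/m² and the nRMSE is about 5.8%.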