Asia
Building Rules on Top of Ontologies for the Semantic Web with Inductive Logic Programming
Building rules on top of ontologies is the ultimate goal of the logical layer of the Semantic Web. To this aim an ad-hoc mark-up language for this layer is currently under discussion. It is intended to follow the tradition of hybrid knowledge representation and reasoning systems such as $\mathcal{AL}$-log that integrates the description logic $\mathcal{ALC}$ and the function-free Horn clausal language \textsc{Datalog}. In this paper we consider the problem of automating the acquisition of these rules for the Semantic Web. We propose a general framework for rule induction that adopts the methodological apparatus of Inductive Logic Programming and relies on the expressive and deductive power of $\mathcal{AL}$-log. The framework is valid whatever the scope of induction (description vs. prediction) is. Yet, for illustrative purposes, we also discuss an instantiation of the framework which aims at description and turns out to be useful in Ontology Refinement. Keywords: Inductive Logic Programming, Hybrid Knowledge Representation and Reasoning Systems, Ontologies, Semantic Web. Note: To appear in Theory and Practice of Logic Programming (TPLP)
Supervised Machine Learning with a Novel Pointwise Density Estimator
Oyang, Yen-Jen, Chen, Chien-Yu, Chang, Darby Tien-Hao, Wu, Chih-Peng
This article proposes a novel density estimation based algorithm for carrying out supervised machine learning. The proposed algorithm features O(n) time complexity for generating a classifier, where n is the number of sampling instances in the training dataset. This feature is highly desirable in contemporary applications that involve large and still growing databases. In comparison with the kernel density estimation based approaches, the mathe-matical fundamental behind the proposed algorithm is not based on the assump-tion that the number of training instances approaches infinite. As a result, a classifier generated with the proposed algorithm may deliver higher prediction accuracy than the kernel density estimation based classifier in some cases.
New Inference Rules for Max-SAT
Li, C. M., Manya, F., Planes, J.
Exact Max-SAT solvers, compared with SAT solvers, apply little inference at each node of the proof tree. Commonly used SAT inference rules like unit propagation produce a simplified formula that preserves satisfiability but, unfortunately, solving the Max-SAT problem for the simplified formula is not equivalent to solving it for the original formula. In this paper, we define a number of original inference rules that, besides being applied efficiently, transform Max-SAT instances into equivalent Max-SAT instances which are easier to solve. The soundness of the rules, that can be seen as refinements of unit resolution adapted to Max-SAT, are proved in a novel and simple way via an integer programming transformation. With the aim of finding out how powerful the inference rules are in practice, we have developed a new Max-SAT solver, called MaxSatz, which incorporates those rules, and performed an experimental investigation. The results provide empirical evidence that MaxSatz is very competitive, at least, on random Max-2SAT, random Max-3SAT, Max-Cut, and Graph 3-coloring instances, as well as on the benchmarks from the Max-SAT Evaluation 2006.
Analyzing covert social network foundation behind terrorism disaster
Maeno, Yoshiharu, Ohsawa, Yukio
This paper addresses a method to analyze the covert social network foundation hidden behind the terrorism disaster. It is to solve a node discovery problem, which means to discover a node, which functions relevantly in a social network, but escaped from monitoring on the presence and mutual relationship of nodes. The method aims at integrating the expert investigator's prior understanding, insight on the terrorists' social network nature derived from the complex graph theory, and computational data processing. The social network responsible for the 9/11 attack in 2001 is used to execute simulation experiment to evaluate the performance of the method.
Topic and Role Discovery in Social Networks with Experiments on Enron and Academic Email
McCallum, A., Wang, X., Corrada-Emmanuel, A.
Previous work in social network analysis (SNA) has modeled the existence of links from one entity to another, but not the attributes such as language content or topics on those links. We present the Author-Recipient-Topic (ART) model for social network analysis, which learns topic distributions based on the direction-sensitive messages sent between entities. The model builds on Latent Dirichlet Allocation (LDA) and the Author-Topic (AT) model, adding the key attribute that distribution over topics is conditioned distinctly on both the sender and recipient---steering the discovery of topics according to the relationships between people. We give results on both the Enron email corpus and a researcher's email archive, providing evidence not only that clearly relevant topics are discovered, but that the ART model better predicts people's roles and gives lower perplexity on previously unseen messages. We also present the Role-Author-Recipient-Topic (RART) model, an extension to ART that explicitly represents people's roles.
Compressed Pattern Databases
Felner, A., Korf, R. E., Meshulam, R., Holte, R. C.
A pattern database (PDB) is a heuristic function implemented as a lookup table that stores the lengths of optimal solutions for subproblem instances. Standard PDBs have a distinct entry in the table for each subproblem instance. In this paper we investigate compressing PDBs by merging several entries into one, thereby allowing the use of PDBs that exceed available memory in their uncompressed form. We introduce a number of methods for determining which entries to merge and discuss their relative merits. These vary from domain-independent approaches that allow any set of entries in the PDB to be merged, to more intelligent methods that take into account the structure of the problem. The choice of the best compression method is based on domain-dependent attributes. We present experimental results on a number of combinatorial problems, including the four-peg Towers of Hanoi problem, the sliding-tile puzzles, and the Top-Spin puzzle. For the Towers of Hanoi, we show that the search time can be reduced by up to three orders of magnitude by using compressed PDBs compared to uncompressed PDBs of the same size. More modest improvements were observed for the other domains.
Knowledge Derived From Wikipedia For Computing Semantic Relatedness
Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.
A Heuristic Routing Mechanism Using a New Addressing Scheme
Ravanbakhsh, Mohsen, Abbasi-Yadkori, Yasin, Abbaspour, Maghsoud, Sarbazi-Azad, Hamid
Current methods of routing are based on network information in the form of routing tables, in which routing protocols determine how to update the tables according to the network changes. Despite the variability of data in routing tables, node addresses are constant. In this paper, we first introduce the new concept of variable addresses, which results in a novel framework to cope with routing problems using heuristic solutions. Then we propose a heuristic routing mechanism based on the application of genes for determination of network addresses in a variable address network and describe how this method flexibly solves different problems and induces new ideas in providing integral solutions for variety of problems. The case of ad-hoc networks is where simulation results are more supportive and original solutions have been proposed for issues like mobility.
Semantic distillation: a method for clustering objects by their contextual specificity
Sierocinski, Thomas, Béchec, Anthony Le, Théret, Nathalie, Petritis, Dimitri
Techniques for data-mining, latent semantic analysis, contextual search of databases, etc. have long ago been developed by computer scientists working on information retrieval (IR). Experimental scientists, from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.) have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised, and, secondly, to propose a novel method of fuzzy hierarchical clustering, termed \textit{semantic distillation} -- strongly inspired from the theory of quantum measurement --, we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA arrays experiments and clustering the genes of the array according to their specificity.
Graph Abstraction in Real-time Heuristic Search
Bulitko, V., Sturtevant, N., Lu, J., Yau, T.
Real-time heuristic search methods are used by situated agents in applications that require the amount of planning per move to be independent of the problem size. Such agents plan only a few actions at a time in a local search space and avoid getting trapped in local minima by improving their heuristic function over time. We extend a wide class of real-time search algorithms with automatically-built state abstraction and prove completeness and convergence of the resulting family of algorithms. We then analyze the impact of abstraction in an extensive empirical study in real-time pathfinding. Abstraction is found to improve efficiency by providing better trading offs between planning time, learning speed and other negatively correlated performance measures.