Goto

Collaborating Authors

 Government


On Ranking Senators By Their Votes

arXiv.org Machine Learning

The problem of ranking a set of objects given some measure of similarity is one of the most basic in machine learning. Recently Agarwal proposed a method based on techniques in semi-supervised learning utilizing the graph Laplacian. In this work we consider a novel application of this technique to ranking binary choice data and apply it specifically to ranking US Senators by their ideology.


Ontology-Based Link Prediction in the LiveJournal Social Network

AAAI Conferences

LiveJournal is a social network journal service with focus on user interactions. As for many other online social networks, predicting potential friendships in the LiveJournal network is a problem of great practical interest. Previous work has shown that graph features extracted from the graph associated with the network are good predictors for friendship links. However, contrary to the intuition, user data (e.g., interests shared by two users) does not always improve the predictions obtained with graph features alone. This could be due to the fact that features constructed from a large number of user declared interests cannot capture the implicit semantic of the interests. To test this hypothesis, we use a clustering approach to build an interest ontology, and explore the ability of the ontology to improve the performance of learning algorithms at predicting friendship links, when interest-based features are used alone or in combination with graph-based features. The results show that ontology-based features can help improve the performance of several machine learning classifiers (in particular, random forest classifiers) at the task of predicting links in the LiveJournal social network.


Enhancing QA Systems with Complex Temporal Question Processing Capabilities

Journal of Artificial Intelligence Research

This paper presents a multilayered architecture that enhances the capabilities of current QA systems and allows different types of complex questions or queries to be processed. The answers to these questions need to be gathered from factual information scattered throughout different documents. Specifically, we designed a specialized layer to process the different types of temporal questions. Complex temporal questions are first decomposed into simple questions, according to the temporal relations expressed in the original question. In the same way, the answers to the resulting simple questions are recomposed, fulfilling the temporal restrictions of the original complex question. A novel aspect of this approach resides in the decomposition which uses a minimal quantity of resources, with the final aim of obtaining a portable platform that is easily extensible to other languages. In this paper we also present a methodology for evaluation of the decomposition of the questions as well as the ability of the implemented temporal layer to perform at a multilingual level. The temporal layer was first performed for English, then evaluated and compared with: a) a general purpose QA system (F-measure 65.47% for QA plus English temporal layer vs. 38.01% for the general QA system), and b) a well-known QA system. Much better results were obtained for temporal questions with the multilayered system. This system was therefore extended to Spanish and very good results were again obtained in the evaluation (F-measure 40.36% for QA plus Spanish temporal layer vs. 22.94% for the general QA system).


Classification by Set Cover: The Prototype Vector Machine

arXiv.org Machine Learning

We introduce a new nearest-prototype classifier, the prototype vector machine (PVM). It arises from a combinatorial optimization problem which we cast as a variant of the set cover problem. We propose two algorithms for approximating its solution. The PVM selects a relatively small number of representative points which can then be used for classification. It contains 1-NN as a special case. The method is compatible with any dissimilarity measure, making it amenable to situations in which the data are not embedded in an underlying feature space or in which using a non-Euclidean metric is desirable. Indeed, we demonstrate on the much studied ZIP code data how the PVM can reap the benefits of a problem-specific metric. In this example, the PVM outperforms the highly successful 1-NN with tangent distance, and does so retaining fewer than half of the data points. This example highlights the strengths of the PVM in yielding a low-error, highly interpretable model. Additionally, we apply the PVM to a protein classification problem in which a kernel-based distance is used.


Statistical ranking and combinatorial Hodge theory

arXiv.org Machine Learning

We propose a number of techniques for obtaining a global ranking from data that may be incomplete and imbalanced -- characteristics almost universal to modern datasets coming from e-commerce and internet applications. We are primarily interested in score or rating-based cardinal data. From raw ranking data, we construct pairwise rankings, represented as edge flows on an appropriate graph. Our statistical ranking method uses the graph Helmholtzian, the graph theoretic analogue of the Helmholtz operator or vector Laplacian, in much the same way the graph Laplacian is an analogue of the Laplace operator or scalar Laplacian. We study the graph Helmholtzian using combinatorial Hodge theory: we show that every edge flow representing pairwise ranking can be resolved into two orthogonal components, a gradient flow that represents the L2-optimal global ranking and a divergence-free flow (cyclic) that measures the validity of the global ranking obtained -- if this is large, then the data does not have a meaningful global ranking. This divergence-free flow can be further decomposed orthogonally into a curl flow (locally cyclic) and a harmonic flow (locally acyclic but globally cyclic); these provides information on whether inconsistency arises locally or globally. An obvious advantage over the NP-hard Kemeny optimization is that discrete Hodge decomposition may be computed via a linear least squares regression. We also investigated the L1-projection of edge flows, showing that this is dual to correlation maximization over bounded divergence-free flows, and the L1-approximate sparse cyclic ranking, showing that this is dual to correlation maximization over bounded curl-free flows. We discuss relations with Kemeny optimization, Borda count, and Kendall-Smith consistency index from social choice theory and statistics.


A Class of DSm Conditional Rules

arXiv.org Artificial Intelligence

This research has been supported by Air Force Research Laboratory, Rome, NY, USA, in June and July 2009. Florentin Smarandache, Mark Alford Air Force Research Laboratory, RIEA, 525 Brooks Rd., Rome, NY 13441-4505, USA Abstract: In this paper we introduce two new DSm fusion conditioning rules with example, and as a generalization of them a class of DSm fu sion conditioning rules, and then extend them to a class of DSm conditioning rules. Keywords: conditional fusion rules, Dempster's conditioning rule, Dezert-Smarandache Theory, DSm conditioning rules 0. Introduction In order to understand the material in this paper, it is first necessary to define the terms that we'll be using: - Frame of discernment th e set of all hypotheses. This research has been supported by Air Force Research Laboratory, Rome, NY, USA, in June and July 2009. In the case when their intersection is empty, we consider these hypotheses disjoint.}


A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning

arXiv.org Artificial Intelligence

We present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks which apply the spectral decomposition. Algorithms derived under our framework are able to employ both labeled and unlabeled examples and are able to handle complex problems where data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called ``KPCA trick'' is proposed to handle non-linear problems.


How Hard Is Bribery in Elections?

Journal of Artificial Intelligence Research

We study the complexity of influencing elections through bribery: How computationally complex is it for an external actor to determine whether by paying certain voters to change their preferences a specified candidate can be made the elections winner? We study this problem for election systems as varied as scoring protocols and Dodgson voting, and in a variety of settings regarding homogeneous-vs.-nonhomogeneous electorate bribability, bounded-size-vs.-arbitrary-sized candidate sets, weighted-vs.-unweighted voters, and succinct-vs.-nonsuccinct input specification. We obtain both polynomial-time bribery algorithms and proofs of the intractability of bribery, and indeed our results show that the complexity of bribery is extremely sensitive to the setting. For example, we find settings in which bribery is NP-complete but manipulation (by voters) is in P, and we find settings in which bribing weighted voters is NP-complete but bribing voters with individual bribe thresholds is in P. For the broad class of elections (including plurality, Borda, k-approval, and veto) known as scoring protocols, we prove a dichotomy result for bribery of weighted voters: We find a simple-to-evaluate condition that classifies every case as either NP-complete or in P.


Developing an End-to-End Planning Application from a Timeline Representation Framework

AAAI Conferences

This paper describes aspects of a project aiming at creating general, flexible and reusable software architecture to address planning problems in space missions. It introduces recent work to realize an open software framework for supporting development of planning and scheduling space applications. The framework, which is named TRF (Timeline-based Representation Framework), aims at supporting application development within different space missions for the European Space Agency (ESA). It is currently being tested on three problem examples, all solved on top of the TRF functionalities. This paper describes the TRF three-layered software architecture and shows how it has been used to deploy a complete application, named MrSPOCK, an interactive system for Long Term Planning in the Mars Express operational mission.


Metropolitan Fixed Assets Change Judgment using Aerial Photographs

AAAI Conferences

The Tokyo Metropolitan Government is the largest municipality in Japan and conducts building change identification work. Recently, Tokyo terminated its traditional visual identification work that has been in use for 20 years and shifted to a new automated system. This paper is intended to introduce the Fixed Assets Change Judgment (FACJ) system and its core tool, RealScape. RealScape automatically detects the changes in the height and color of buildings based on three-dimensional (3D) analysis of aerial photographs. It employs a unique pixel-by-pixel stereo processing method and enables the foot-level precision for each building. RealScape detects building changes more accurately than visual judgment operations by humans and reduces the labor costs to one third of the traditional approach and the required judgment duration to about two weeks per 100km2.