Multilingual person name recognition and transliteration

arXiv.org Artificial Intelligence

We present an exploratory tool that extracts person names from multilingual news collections, matches name variants referring to the same person, and infers relationships between people based on the co-occurrence of their names in related news. A novel feature is the matching of name variants across languages and writing systems, including names written in the Greek, Cyrillic and Arabic writing systems. Because our setting is highly multilingual, we represent and match names using a standard internal representation instead of adopting the traditional bilingual approach to transliteration. This work is part of the news analysis system NewsExplorer, which clusters an average of 25,000 news articles per day to detect related news within the same language and across different languages.


Analysis of Dynamic Task Allocation in Multi-Robot Systems

arXiv.org Artificial Intelligence

Dynamic task allocation is an essential requirement for multi-robot systems operating in unknown dynamic environments. It allows robots to change their behavior in response to environmental changes or actions of other robots in order to improve overall system performance. Emergent coordination algorithms for task allocation that use only local sensing and no direct communication between robots are attractive because they are robust and scalable. However, a lack of formal analysis tools makes emergent coordination algorithms difficult to design. In this paper we present a mathematical model of a general dynamic task allocation mechanism. Robots using this mechanism have to choose between two types of task, and the goal is to achieve a desired task division in the absence of explicit communication and global knowledge. Robots estimate the state of the environment from repeated local observations and decide which task to choose based on these observations. We model the robots and observations as stochastic processes and study the dynamics of the collective behavior. Specifically, we analyze the effect that the number of observations and the choice of the decision function have on the performance of the system. The mathematical models are validated in a multi-robot multi-foraging scenario. The model's predictions agree very closely with experimental results from sensor-based simulations.


Better than the real thing? Iterative pseudo-query processing using cluster-based language models

arXiv.org Artificial Intelligence

We present a novel approach to pseudo-feedback-based ad hoc retrieval that uses language models induced from both documents and clusters. First, we treat the pseudo-feedback documents produced in response to the original query as a set of pseudo-queries that themselves can serve as input to the retrieval process. Observing that the documents returned in response to the pseudo-queries can then act as pseudo-queries for subsequent rounds, we arrive at a formulation of pseudo-query-based retrieval as an iterative process. Experiments show that several concrete instantiations of this idea, when applied in conjunction with techniques designed to heighten precision, yield performance results rivaling those of a number of previously-proposed algorithms, including the standard language-modeling approach. The use of cluster-based language models is a key contributing factor to our algorithms' success.


PageRank without hyperlinks: Structural re-ranking using links induced by language models

arXiv.org Artificial Intelligence

Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
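The re-ranking idea in this abstract can be illustrated with a minimal sketch: a plain PageRank power iteration over a small weighted directed graph, followed by sorting documents by the resulting centrality. This is not the authors' implementation. The link weights, the document identifiers, the damping factor and the function names `pagerank` and `rerank` are all hypothetical, and the sketch omits both the language-model estimation of generation links and the combination with the initial retrieval score.

```python
def pagerank(weights, damping=0.85, iterations=100):
    """Power iteration on a weighted directed graph.

    weights[i][j] is the (hypothetical) strength of the link from
    document i to document j; each row is normalised into a
    probability distribution before use.
    """
    n = len(weights)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            out_total = sum(weights[i])
            for j in range(n):
                if out_total > 0:
                    share = weights[i][j] / out_total
                else:
                    share = 1.0 / n  # no outgoing links: spread mass evenly
                new_scores[j] += damping * scores[i] * share
        scores = new_scores
    return scores


def rerank(doc_ids, weights):
    """Order documents by decreasing centrality score."""
    scores = pagerank(weights)
    ranked = sorted(zip(doc_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in ranked]


# Illustrative data only: three documents and made-up link strengths.
doc_ids = ["d0", "d1", "d2"]
link_weights = [
    [0.0, 0.0, 0.9],  # d0 -> d2
    [0.0, 0.0, 0.8],  # d1 -> d2
    [0.3, 0.0, 0.0],  # d2 -> d0
]
print(rerank(doc_ids, link_weights))
```

In this toy graph d2 receives links from both other documents, so it ends up ranked first, while d1 has no incoming links and ends up last.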


Anyone but Him: The Complexity of Precluding an Alternative

arXiv.org Artificial Intelligence

Preference aggregation in a multiagent setting is a central issue in both human and computer contexts. In this paper, we study in terms of complexity the vulnerability of preference aggregation to destructive control. That is, we study the ability of an election's chair to, through such mechanisms as voter/candidate addition/suppression/partition, ensure that a particular candidate (equivalently, alternative) does not win. And we study the extent to which election systems can make it impossible, or computationally costly (NP-complete), for the chair to execute such control. Among the systems we study--plurality, Condorcet, and approval voting--we find cases where systems immune or computationally resistant to a chair choosing the winner nonetheless are vulnerable to the chair blocking a victory. Beyond that, we see that among our studied systems no one system offers the best protection against destructive control. Rather, the choice of a preference aggregation system will depend closely on which types of control one wishes to be protected against. We also find concrete cases where the complexity of or susceptibility to control varies dramatically based on the choice among natural tie-handling rules.


On Approximating Optimal Weighted Lobbying, and Frequency of Correctness versus Average-Case Polynomial Time

arXiv.org Artificial Intelligence

We investigate two hard problems related to voting: the optimal weighted lobbying problem and the winner problem for Dodgson elections. Regarding the former, Christian et al. [CFRS06] showed that optimal lobbying is intractable in the sense of parameterized complexity. We provide an efficient greedy algorithm that achieves a logarithmic approximation ratio for this problem and even for a more general variant, optimal weighted lobbying. We prove that essentially no better approximation ratio than ours can be proven for this greedy algorithm. The problem of determining Dodgson winners is known to be complete for parallel access to NP [HHR97]. Homan and Hemaspaandra [HH06] proposed an efficient greedy heuristic for finding Dodgson winners with a guaranteed frequency of success, and their heuristic is a "frequently self-knowingly correct algorithm." We prove that every distributional problem solvable in polynomial time on the average with respect to the uniform distribution has a frequently self-knowingly correct polynomial-time algorithm. Furthermore, we study some features of probability weight of correctness with respect to Procaccia and Rosenschein's junta distributions [PR07].


Spines of Random Constraint Satisfaction Problems: Definition and Connection with Computational Complexity

arXiv.org Artificial Intelligence

We study the connection between the order of phase transitions in combinatorial problems and the complexity of decision algorithms for such problems. We rigorously show that, for a class of random constraint satisfaction problems, a limited connection between the two phenomena indeed exists. Specifically, we extend the definition of the spine order parameter of Bollobás et al. to random constraint satisfaction problems, rigorously showing that for such problems a discontinuity of the spine is associated with a $2^{\Omega(n)}$ resolution complexity (and thus a $2^{\Omega(n)}$ complexity of DPLL algorithms) on random instances. The two phenomena have a common underlying cause: the emergence of "large" (linear size) minimally unsatisfiable subformulas of a random formula at the satisfiability phase transition. We present several further results that add weight to the intuition that random constraint satisfaction problems with a sharp threshold and a continuous spine are "qualitatively similar to random 2-SAT". Finally, we argue that it is the spine rather than the backbone parameter whose continuity has implications for the decision complexity of combinatorial problems, and we provide experimental evidence that the two parameters can behave in a different manner.


Causes and Explanations: A Structural-Model Approach, Part I: Causes

arXiv.org Artificial Intelligence

We propose a new definition of actual cause, using structural equations to model counterfactuals. We show that the definition yields a plausible and elegant account of causation that copes well with examples that have caused problems for other definitions, and that it resolves major difficulties in the traditional account.
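To make "structural equations to model counterfactuals" concrete, here is a small sketch of a structural model with interventions and a naive but-for test. It is not the definition of actual cause proposed in the paper. The two-cause fire scenario, the variable names (`L`, `M`, `FF`, `U_L`, `U_M`) and the helpers `solve` and `but_for` are assumptions chosen for illustration, and the optional contingency argument only gestures at the idea of testing dependence under a modified setting of other variables.

```python
def solve(equations, exogenous, interventions=None):
    """Evaluate structural equations in causal order.

    Variables listed in `interventions` are forced to the given value
    instead of being computed from their equation.
    """
    interventions = interventions or {}
    values = dict(exogenous)
    for var, fn in equations:
        values[var] = interventions[var] if var in interventions else fn(values)
    return values


# Hypothetical model: a fire (FF) starts if lightning (L) strikes
# or a match (M) is dropped; U_L and U_M are exogenous inputs.
equations = [
    ("L", lambda v: v["U_L"]),
    ("M", lambda v: v["U_M"]),
    ("FF", lambda v: int(v["L"] or v["M"])),
]
context = {"U_L": 1, "U_M": 1}  # both potential causes occur


def but_for(var, outcome, contingency=None):
    """Does flipping the binary variable `var` change `outcome`?

    `contingency` optionally fixes other variables first.
    """
    actual = solve(equations, context)
    forced = dict(contingency or {})
    forced[var] = 1 - actual[var]
    return solve(equations, context, forced)[outcome] != actual[outcome]


print(but_for("L", "FF"))            # plain but-for test
print(but_for("L", "FF", {"M": 0}))  # same test with the match held off
```

With both causes present, the plain but-for test does not flag lightning, because the match alone still starts the fire. Holding the match off exposes the dependence. Overdetermined cases of this kind are a standard example of where simple counterfactual tests run into trouble.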


Manipulability of Single Transferable Vote

arXiv.org Artificial Intelligence

For many voting rules, it is NP-hard to compute a successful manipulation. However, NP-hardness only bounds the worst-case complexity. Recent theoretical results suggest that manipulation may often be easy in practice. We study empirically the cost of manipulating the single transferable vote (STV) rule. This was one of the first rules shown to be NP-hard to manipulate. It also appears to be one of the harder rules to manipulate since it involves multiple rounds and since, unlike many other rules, it is NP-hard for a single agent to manipulate without weights on the votes or uncertainty about how the other agents have voted. In almost every election in our experiments, it was easy to compute how a single agent could manipulate the election or to prove that manipulation by a single agent was impossible. It remains an interesting open question whether manipulation by a coalition of agents is hard to compute in practice.
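As a rough illustration of what manipulating STV by a single agent means, the sketch below implements single-winner STV and searches over every possible ballot for the manipulator by brute force. This is not the search procedure used in the study. The function names, the example ballots and the tie-breaking rule (among the candidates with the fewest votes, the alphabetically first is eliminated) are assumptions. Exhaustive search over all orderings is only feasible for a handful of candidates.

```python
from itertools import permutations


def stv_winner(ballots, candidates):
    """Single-winner STV: repeatedly eliminate the candidate with the
    fewest first-place votes among those still standing."""
    remaining = list(candidates)
    while len(remaining) > 1:
        tally = {c: 0 for c in remaining}
        for ballot in ballots:
            for choice in ballot:
                if choice in tally:  # highest-ranked surviving candidate
                    tally[choice] += 1
                    break
        # Assumed tie-break: lowest tally, then alphabetically first.
        loser = min(remaining, key=lambda c: (tally[c], c))
        remaining.remove(loser)
    return remaining[0]


def find_manipulation(other_ballots, candidates, preferred):
    """Try every ordering as the manipulator's ballot; return the first
    one that makes `preferred` win, or None if no ballot does."""
    for ballot in permutations(candidates):
        if stv_winner(other_ballots + [ballot], candidates) == preferred:
            return ballot
    return None


# Illustrative election: ten fixed ballots plus one manipulator.
candidates = ("a", "b", "c")
others = [("a", "b", "c")] * 4 + [("b", "c", "a")] * 3 + [("c", "a", "b")] * 3

print(stv_winner(others + [("a", "b", "c")], candidates))  # sincere ballot
print(find_manipulation(others, candidates, "a"))
```

In this example a manipulator who favours `a` and ranks `a` first sees `b` eliminated and `b`'s votes transfer to `c`, who then wins. Ranking `b` first keeps `b` in the race long enough for `c` to be eliminated instead, and `c`'s transfers then elect `a`.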


Remote Monitoring of Activity, Location, and Exertion Levels

AAAI Conferences

The purpose of this study was to develop and test a platform that would assist the Environmental Protection Agency (EPA), and the scientific community at large, in generating a human activity and energy expenditure database detailed enough to accurately predict human exposure and dose to various pollutants. The monitoring system developed is easily extendable to the collection of other health-related data. Our protocol tested the use of a digital voice recorder to collect activity/location diary data, on the assumption that it is a less burdensome and more reliable method than paper-and-pencil diaries or hand-held computers. We expected the data to be more complete and reliable than retrospective reports (diaries filled out at the end of the day) because the recorders are easy to use and the diary entries are made as the events occur, and we expected that participants would be more likely to complete the study because of the reduced burden. The data collection plan was also expected to show that the cost of transcribing the diary can be reduced substantially by using speech and language processing to translate the digital diaries into the EPA's Comprehensive Human Activity Database (CHAD).