Learning and Detecting Patterns in Multi-Attributed Network Data

AAAI Conferences

Network analysis is a growing field across many domains, including computer vision, social media marketing, transportation networks, and intelligence analysis. The growing use of digital communication devices and platforms, as well as persistent surveillance sensors, has resulted in an explosion in the quantity of data and stretched the abilities of current technologies to process this data and draw meaningful conclusions. Current tools either require significant levels of manual intervention (e.g., to prepare the data, to define patterns, or to draw conclusions from data) or are unable to generalize to new data sources and analysis needs. In this paper, we present automated solutions to two major problems in network analysis: (a) finding patterns in network data that contains high levels of noise and irrelevant information; and (b) learning repetitive patterns and dependencies between entities and attributes. Our modeling framework represents network data using multi-attributed graphs that can encode various discrete and continuous features and relationships between network entities. The pattern search and learning model is based on probabilistic multi-attributed graph matching and is implemented using distributed message passing algorithms. Our algorithms achieve high accuracy rates in learning and finding patterns in the data, are flexible to new domains and data types, and scale to large datasets using the Map-Reduce framework.


Transforming Graph Data for Statistical Relational Learning

Journal of Artificial Intelligence Research

Relational data representations have become an increasingly important topic due to the recent proliferation of network datasets (e.g., social, biological, information networks) and a corresponding increase in the application of Statistical Relational Learning (SRL) algorithms to these domains. In this article, we examine and categorize techniques for transforming graph-based relational data to improve SRL algorithms. In particular, appropriate transformations of the nodes, links, and/or features of the data can dramatically affect the capabilities and results of SRL algorithms. We introduce an intuitive taxonomy for data representation transformations in relational domains that incorporates link transformation and node transformation as symmetric representation tasks. More specifically, the transformation tasks for both nodes and links include (i) predicting their existence, (ii) predicting their label or type, (iii) estimating their weight or importance, and (iv) systematically constructing their relevant features. We motivate our taxonomy through detailed examples and use it to survey competing approaches for each of these tasks. We also discuss general conditions for transforming links, nodes, and features. Finally, we highlight challenges that remain to be addressed.


A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing

Journal of Artificial Intelligence Research

Dual decomposition, and more generally Lagrangian relaxation, is a classical method for combinatorial optimization; it has recently been applied to several inference problems in natural language processing (NLP). This tutorial gives an overview of the technique. We describe example algorithms, describe formal guarantees for the method, and describe practical issues in implementing the algorithms. While our examples are predominantly drawn from the NLP literature, the material should be of general relevance to inference problems in machine learning. A central theme of this tutorial is that Lagrangian relaxation is naturally applied in conjunction with a broad class of combinatorial algorithms, allowing inference in models that go significantly beyond previous work on Lagrangian relaxation for inference in graphical models.
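
As a brief illustration of the technique this abstract names, the textbook form of dual decomposition can be written as follows. The symbols f, g, y, z, u and the step size \alpha_k are notation assumed here for the sketch; they do not appear in the abstract.

```latex
% Original problem: two tractable subproblems coupled by an agreement constraint
\max_{y \in \mathcal{Y},\, z \in \mathcal{Z}} \; f(y) + g(z)
\quad \text{subject to} \quad y = z

% Relaxing the constraint with multipliers u gives the dual objective,
% which decomposes into two independent maximizations
L(u) = \max_{y \in \mathcal{Y}} \bigl( f(y) + u \cdot y \bigr)
     + \max_{z \in \mathcal{Z}} \bigl( g(z) - u \cdot z \bigr)

% L(u) upper-bounds the original optimum for every u; it is minimized by
% subgradient steps, where y^{(k)}, z^{(k)} are the two maximizers at u^{(k-1)}
u^{(k)} = u^{(k-1)} - \alpha_k \bigl( y^{(k)} - z^{(k)} \bigr)
```

If the two maximizers agree at some iteration, that shared solution is optimal for the original problem.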


Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy

arXiv.org Artificial Intelligence

Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a simple approach to parallel best-first search that asynchronously distributes and schedules work among processors based on a hash function of the search state. We use this approach to parallelize the A* algorithm in an optimal sequential version of the Fast Downward planner, as well as a 24-puzzle solver. The scaling behavior of HDA* is evaluated experimentally on a shared memory, multicore machine with 8 cores, a cluster of commodity machines using up to 64 cores, and large-scale high-performance clusters, using up to 2400 processors. We show that this approach scales well, allowing the effective utilization of large amounts of distributed memory to optimally solve problems which require terabytes of RAM. We also compare HDA* to Transposition-table Driven Scheduling (TDS), a hash-based parallelization of IDA*, and show that, in planning, HDA* significantly outperforms TDS. A simple hybrid which combines HDA* and TDS to exploit strengths of both algorithms is proposed and evaluated.
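
The core idea of distributing work by a hash of the search state can be sketched in a few lines. This is a minimal illustration only, not the HDA* implementation evaluated in the paper: the helper names `owner` and `route_successors`, the use of SHA-256, and the in-memory "outboxes" standing in for asynchronous messages are all assumptions made for the example.

```python
import hashlib


def owner(state, num_workers):
    """Return the index of the worker that owns `state` (hypothetical helper).

    A stable digest of the state's repr is used so that every process
    computes the same owner for the same state.
    """
    digest = hashlib.sha256(repr(state).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_workers


def route_successors(successors, num_workers):
    """Group newly generated states by owning worker.

    In a real parallel search each outbox would be sent asynchronously to
    its worker, which would insert the states into its local open list.
    """
    outboxes = [[] for _ in range(num_workers)]
    for state in successors:
        outboxes[owner(state, num_workers)].append(state)
    return outboxes
```

Because the owner depends only on the state, duplicate states generated by different workers always arrive at the same worker, where they can be detected locally.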


Improved Local Search in Artificial Bee Colony using Golden Section Search

arXiv.org Artificial Intelligence

Artificial bee colony (ABC), an optimization algorithm, is a recent addition to the family of population-based search algorithms. ABC takes its inspiration from the collective intelligent foraging behavior of honey bees. In this study we incorporate a golden section search mechanism into the structure of basic ABC to improve global convergence and to prevent the search from getting stuck at a local solution. The proposed variant is termed ILS-ABC. Comparative numerical results against state-of-the-art algorithms show the performance of the proposal when applied to a set of unconstrained engineering design problems. The simulated results show that the proposed variant can be successfully applied to solve real-life problems.
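
For readers unfamiliar with golden section search, the generic one-dimensional method is sketched below. It assumes the objective is unimodal on the interval [a, b]; the function name, the tolerance default, and the test objective are illustrative, and the sketch does not show how ILS-ABC embeds the search inside the bee colony loop.

```python
import math

# Inverse golden ratio, about 0.618
INV_PHI = (math.sqrt(5) - 1) / 2


def golden_section_search(f, a, b, tol=1e-8):
    """Approximate the minimizer of a unimodal function f on [a, b]."""
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return (a + b) / 2


x_min = golden_section_search(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
```

Each iteration shrinks the interval by the same constant factor and reuses one of the two previous function evaluations, so only one new evaluation is needed per step.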


A Distance-Based Branch and Bound Feature Selection Algorithm

arXiv.org Machine Learning

There is no known efficient method for selecting k Gaussian features from n which achieve the lowest Bayesian classification error. We show an example of how greedy algorithms faced with this task are led to give results that are not optimal. This motivates us to propose a more robust approach. We present a Branch and Bound algorithm for finding a subset of k independent Gaussian features which minimizes the naive Bayesian classification error. Our algorithm uses additive monotonic distance measures to produce bounds for the Bayesian classification error in order to exclude many feature subsets from evaluation, while still returning an optimal solution. We test our method on synthetic data as well as data obtained from gene expression profiling.


Exploiting Locality in Searching the Web

arXiv.org Artificial Intelligence

Published experiments on spidering the Web suggest that, given training data in the form of a (relatively small) subgraph of the Web containing a subset of a selected class of target pages, it is possible to conduct a directed search and find additional target pages significantly faster (with fewer page retrievals) than by performing a blind or uninformed random or systematic search, e.g., breadth-first search. If true, this claim motivates a number of practical applications. Unfortunately, these experiments were carried out in specialized domains or under conditions that are difficult to replicate. We present and apply an experimental framework designed to reexamine and resolve the basic claims of the earlier work, so that the supporting experiments can be replicated and built upon. We provide high-performance tools for building experimental spiders, make use of the ground truth and static nature of the WT10g TREC Web corpus, and rely on simple, well-understood machine learning techniques to conduct our experiments. In this paper, we describe the basic framework, motivate the experimental design, and report on our findings supporting and qualifying the conclusions of the earlier research.


Systematic vs. Non-systematic Algorithms for Solving the MPE Task

arXiv.org Artificial Intelligence

The paper explores the power of two systematic Branch and Bound search algorithms that exploit partition-based heuristics, BBBT (a new algorithm for which the heuristic information is constructed during search and allows dynamic variable/value ordering) and its predecessor BBMB (for which the heuristic information is pre-compiled), and compares them against a number of popular local search algorithms for the MPE problem as well as against the recently popular iterative belief propagation algorithms. First, we show empirically that the new Branch and Bound algorithm, BBBT, demonstrates tremendous pruning of the search space far beyond its predecessor, BBMB, which translates to impressive time savings for some classes of problems. Second, when viewed as approximation schemes, BBBT/BBMB together are highly competitive with the best known SLS algorithms and are superior, especially when the domain sizes increase beyond 2. The results also show that the class of belief propagation algorithms can outperform SLS, but they are quite inferior to BBMB/BBBT. As far as we know, BBBT/BBMB are currently among the best performing algorithms for solving the MPE task.


Solving MAP Exactly using Systematic Search

arXiv.org Artificial Intelligence

MAP is the problem of finding a most probable instantiation of a set of variables in a Bayesian network given some evidence. Unlike computing posterior probabilities, or MPE (a special case of MAP), the time and space complexity of structural solutions for MAP are not only exponential in the network treewidth, but in a larger parameter known as the "constrained" treewidth. In practice, this means that computing MAP can be orders of magnitude more expensive than computing posterior probabilities or MPE. This paper introduces a new, simple upper bound on the probability of a MAP solution, which admits a tradeoff between the bound quality and the time needed to compute it. The bound is shown to be generally much tighter than those of other methods of comparable complexity. We use this proposed upper bound to develop a branch-and-bound search algorithm for solving MAP exactly. Experimental results demonstrate that the search algorithm is able to solve many problems that are far beyond the reach of any structure-based method for MAP. For example, we show that the proposed algorithm can compute MAP exactly and efficiently for some networks whose constrained treewidth is more than 40.


An Axiomatic Approach to Robustness in Search Problems with Multiple Scenarios

arXiv.org Artificial Intelligence

This paper is devoted to the search for robust solutions in state space graphs when costs depend on scenarios. We first present axiomatic requirements for preference compatibility with the intuitive idea of robustness. This leads us to propose the Lorenz dominance rule as a basis for robustness analysis. Then, after presenting complexity results about the determination of robust solutions, we propose a new refinement of A* specially designed to determine the set of robust paths in a state space graph. The behavior of the algorithm is illustrated on a small example. Finally, an axiomatic justification of the refinement of robustness by an OWA criterion is provided.
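
The Lorenz dominance rule mentioned above can be illustrated with a small sketch. It assumes the common convention for cost vectors (one cost per scenario, lower is better): sort the costs in decreasing order, take cumulative sums, and compare the resulting vectors componentwise. The function names and example vectors are hypothetical, and the paper's exact definition may differ in detail.

```python
from itertools import accumulate


def lorenz_vector(costs):
    """Cumulative sums of the costs sorted from worst (largest) to best."""
    return list(accumulate(sorted(costs, reverse=True)))


def lorenz_dominates(x, y):
    """True if cost vector x Lorenz-dominates cost vector y.

    x dominates y when its Lorenz vector is no worse in every component
    and differs in at least one. Both vectors must have the same length.
    """
    if len(x) != len(y):
        raise ValueError("cost vectors must cover the same scenarios")
    lx, ly = lorenz_vector(x), lorenz_vector(y)
    return all(a <= b for a, b in zip(lx, ly)) and lx != ly


balanced = (5, 5)   # same cost in both scenarios
risky = (2, 8)      # same total cost, but a much worse worst case
```

Here neither vector Pareto-dominates the other, yet `lorenz_dominates(balanced, risky)` holds, which reflects the preference for solutions whose cost is spread evenly across scenarios.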