Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics

Journal of Artificial Intelligence Research

The ability to associate images with natural language sentences that describe what is depicted in them is a hallmark of image understanding, and a prerequisite for applications such as sentence-based image search. In analogy to image search, we propose to frame sentence-based image annotation as the task of ranking a given pool of captions. We introduce a new benchmark collection for sentence-based image description and search, consisting of 8,000 images that are each paired with five different captions which provide clear descriptions of the salient entities and events. We introduce a number of systems that perform quite well on this task, even though they are only based on features that can be obtained with minimal supervision. Our results clearly indicate the importance of training on multiple captions per image, and of capturing syntactic (word order-based) and semantic features of these captions. We also perform an in-depth comparison of human and automatic evaluation metrics for this task, and propose strategies for collecting human judgments cheaply and on a very large scale, allowing us to augment our collection with additional relevance judgments of which captions describe which image. Our analysis shows that metrics that consider the ranked list of results for each query image or sentence are significantly more robust than metrics that are based on a single response per query. Moreover, our study suggests that the evaluation of ranking-based image description systems may be fully automated.
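The contrast between list-based and single-response metrics can be illustrated with a minimal sketch. The abstract does not name the specific metrics used, so recall@k and median rank are assumed here as common examples of rank-based measures; the function names and the toy rankings are hypothetical and are not the paper's evaluation code.

```python
from statistics import median

def rank_of_gold(ranked_ids, gold_id):
    """Return the 1-based position of the gold caption in a ranked list."""
    return ranked_ids.index(gold_id) + 1

def recall_at_k(rankings, gold, k):
    """Fraction of queries whose gold caption appears in the top k results."""
    hits = sum(1 for query, ranked in rankings.items() if gold[query] in ranked[:k])
    return hits / len(rankings)

def median_rank(rankings, gold):
    """Median position of the gold caption across all queries."""
    return median(rank_of_gold(ranked, gold[query]) for query, ranked in rankings.items())

# Toy data (hypothetical): each image query maps to a ranked pool of caption ids.
rankings = {
    "img1": ["c1", "c7", "c3"],
    "img2": ["c5", "c2", "c9"],
    "img3": ["c4", "c8", "c3"],
    "img4": ["c6", "c1", "c4"],
}
gold = {"img1": "c1", "img2": "c2", "img3": "c3", "img4": "c4"}

top1 = recall_at_k(rankings, gold, 1)  # uses only the single top response
top3 = recall_at_k(rankings, gold, 3)  # uses the ranked list
med = median_rank(rankings, gold)
```

In this toy example the top-1 score credits only one query in four, while the list-based measures also register that the remaining gold captions sit near the top of their lists.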


Inconsistency of Pitman-Yor process mixtures for the number of components

arXiv.org Machine Learning

In population genetics, determining the "population structure" is an important step in the analysis of sampled data. As an illustrative example, consider the impala, a species of antelope in southern Africa. Impalas are divided into two subspecies: the common impala occupying much of the eastern half of the region, and the black-faced impala inhabiting a small area in the west. While common impalas are abundant, the number of black-faced impalas has been decimated by drought, poaching, and declining resources due to human and livestock expansion. To assist conservation efforts, Lorenzen, Arctander and Siegismund (2006) collected samples from 216 impalas, and analyzed the genetic variation between and within the two subspecies. A key part of their analysis consisted of inferring the population structure -- that is, partitioning the data into distinct populations, and in particular, determining how many such populations there are. To infer the impala population structure, Lorenzen et al. employed a widely used tool called Structure (Pritchard, Stephens and Donnelly, 2000) which, in the simplest version, models the data as a finite mixture, with each component in the mixture corresponding to a dis-

Supported in part by NSF grant DMS-1007593 and DARPA contract FA8650-11-1-715.


Separable Approximations and Decomposition Methods for the Augmented Lagrangian

arXiv.org Machine Learning

In this paper we study decomposition methods based on separable approximations for minimizing the augmented Lagrangian. In particular, we study and compare the Diagonal Quadratic Approximation Method (DQAM) of Mulvey and Ruszczyński and the Parallel Coordinate Descent Method (PCDM) of Richtárik and Takáč. We show that the two methods are equivalent for feasibility problems up to the selection of a single step-size parameter. Furthermore, we prove an improved complexity bound for PCDM under strong convexity, and show that this bound is at least $8(L'/\bar{L})(\omega-1)^2$ times better than the best known bound for DQAM, where $\omega$ is the degree of partial separability and $L'$ and $\bar{L}$ are the maximum and average of the block Lipschitz constants of the gradient of the quadratic penalty appearing in the augmented Lagrangian.


Mixtures of Common Skew-t Factor Analyzers

arXiv.org Machine Learning

A mixture of common skew-t factor analyzers model is introduced for model-based clustering of high-dimensional data. By assuming common component factor loadings, this model allows clustering to be performed in the presence of a large number of mixture components or when the number of dimensions is too large to be well-modelled by the mixtures of factor analyzers model or a variant thereof. Furthermore, assuming that the component densities follow a skew-t distribution allows robust clustering of skewed data. The alternating expectation-conditional maximization algorithm is employed for parameter estimation. We demonstrate excellent clustering performance when our model is applied to real and simulated data. This paper marks the first time that skewed common factors have been used.


Derivation of Upper Bounds on Optimization Time of Population-Based Evolutionary Algorithm on a Function with Fitness Plateaus Using Elitism Levels Traverse Mechanism

arXiv.org Artificial Intelligence

In this article, a tool for the analysis of population-based EAs is used to derive asymptotic upper bounds on the optimization time of the algorithm solving the Royal Roads problem, a test function with plateaus of fitness. In addition, the limiting distribution of a certain subset of the population is approximated.


A Hypergraph-Partitioned Vertex Programming Approach for Large-scale Consensus Optimization

arXiv.org Artificial Intelligence

In modern data science problems, techniques for extracting value from big data require performing large-scale optimization over heterogeneous, irregularly structured data. Much of this data is best represented as multi-relational graphs, making vertex programming abstractions such as those of Pregel and GraphLab ideal fits for modern large-scale data analysis. In this paper, we describe a vertex-programming implementation of a popular consensus optimization technique known as the alternating direction method of multipliers (ADMM). ADMM consensus optimization allows elegant solution of complex objectives such as inference in rich probabilistic models. We also introduce a novel hypergraph partitioning technique that improves over state-of-the-art partitioning techniques for vertex programming and significantly reduces the communication cost by reducing the number of replicated nodes by up to an order of magnitude. We implement our algorithm in GraphLab and measure scaling performance on a variety of realistic bipartite graph distributions and a large synthetic voter-opinion analysis application. In our experiments, we are able to achieve a 50% improvement in runtime over the current state-of-the-art GraphLab partitioning scheme.
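As a rough illustration of consensus ADMM, the following minimal sketch solves a toy scalar problem in which every local variable must agree with one shared consensus variable. It is a single-process example under stated assumptions, not the paper's GraphLab implementation: the function name `consensus_admm`, the quadratic local objectives, and the default parameters are all hypothetical choices made for this example.

```python
def consensus_admm(targets, rho=1.0, iterations=60):
    """Minimise sum_i 0.5 * (x_i - a_i)**2 subject to x_i == z for all i.

    Each x_i plays the role of a local variable and z is the shared
    consensus variable; u_i are the scaled dual variables.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iterations):
        # Local step: closed-form minimiser of
        # 0.5 * (x_i - a_i)**2 + (rho / 2) * (x_i - z + u_i)**2.
        x = [(a + rho * (z - ui)) / (1.0 + rho) for a, ui in zip(targets, u)]
        # Consensus step: average of the local estimates plus duals.
        z = sum(xi + ui for xi, ui in zip(x, u)) / n
        # Dual step: accumulate each local disagreement with the consensus.
        u = [ui + xi - z for xi, ui in zip(x, u)]
    return z, x

z, x = consensus_admm([1.0, 2.0, 6.0])
```

For this toy objective the consensus value approaches the mean of the targets. The local step touches only one variable's own data plus the current consensus value, which is the kind of per-vertex computation that a vertex-programming abstraction distributes.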


A New Algorithm of Speckle Filtering using Stochastic Distances

arXiv.org Machine Learning

This paper presents a new approach for filter design based on stochastic distances and tests between distributions. A window is defined around each pixel, overlapping samples are compared and only those which pass a goodness-of-fit test are used to compute the filtered value. The technique is applied to intensity SAR data with homogeneous regions using the Gamma model. The proposal is compared with the Lee filter using a protocol based on Monte Carlo. Among the criteria used to quantify the quality of filters, we employ the equivalent number of looks, line and edge preservation. We also assess the filters by the Universal Image Quality Index and Pearson's correlation on edge regions.


Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

arXiv.org Machine Learning

We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called "path coding" penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs.


Heuristic Search When Time Matters

Journal of Artificial Intelligence Research

In many applications of shortest-path algorithms, it is impractical to find a provably optimal solution; one can only hope to achieve an appropriate balance between search time and solution cost that respects the user's preferences. Preferences come in many forms; we consider utility functions that linearly trade off search time and solution cost. Many natural utility functions can be expressed in this form. For example, when solution cost represents the makespan of a plan, equally weighting search time and plan makespan minimizes the time from the arrival of a goal until it is achieved. Current state-of-the-art approaches to optimizing utility functions rely on anytime algorithms and on the use of extensive training data to compute a termination policy. We propose a more direct approach, called Bugsy, that incorporates the utility function directly into the search, obviating the need for a separate termination policy. We describe a new method based on off-line parameter tuning and a novel benchmark domain for planning under time pressure based on platform-style video games. We then present what we believe to be the first empirical study of applying anytime monitoring to heuristic search, and we compare it with our proposals. Our results suggest that the parameter tuning technique can give the best performance if a representative set of training instances is available. If not, then Bugsy is the algorithm of choice, as it performs well and does not require any off-line training. This work extends the tradition of research on metareasoning for search by illustrating the benefits of embedding lightweight reasoning about time into the search algorithm itself.
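A linear trade-off between search time and solution cost can be made concrete with a small sketch. This is not Bugsy itself: it only evaluates a utility of the assumed form U = -(w_cost * cost + w_time * time) over a hypothetical stream of solutions from an anytime search, and the names `utility`, `best_stopping_point`, and the sample numbers are illustrative assumptions.

```python
def utility(cost, time, w_cost=1.0, w_time=1.0):
    """Linear utility: higher is better, both cost and time are penalised."""
    return -(w_cost * cost + w_time * time)

def best_stopping_point(solutions, w_cost=1.0, w_time=1.0):
    """Pick the (time_found, cost) pair with the highest utility."""
    return max(solutions, key=lambda s: utility(s[1], s[0], w_cost, w_time))

# Hypothetical anytime profile: (seconds of search, solution cost).
solutions = [(1.0, 20.0), (3.0, 15.0), (10.0, 12.0)]

balanced = best_stopping_point(solutions)                # equal weights
patient = best_stopping_point(solutions, w_time=0.1)     # time is cheap
hurried = best_stopping_point(solutions, w_time=10.0)    # time is expensive
```

With equal weights the middle solution wins in this example; making time cheaper favours the slowest, cheapest solution, and making time expensive favours the first solution found.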


Acyclicity Notions for Existential Rules and Their Application to Query Answering in Ontologies

Journal of Artificial Intelligence Research

Answering conjunctive queries (CQs) over a set of facts extended with existential rules is a prominent problem in knowledge representation and databases. This problem can be solved using the chase algorithm, which extends the given set of facts with fresh facts in order to satisfy the rules. If the chase terminates, then CQs can be evaluated directly in the resulting set of facts. The chase, however, does not necessarily terminate, and checking whether the chase terminates on a given set of rules and facts is undecidable. Numerous acyclicity notions were proposed as sufficient conditions for chase termination. In this paper, we present two new acyclicity notions called model-faithful acyclicity (MFA) and model-summarising acyclicity (MSA). Furthermore, we investigate the landscape of the known acyclicity notions and establish a complete taxonomy of all notions known to us. Finally, we show that MFA and MSA generalise most of these notions. Existential rules are closely related to the Horn fragments of the OWL 2 ontology language; furthermore, several prominent OWL 2 reasoners implement CQ answering by using the chase to materialise all relevant facts. In order to avoid termination problems, many of these systems handle only the OWL 2 RL profile of OWL 2; furthermore, some systems go beyond OWL 2 RL, but without any termination guarantees. In this paper we also investigate whether various acyclicity notions can provide a principled and practical solution to these problems. On the theoretical side, we show that query answering for acyclic ontologies is of lower complexity than for general ontologies. On the practical side, we show that many of the commonly used OWL 2 ontologies are MSA, and that the number of facts obtained by materialisation is not too large. Our results thus suggest that principled development of materialisation-based OWL 2 reasoners is practically feasible.
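The behaviour of the chase, including its possible non-termination, can be sketched in a few lines. The following is a simplified oblivious-style chase with a round cap, written for illustration only: it does not implement MFA or MSA checking, and the atom encoding (a predicate name with a tuple of terms, variables prefixed with "?"), the function names, and the `max_rounds` cutoff are assumptions of this example.

```python
from itertools import count

def is_var(term):
    return term.startswith("?")

def match(body, facts, binding=None):
    """Yield every variable binding that maps all body atoms into facts."""
    binding = binding or {}
    if not body:
        yield dict(binding)
        return
    (pred, args), rest = body[0], body[1:]
    for fact_pred, fact_args in facts:
        if fact_pred != pred or len(fact_args) != len(args):
            continue
        new = dict(binding)
        # A variable must keep one value; a constant must match exactly.
        if all((new.setdefault(a, f) == f) if is_var(a) else (a == f)
               for a, f in zip(args, fact_args)):
            yield from match(rest, facts, new)

def chase(facts, rules, max_rounds=10):
    """Apply each rule once per body match; return (facts, terminated)."""
    facts = set(facts)
    fresh = count()
    applied = set()
    for _ in range(max_rounds):
        new_facts = set()
        for i, (body, head) in enumerate(rules):
            for b in list(match(body, facts)):
                trigger = (i, tuple(sorted(b.items())))
                if trigger in applied:
                    continue
                applied.add(trigger)
                full = dict(b)
                # Existential variables (head-only) get fresh labelled nulls.
                for _, args in head:
                    for a in args:
                        if is_var(a) and a not in full:
                            full[a] = f"_null{next(fresh)}"
                for pred, args in head:
                    new_facts.add((pred, tuple(full.get(a, a) for a in args)))
        if new_facts <= facts:
            return facts, True   # fixpoint reached: the chase terminated
        facts |= new_facts
    return facts, False          # gave up after max_rounds

start = {("Person", ("alice",))}
# Person(x) -> exists y. hasParent(x, y);  hasParent(x, y) -> Child(x)
acyclic = [([("Person", ("?x",))], [("hasParent", ("?x", "?y"))]),
           ([("hasParent", ("?x", "?y"))], [("Child", ("?x",))])]
# Person(x) -> exists y. hasParent(x, y), Person(y)   (keeps inventing parents)
cyclic = [([("Person", ("?x",))],
           [("hasParent", ("?x", "?y")), ("Person", ("?y",))])]

finite_facts, finished = chase(start, acyclic)
partial_facts, finished_cyclic = chase(start, cyclic, max_rounds=5)
```

On the first rule set the chase reaches a fixpoint, so queries can be evaluated over the resulting facts. On the second, every new person triggers the rule again, which is the kind of behaviour that acyclicity conditions are designed to rule out in advance.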