The IBaCoP Planning System: Instance-Based Configured Portfolios

Journal of Artificial Intelligence Research

Sequential planning portfolios are very powerful in exploiting the complementary strengths of different automated planners. The main challenge for a portfolio planner is to define which base planners to run, how much running time to assign to each planner, and in what order to run them so as to optimize a planning metric. Portfolio configurations are usually derived empirically from training benchmarks and remain fixed during the evaluation phase. In this work, we create a per-instance configurable portfolio that adapts itself to every planning task. The proposed system pre-selects a group of candidate planners using a Pareto-dominance filtering approach and then uses predictive models to decide which planners to include and how much time to assign to each. These models estimate whether a base planner will be able to solve the given problem and, if so, how long it will take. We define different portfolio strategies to combine the knowledge generated by the models. The experimental evaluation shows that the resulting portfolios improve on non-informed strategies. One of the proposed portfolios won the Sequential Satisficing Track of the 2014 International Planning Competition.
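The abstract does not say which criteria the Pareto-dominance filter compares, so the sketch below assumes, purely for illustration, that each planner is summarized by two training metrics: problems solved (higher is better) and mean solving time (lower is better). The names `dominates`, `pareto_front`, and the planner labels are hypothetical; it only shows what "keep the non-dominated planners" means, not IBaCoP's actual selection code.

```python
from typing import Dict, List, Tuple

# Hypothetical per-planner training summary: (problems_solved, mean_time_seconds).
Metrics = Tuple[int, float]


def dominates(a: Metrics, b: Metrics) -> bool:
    """True if `a` is no worse than `b` on both criteria and strictly better on one.

    More problems solved is better; a lower mean time is better.
    """
    solved_a, time_a = a
    solved_b, time_b = b
    no_worse = solved_a >= solved_b and time_a <= time_b
    strictly_better = solved_a > solved_b or time_a < time_b
    return no_worse and strictly_better


def pareto_front(stats: Dict[str, Metrics]) -> List[str]:
    """Names of the planners that no other planner dominates, sorted alphabetically."""
    return sorted(
        name
        for name, m in stats.items()
        if not any(dominates(other, m) for other in stats.values())
    )


# Illustrative numbers only; they are not results from the IBaCoP paper.
example = {
    "planner_a": (120, 35.0),
    "planner_b": (110, 12.0),
    "planner_c": (100, 40.0),  # dominated by planner_a: fewer solved and slower
    "planner_d": (125, 90.0),
}
print(pareto_front(example))  # ['planner_a', 'planner_b', 'planner_d']
```

A planner survives if it is the best trade-off along at least one direction, which is why both the fast `planner_b` and the high-coverage `planner_d` are kept as candidates for the later model-driven selection.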


Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

arXiv.org Machine Learning

We propose a sample-efficient alternative to importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinatorial decision set so as to minimize its cumulative loss. In particular, we show that the well-known Follow-the-Perturbed-Leader (FPL) prediction method coupled with Geometric Resampling yields the first computationally efficient reduction from offline to online optimization in this setting. We provide a thorough theoretical analysis for the resulting algorithm, showing that its performance is on par with previous, inefficient solutions. Our main contribution is showing that, despite the relatively large variance induced by the GR procedure, our performance guarantees hold with high probability rather than only in expectation. As a side result, we also improve the best known regret bounds for FPL in online combinatorial optimization with full feedback, closing the perceived performance gap between FPL and exponential weights in this setting.
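The core idea behind replacing an importance weight 1/p with sampling can be illustrated in a few lines: if a fresh draw includes component i with probability p, the number of draws until it first appears is geometric with mean 1/p. The sketch below is a toy under stated assumptions: a plain Bernoulli sampler with a known p stands in for re-running the perturbed learner, the cap on the number of draws is a free parameter, and the function names are hypothetical. It is not the paper's full FPL+GR algorithm.

```python
import random
from typing import Callable


def geometric_resampling(contains_i: Callable[[], bool], cap: int) -> int:
    """Count fresh draws until component i appears, stopping at `cap` draws.

    If each draw contains i with probability p, the count is geometric with
    mean 1/p (slightly biased downward by the cap), so it can replace the
    importance weight 1/p without ever computing p.
    """
    for k in range(1, cap + 1):
        if contains_i():
            return k
    return cap


def gr_loss_estimate(loss_i: float, chosen_i: bool, k: int) -> float:
    """Loss estimate for component i: k * observed loss if i was played, else 0."""
    return k * loss_i if chosen_i else 0.0


# Toy check with a known p; in the semi-bandit setting the draws would come from
# re-running the perturbed learner, and p would be unknown.
rng = random.Random(0)
p, loss = 0.25, 0.8


def draw() -> bool:
    return rng.random() < p


trials = 20_000
mean_k = sum(geometric_resampling(draw, cap=1_000) for _ in range(trials)) / trials
mean_est = sum(
    gr_loss_estimate(loss, draw(), geometric_resampling(draw, cap=1_000))
    for _ in range(trials)
) / trials
print(round(mean_k, 2), round(mean_est, 2))  # close to 4.0 and 0.8
```

Because whether i is played and the resampling count are independent, the estimate has mean p * (1/p) * loss, i.e. the true loss up to the small truncation bias, which is the unbiasedness an importance-weighted estimator would provide.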


Fragment by copy and trim

#artificialintelligence

This is part of a series of essays on how to fragment a molecular graph using RDKit. These are meant to describe the low-level steps that go into fragmentation, and the ways to think about testing. How do you tell if an algorithm is correct? Sometimes you can inspect the code. More often you have test cases where you know the correct answer.


Fast k-NN search

arXiv.org Machine Learning

Efficient index structures for fast approximate nearest neighbor queries are required in many applications such as recommendation systems. In high-dimensional spaces, many conventional methods suffer from excessive memory usage and slow response times. We propose a method where multiple random projection trees are combined by a novel voting scheme. The key idea is to exploit the redundancy in a large number of candidate sets obtained by independently generated random projections in order to reduce the number of expensive exact distance evaluations. The method is straightforward to implement using sparse projections, which leads to a reduced memory footprint and fast index construction. Furthermore, it enables grouping the required computations into big matrix multiplications, which leads to additional savings due to cache effects and low-level parallelization. We demonstrate by extensive experiments on a wide variety of data sets that the method is faster than existing partitioning-tree or hashing-based approaches, making it the fastest available technique at high accuracy levels.
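The voting idea can be sketched with NumPy: build several random projection trees, route the query down each one, count how many trees place each data point in the query's leaf, and compute exact distances only for points with enough votes. This is a minimal sketch under assumptions not taken from the abstract: median splits, one sparse Gaussian projection vector per tree level (so all projections come from a single matrix product), dense arrays standing in for real sparse matrices, and hypothetical parameter names (`n_trees`, `depth`, `density`, `votes`).

```python
import numpy as np


class RPForestIndex:
    """Sketch of approximate k-NN with several random projection trees plus voting."""

    def __init__(self, data, n_trees=10, depth=6, density=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.data = np.asarray(data, dtype=float)
        n, d = self.data.shape
        self.n_trees, self.depth = n_trees, depth
        # Sparse random projections: most entries are zeroed out.
        mask = rng.random((d, n_trees * depth)) < density
        self.proj = rng.standard_normal((d, n_trees * depth)) * mask
        # One big matrix product projects every point for every tree and level.
        all_proj = self.data @ self.proj
        self.splits = []  # per tree: {(level, node): median threshold}
        self.leaves = []  # per tree: {leaf_node: array of point indices}
        for t in range(n_trees):
            p = all_proj[:, t * depth:(t + 1) * depth]
            splits, nodes = {}, {0: np.arange(n)}
            for level in range(depth):
                children = {}
                for node, idx in nodes.items():
                    thr = float(np.median(p[idx, level])) if idx.size else 0.0
                    splits[(level, node)] = thr
                    go_left = p[idx, level] <= thr
                    children[2 * node] = idx[go_left]
                    children[2 * node + 1] = idx[~go_left]
                nodes = children
            self.splits.append(splits)
            self.leaves.append(nodes)

    def query(self, q, k=5, votes=3):
        q = np.asarray(q, dtype=float)
        qp = q @ self.proj
        counts = np.zeros(len(self.data), dtype=int)
        for t in range(self.n_trees):
            node = 0
            for level in range(self.depth):
                thr = self.splits[t][(level, node)]
                node = 2 * node + (0 if qp[t * self.depth + level] <= thr else 1)
            counts[self.leaves[t][node]] += 1  # one vote per tree for each leaf member
        # Only points that enough trees agree on get an exact distance evaluation.
        cand = np.flatnonzero(counts >= votes)
        dist = np.linalg.norm(self.data[cand] - q, axis=1)
        return cand[np.argsort(dist)[:k]]


rng = np.random.default_rng(1)
points = rng.standard_normal((2000, 20))
index = RPForestIndex(points, n_trees=10, depth=5, seed=2)
print(index.query(points[17], k=5, votes=3))
```

Raising `votes` shrinks the candidate set (fewer exact distances, lower recall); lowering it does the opposite, which is the trade-off the voting scheme is meant to tune.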


Are Machine Learning Search Algorithms To Blame For Stereotypes?

#artificialintelligence

Do machine-learning algorithms processing search engine queries bring on prejudice, discrimination and stereotyping in query results? The paper, submitted to the International Conference on Social Informatics and scheduled for publication, analyzes how Google and Bing represent female beauty in their image search results, particularly across different age and racial groups. For nearly every country analyzed, white women appear more often in the "beautiful" results, and black and Asian women appear in the "ugly" ones, per The Washington Post, which initially pointed to the study. Searches for "ugly" women return images in which about 60% of the women are white and 20% are black, with ages between 30 and 50.


Are Machine Learning Search Algorithms To Blame For Stereotypes?

#artificialintelligence

Do machine-learning algorithms processing search engine queries bring on prejudice, discrimination and stereotyping in query results? Search results have been known to highlight these negative attributes in the past. Now researchers at Brazil's Universidade Federal de Minas Gerais suggest the same could be true of female physical attractiveness in images available across the Web. The paper, submitted to the International Conference on Social Informatics and scheduled for publication, analyzes how Google and Bing represent female beauty in their image search results, particularly across different age and racial groups. The researchers then passed the more than 2,000 images through a program that estimates each subject's age, race and gender with an estimated accuracy of about 90%.


Computational Biology in the 21st Century

Communications of the ACM

Computational biologists answer biological and biomedical questions by using computation in support of--or in place of--laboratory procedures, hoping to obtain more accurate answers at a greatly reduced cost. The past two decades have seen unprecedented technological progress with regard to generating biological data; next-generation sequencing, mass spectrometry, microarrays, cryo-electron microscopy, and other high-throughput approaches have led to an explosion of data. However, this explosion is a mixed blessing. On the one hand, the scale and scope of data should allow new insights into genetic and infectious diseases, cancer, basic biology, and even human migration patterns. On the other hand, researchers are generating datasets so massive that it has become difficult to analyze them to discover patterns that give clues to the underlying biological processes. Certainly, computers are getting faster and more economical; the amount of processing available per dollar of computer hardware is more or less doubling every year or two; a similar claim can be made about storage capacity (Figure 1). In 2002, when the first human genome was sequenced, the growth in computing power was still matching the growth rate of genomic data. However, the sequencing technology used for the Human Genome Project--Sanger sequencing--was supplanted around 2004, with the advent of what is now known as next-generation sequencing. The material costs to sequence a genome have plummeted in the past decade, to the point where a whole human genome can be sequenced for less than US$1,000.


Linear Regression with an Unknown Permutation: Statistical and Computational Limits

arXiv.org Machine Learning

Consider a noisy linear observation model with an unknown permutation, based on observing $y = \Pi^* A x^* + w$, where $x^* \in \mathbb{R}^d$ is an unknown vector, $\Pi^*$ is an unknown $n \times n$ permutation matrix, and $w \in \mathbb{R}^n$ is additive Gaussian noise. We analyze the problem of permutation recovery in a random design setting in which the entries of the matrix $A$ are drawn i.i.d. from a standard Gaussian distribution, and establish sharp conditions on the SNR, sample size $n$, and dimension $d$ under which $\Pi^*$ is exactly and approximately recoverable. On the computational front, we show that the maximum likelihood estimate of $\Pi^*$ is NP-hard to compute, while also providing a polynomial time algorithm when $d =1$.
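For $d = 1$ the least-squares (Gaussian maximum-likelihood) problem has a simple structure: for a fixed $\Pi$, the best scalar is $x = \langle y, \Pi a\rangle / \|a\|^2$, leaving residual $\|y\|^2 - \langle y, \Pi a\rangle^2/\|a\|^2$, so one needs the permutation maximizing $|\langle y, \Pi a\rangle|$. By the rearrangement inequality this is attained by matching sorted $y$ with sorted $a$, either in the same or in opposite order. The sketch below implements that $O(n \log n)$ observation; it is one straightforward polynomial-time approach, not necessarily the algorithm given in the paper, and `fit_permuted_scalar` is a hypothetical name.

```python
import numpy as np


def fit_permuted_scalar(y, a):
    """Least-squares fit of y ~ x * (Pi a) over permutations Pi and a scalar x (d = 1).

    Returns (perm, x, residual) with (Pi a)[i] = a[perm[i]].
    Only two permutations can be optimal: sorted y paired with a sorted ascending
    (maximizes <y, Pi a>) or descending (minimizes it); keep the lower residual.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    order_y = np.argsort(y)
    order_a = np.argsort(a)
    best = None
    for a_order in (order_a, order_a[::-1]):
        perm = np.empty(len(a), dtype=int)
        perm[order_y] = a_order  # j-th smallest y is matched with a[a_order[j]]
        pa = a[perm]  # the permuted design, Pi a
        x = float(pa @ y / (a @ a))  # optimal scalar for this permutation
        resid = float(np.sum((y - x * pa) ** 2))
        if best is None or resid < best[2]:
            best = (perm, x, resid)
    return best


a = np.array([0.3, -1.2, 2.0, 0.7])
y = 1.5 * a[[2, 0, 3, 1]]  # noiseless, permuted and scaled copy of a
perm, x_hat, resid = fit_permuted_scalar(y, a)
print(perm, x_hat)  # perm == [2, 0, 3, 1], x_hat ~ 1.5
```

Trying both orders handles an unknown sign of $x^*$; for $d > 1$ no such sorting argument applies, consistent with the NP-hardness result stated above.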


Efficient Dodgson-Score Calculation Using Heuristics and Parallel Computing

arXiv.org Artificial Intelligence

Conflict of interest is the permanent companion of any population of agents (computational or biological). For that reason, the ability to compromise is of paramount importance, making voting a key element of societal mechanisms. One of the voting procedures most often discussed in the literature, and one that is conceptually quite appealing because of its intuitiveness, is Charles Dodgson's scoring rule, which evaluates competing alternatives by how close each one is to being a Condorcet winner. In this paper, we offer insights into the practical limits of algorithms that compute exact Dodgson scores from a number of votes. While the problem itself is theoretically intractable, this work proposes and analyses five different solutions, each taking a distinct approach to solving it effectively in practice. Additionally, three of the discussed procedures can be run in parallel, which has the potential to drastically reduce the problem size.
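The Dodgson score of a candidate c is the minimum number of swaps of adjacent candidates in the voters' rankings needed to make c a Condorcet winner, i.e. a candidate preferred to every rival by a strict majority. Swaps that do not involve c leave c's pairwise counts unchanged, so only lifting c upward matters, and lifting c by k positions in one ballot costs k swaps. The brute-force sketch below searches all combinations of lifts; it is a baseline for tiny profiles only, not one of the five methods the paper proposes, and `dodgson_score` is a hypothetical name.

```python
from itertools import product
from typing import Optional, Sequence


def dodgson_score(profile: Sequence[Sequence[str]], c: str) -> Optional[int]:
    """Exact Dodgson score of candidate c by exhaustive search over lifts.

    Each ballot is a ranking, best first; all ballots rank the same candidates.
    Exponential in the number of voters, so only usable on tiny inputs.
    """
    rivals = set(profile[0]) - {c}
    n = len(profile)
    positions = [list(b).index(c) for b in profile]
    best = None
    # lifts[i] = how many places c moves up in ballot i (costs lifts[i] swaps).
    for lifts in product(*(range(pos + 1) for pos in positions)):
        cost = sum(lifts)
        if best is not None and cost >= best:
            continue  # cannot improve on the best score found so far
        wins = {r: 0 for r in rivals}
        for ballot, pos, k in zip(profile, positions, lifts):
            # After lifting, only the candidates originally ranked above
            # position pos - k are still ahead of c in this ballot.
            above = set(ballot[:pos - k])
            for r in rivals:
                if r not in above:
                    wins[r] += 1
        if all(2 * w > n for w in wins.values()):
            best = cost
    return best


ballots = [["a", "b", "c"], ["b", "a", "c"], ["c", "a", "b"]]
print({x: dodgson_score(ballots, x) for x in "abc"})  # {'a': 0, 'b': 1, 'c': 2}
```

Here a is already the Condorcet winner (score 0), one swap in the first ballot makes b the winner, and c needs to jump two places in one ballot. The search space grows as the product of the ballot positions, which is one concrete way to see why exact computation hits practical limits quickly.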


Time-Bounded Best-First Search for Reversible and Non-reversible Search Graphs

Journal of Artificial Intelligence Research

Time-Bounded A* is a real-time, single-agent, deterministic search algorithm that expands states of a graph in the same order as A* does but, unlike A*, interleaves search and action execution. Although it is known to outperform state-of-the-art real-time search algorithms based on Korf's Learning Real-Time A* (LRTA*) in some benchmarks, it has not been studied in detail and is sometimes not considered a "true" real-time search algorithm, since it fails in non-reversible problems even if the goal is still reachable from the current state. In this paper we propose and study Time-Bounded Best-First Search (TB(BFS)), a straightforward generalization of the time-bounded approach to any best-first search algorithm. Furthermore, we propose Restarting Time-Bounded Weighted A* (TB_R(WA*)), an algorithm that deals more adequately with non-reversible search graphs, eliminating "backtracking moves" and incorporating search restarts and heuristic learning. In non-reversible problems we prove that TB(BFS) terminates, and we deduce cost bounds for the solutions returned by Time-Bounded Weighted A* (TB(WA*)), an instance of TB(BFS). We also prove that, under reasonable conditions, TB_R(WA*) terminates. We evaluate TB(WA*) in both grid pathfinding and the 15-puzzle. In addition, we evaluate TB_R(WA*) on the racetrack problem. We compare our algorithms to LSS-LRTWA*, a variant of LRTA* that can exploit lookahead search and a weighted heuristic. A general observation is that the performance of both TB(WA*) and TB_R(WA*) improves as the weight parameter is increased. In addition, our time-bounded algorithms almost always outperform LSS-LRTWA* by a significant margin.
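The interleaving of search and action execution can be sketched on a grid. In this reading of the time-bounded approach, each iteration spends at most `k` expansions on a weighted A* search rooted at the start state, then the agent makes one move: forward along the path to the best open state (or to the goal once it has been found) if the agent is on that path, otherwise one step back toward the start. Assumptions not taken from the abstract: a reversible 4-connected grid with unit costs, the Manhattan-distance heuristic, no re-expansion of closed states, and the hypothetical names `tb_weighted_astar`, `k`, and `w`. Restarts and heuristic learning (TB_R(WA*)) are not modeled.

```python
import heapq
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Cell = Tuple[int, int]


def neighbors(grid: Sequence[str], cell: Cell) -> Iterator[Cell]:
    """4-connected moves onto free ('.') cells; every move here is reversible."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(grid) and 0 <= nc < len(grid[0]) and grid[nr][nc] == ".":
            yield (nr, nc)


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tb_weighted_astar(grid: Sequence[str], start: Cell, goal: Cell,
                      k: int = 3, w: float = 1.0,
                      max_moves: int = 10_000) -> Optional[List[Cell]]:
    """Return the agent's trajectory from start to goal, or None if it fails."""
    g: Dict[Cell, int] = {start: 0}
    parent: Dict[Cell, Optional[Cell]] = {start: None}
    open_heap = [(w * manhattan(start, goal), 0, start)]  # (f, g, cell)
    closed = set()
    goal_found = False
    agent, trajectory = start, [start]
    for _ in range(max_moves):
        if agent == goal:
            return trajectory
        # Search phase: at most k expansions, always rooted at the start state.
        expansions = 0
        while not goal_found and open_heap and expansions < k:
            _, gc, cell = heapq.heappop(open_heap)
            if cell in closed or gc > g[cell]:
                continue  # stale heap entry
            if cell == goal:
                goal_found = True
                break
            closed.add(cell)
            expansions += 1
            for nb in neighbors(grid, cell):
                if nb in closed:
                    continue  # no re-expansion in this sketch
                if nb not in g or gc + 1 < g[nb]:
                    g[nb], parent[nb] = gc + 1, cell
                    heapq.heappush(open_heap, (gc + 1 + w * manhattan(nb, goal), gc + 1, nb))
        # Target: the goal once found, otherwise the best state still on open.
        if goal_found:
            target = goal
        else:
            while open_heap and (open_heap[0][2] in closed
                                 or open_heap[0][1] > g[open_heap[0][2]]):
                heapq.heappop(open_heap)
            if not open_heap:
                return None  # search exhausted: goal unreachable
            target = open_heap[0][2]
        path = [target]
        while parent[path[-1]] is not None:
            path.append(parent[path[-1]])
        path.reverse()  # start ... target
        # Action phase: exactly one move (or a wait at the frontier) per iteration.
        if agent in path:
            i = path.index(agent)
            if i + 1 < len(path):
                agent = path[i + 1]
                trajectory.append(agent)
        else:
            agent = parent[agent]  # backtracking move; requires reversible actions
            trajectory.append(agent)
    return trajectory if agent == goal else None


maze = ["....#....", ".##.#.##.", ".#..#..#.", ".#.###.#.", "........."]
print(tb_weighted_astar(maze, (0, 0), (0, 8), k=3, w=1.5))
```

The `parent[agent]` step is exactly the kind of backtracking move that needs reversible actions; in a non-reversible graph it may not exist, which is the failure mode the abstract attributes to plain time-bounded search.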