Fast k-Nearest Neighbour Search via Dynamic Continuous Indexing
Existing methods for retrieving k-nearest neighbours suffer from the curse of dimensionality. We argue this is caused in part by inherent deficiencies of space partitioning, which is the underlying strategy used by most existing methods. We devise a new strategy that avoids partitioning the vector space and present a novel randomized algorithm that runs in time linear in the dimensionality of the space and sub-linear in the intrinsic dimensionality and the size of the dataset, and takes space constant in the dimensionality of the space and linear in the size of the dataset. The proposed algorithm allows fine-grained control over accuracy and speed on a per-query basis, automatically adapts to variations in data density, supports dynamic updates to the dataset and is easy to implement. We show appealing theoretical properties and demonstrate empirically that the proposed algorithm outperforms locality-sensitive hashing (LSH) in terms of approximation quality, speed and space efficiency.
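The idea of indexing without partitioning the space can be illustrated with random one-dimensional projections kept in sorted order. This is a simplified sketch under stated assumptions, not the paper's algorithm: it uses a single composite index of `num_simple` random directions, treats a point as a candidate once it has been reached in every projection, and stops after `max_candidates` candidates. The class `DCISketch` and its parameters are hypothetical names; the paper's use of several composite indices and its own stopping rules are not reproduced here.

```python
import bisect
import heapq
import math
import random


class DCISketch:
    """Toy variant of Dynamic Continuous Indexing: one composite index of m simple indices."""

    def __init__(self, dim, num_simple=4, seed=0):
        rng = random.Random(seed)
        # Random unit directions; each defines one one-dimensional "simple index".
        self.dirs = []
        for _ in range(num_simple):
            v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            norm = math.sqrt(sum(x * x for x in v))
            self.dirs.append([x / norm for x in v])
        # Each simple index is a sorted list of (projection, point_id) pairs.
        self.indices = [[] for _ in self.dirs]
        self.points = {}

    @staticmethod
    def _proj(u, x):
        return sum(a * b for a, b in zip(u, x))

    def insert(self, pid, x):
        # Dynamic insertion: no re-partitioning, just one sorted insert per simple index.
        self.points[pid] = list(x)
        for u, idx in zip(self.dirs, self.indices):
            bisect.insort(idx, (self._proj(u, x), pid))

    def delete(self, pid):
        x = self.points.pop(pid)
        for u, idx in zip(self.dirs, self.indices):
            idx.remove((self._proj(u, x), pid))

    def query(self, q, k=1, max_candidates=10):
        m = len(self.dirs)
        heap, q_proj = [], []
        # Start two cursors per simple index at the query's projected position.
        for j, (u, idx) in enumerate(zip(self.dirs, self.indices)):
            p = self._proj(u, q)
            q_proj.append(p)
            pos = bisect.bisect_left(idx, (p,))
            if pos < len(idx):
                heapq.heappush(heap, (abs(idx[pos][0] - p), j, pos, 1))
            if pos > 0:
                heapq.heappush(heap, (abs(idx[pos - 1][0] - p), j, pos - 1, -1))
        counts, candidates = {}, []
        # Visit points in increasing projected distance across all simple indices.
        while heap and len(candidates) < max_candidates:
            _, j, pos, step = heapq.heappop(heap)
            pid = self.indices[j][pos][1]
            counts[pid] = counts.get(pid, 0) + 1
            if counts[pid] == m:  # reached in every simple index -> candidate
                candidates.append(pid)
            nxt = pos + step
            if 0 <= nxt < len(self.indices[j]):
                heapq.heappush(heap, (abs(self.indices[j][nxt][0] - q_proj[j]), j, nxt, step))
        # Only candidates pay for a full-dimensional distance computation.
        candidates.sort(key=lambda pid: math.dist(q, self.points[pid]))
        return candidates[:k]
```

Raising `max_candidates` trades speed for accuracy on a per-query basis, loosely mirroring the control described above. Insertions and deletions only touch the sorted lists; a serious implementation would replace Python lists (whose `insort` costs O(n)) with balanced search trees.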
Comparison Based Nearest Neighbor Search
Haghiri, Siavash, Ghoshdastidar, Debarghya, von Luxburg, Ulrike
We consider machine learning in a comparison-based setting where we are given a set of points in a metric space, but we have no access to the actual distances between the points. Instead, we can only ask an oracle whether the distance between two points $i$ and $j$ is smaller than the distance between the points $i$ and $k$. We are concerned with data structures and algorithms to find nearest neighbors based on such comparisons. We focus on a simple yet effective algorithm, the comparison tree, that recursively splits the space by first selecting two random pivot points and then assigning all other points to the closer of the two. We prove that if the metric space satisfies certain expansion conditions, then with high probability the height of the comparison tree is logarithmic in the number of points, leading to efficient search performance. We also provide an upper bound for the failure probability to return the true nearest neighbor. Experiments show that the comparison tree is competitive with algorithms that have access to the actual distance values, and needs fewer triplet comparisons than its competitors.
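A minimal sketch of the comparison tree, assuming the items are distinct hashable values and that the oracle is exposed as a hypothetical function `closer(i, j, k)` returning whether d(i, j) < d(i, k). Building costs one triplet query per point per level; search costs one query per level plus a scan of the leaf. `euclidean_closer` is only a stand-in oracle for testing.

```python
import math
import random


def build_tree(items, closer, leaf_size=4, rng=None):
    """Split on two random pivots using only triplet comparisons.

    closer(i, j, k) must return True iff d(i, j) < d(i, k); it is the sole
    access to the metric. Items are assumed distinct (leaf_size >= 1).
    """
    rng = rng or random.Random(0)
    if len(items) <= leaf_size:
        return {"leaf": list(items)}
    p1, p2 = rng.sample(items, 2)
    left, right = [], []
    for x in items:
        # Each pivot stays on its own side; every other point follows the closer pivot.
        if x == p1 or (x != p2 and closer(x, p1, p2)):
            left.append(x)
        else:
            right.append(x)
    return {
        "p1": p1,
        "p2": p2,
        "left": build_tree(left, closer, leaf_size, rng),
        "right": build_tree(right, closer, leaf_size, rng),
    }


def search(tree, q, closer):
    """Route q to a leaf with one comparison per level, then scan the leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if closer(q, tree["p1"], tree["p2"]) else tree["right"]
    best = tree["leaf"][0]
    for x in tree["leaf"][1:]:
        if closer(q, x, best):
            best = x
    return best


def euclidean_closer(i, j, k):
    # Demo oracle only; in the comparison-based setting this could be, e.g., a human rater.
    return math.dist(i, j) < math.dist(i, k)
```

The returned point is not guaranteed to be the true nearest neighbour: the query can be routed away from it at any level, which is the failure event the abstract bounds.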
Propagators and Solvers for the Algebra of Modular Systems
Bogaerts, Bart, Ternovska, Eugenia, Mitchell, David
Complex artifacts are, of necessity, constructed by assembling simpler components. Software systems use libraries of reusable components, and often access multiple remote services. In this paper, we consider systems that can be formalized as solving the model expansion task for some class of finite structures. A wide range of problem solving and query answering systems are so accounted for. We present a method for automatically generating a solver for a complex system from a declarative definition of that system in terms of simpler modules, together with solvers for those modules. The work is motivated primarily by "knowledge-intensive" computing contexts, where the individual modules are defined in (possibly different) declarative languages, such as logical theories or logic programs, but it can be applied anywhere the model expansion formalization can. The Algebra of Modular Systems (AMS) [48, 49] provides a way to define a complex module in terms of a collection of other modules, in purely semantic terms. Formally, each module in this algebra represents a class of structures, and a "solver" for the module solves the model expansion task for that class. That is, a solver for module M takes as input a structure A for a part of the vocabulary of M, and returns either a set of expansions of A that are in M, or the empty set.
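To make the model expansion task concrete, here is a brute-force sketch in which a structure is a Python dict of interpretations and a solver is any function returning the list of expansions (empty when none exists). The two modules (`colouring_solver`, `at_most_one_red`) and the `sequential` combinator are hypothetical illustrations, loosely in the spirit of composing modules in AMS; they enumerate candidates exhaustively and do not implement the propagator-based solvers the paper develops.

```python
from itertools import product


def colouring_solver(a, colours=("r", "g", "b")):
    """Hypothetical module M_col: input vocabulary {V, E}, expansion symbol col.

    Returns every expansion of structure `a` in which col is a total function
    V -> colours giving adjacent vertices different colours; [] if none exists.
    """
    vertices = sorted(a["V"])
    expansions = []
    for assignment in product(colours, repeat=len(vertices)):
        col = dict(zip(vertices, assignment))
        if all(col[u] != col[v] for u, v in a["E"]):
            expansions.append({**a, "col": col})
    return expansions


def at_most_one_red(a):
    """Hypothetical module M_red: expands with red = {v | col(v) = r}, requiring |red| <= 1."""
    red = {v for v, c in a["col"].items() if c == "r"}
    return [{**a, "red": red}] if len(red) <= 1 else []


def sequential(solver1, solver2):
    """Solver for a composed module: feed each expansion found by solver1 into solver2."""
    def solver(a):
        return [b for a1 in solver1(a) for b in solver2(a1)]
    return solver
```

For a 3-vertex path, `colouring_solver` returns 12 colourings, and the composed solver keeps the 10 that have at most one red vertex.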
Verizon to begin installing a new piece of search-based bloatware on Android phones
Anyone who has bought an Android phone through a carrier knows what a frustrating experience it can be. Aside from the lock-in due to a multi-year contract, carrier-specific phones are loaded with all sorts of apps and services that we don't want and, in many cases, can't get rid of. And now Verizon is looking to add another. According to TechCrunch (which is owned by Verizon), Big Red has partnered with Android app maker Evie to bring a new search-based launcher to the carrier's versions of Android phones. Like the highly rated Evie Launcher already for sale in the Play Store, AppFlash (as it is called) will "help users find content and services across different apps -- and Evie is working with Verizon to make this the default experience on customers' Android devices, popping up whenever they swipe to the left of their home screen," according to the report.
A Neural Probabilistic Structured-Prediction Method for Transition-Based Natural Language Processing
Zhou, Hao, Zhang, Yue, Cheng, Chuan, Huang, Shujian, Dai, Xinyu, Chen, Jiajun
We propose a neural probabilistic structured-prediction method for transition-based natural language processing, which integrates beam search and contrastive learning. The method uses a global optimization model, which can leverage arbitrary features over non-local context. Beam search is used for efficient heuristic decoding, and contrastive learning is performed for adjusting the model according to search errors. When evaluated on both chunking and dependency parsing tasks, the proposed method achieves significant accuracy improvements over the locally normalized greedy baseline on both tasks.
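The interplay of beam search and a beam-level probabilistic loss can be sketched as follows, assuming a generic transition system given by `expand` and a scoring function `score` (both hypothetical stand-ins for the paper's neural model). The loss normalizes over the final beam plus the gold sequence, one common way to define contrastive training against search errors; the paper's exact formulation, including how and when the gold sequence is added, may differ.

```python
import math


def beam_search(init_state, expand, score, beam_size, steps):
    """Generic beam search over transition sequences.

    expand(state) -> list of (action, next_state); score(state, action) -> float.
    Returns the final beam as (total_score, actions, state) triples, best first.
    """
    beam = [(0.0, [], init_state)]
    for _ in range(steps):
        candidates = [
            (total + score(state, action), actions + [action], nxt)
            for total, actions, state in beam
            for action, nxt in expand(state)
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam


def contrastive_loss(beam, gold_actions, gold_score):
    """-log P(gold), where P is a softmax over the beam plus the gold sequence."""
    scores = [s for s, actions, _ in beam if actions != gold_actions]
    scores.append(gold_score)
    m = max(scores)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - gold_score
```

With a toy scorer that rewards action 1, a beam of size 2 over 3 steps ends with [1, 1, 1] (score 3) and [1, 1, 0] (score 2), so the loss for gold [1, 1, 1] is log(1 + e^-1) ≈ 0.313.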
Combinatorial Multi-armed Bandits for Real-Time Strategy Games
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called "naive sampling", based on a variant of the Multi-armed Bandit problem called "Combinatorial Multi-armed Bandits" (CMAB). We analyze the theoretical properties of several variants of naive sampling, and empirically compare it against other CMAB sampling strategies from the literature. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naive sampling outperforms the other sampling strategies.
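Naive sampling can be sketched as a pair of ε-greedy policies, assuming the usual description of the technique: with probability `eps0` a new macro-arm is assembled variable by variable from per-variable (local) bandits, which is where the naive assumption that rewards decompose across variables enters; otherwise the best macro-arm tried so far is exploited through a global bandit. The class name and the default values (`eps0=0.75`, `eps_local=0.3`, `eps_global=0.0`) are hypothetical choices, not settings taken from the paper.

```python
import random


class NaiveSampler:
    """Sketch of naive sampling for a combinatorial multi-armed bandit.

    A macro-arm assigns one value to each variable; num_values[i] is the number
    of values variable i can take.
    """

    def __init__(self, num_values, eps0=0.75, eps_local=0.3, eps_global=0.0, seed=0):
        self.rng = random.Random(seed)
        self.eps0, self.eps_local, self.eps_global = eps0, eps_local, eps_global
        # Local bandits: [reward_sum, count] for each value of each variable.
        self.local = [[[0.0, 0] for _ in range(n)] for n in num_values]
        # Global bandit over the macro-arms that have actually been tried.
        self.glob = {}

    @staticmethod
    def _mean(stats):
        return stats[0] / stats[1] if stats[1] else 0.0

    def select(self):
        if not self.glob or self.rng.random() < self.eps0:
            # Explore: build a macro-arm one variable at a time (eps-greedy per variable).
            arm = []
            for stats in self.local:
                if self.rng.random() < self.eps_local:
                    arm.append(self.rng.randrange(len(stats)))
                else:
                    arm.append(max(range(len(stats)), key=lambda v: self._mean(stats[v])))
            return tuple(arm)
        # Exploit: eps-greedy over the macro-arms seen so far.
        if self.rng.random() < self.eps_global:
            return self.rng.choice(list(self.glob))
        return max(self.glob, key=lambda a: self._mean(self.glob[a]))

    def update(self, arm, reward):
        g = self.glob.setdefault(arm, [0.0, 0])
        g[0] += reward
        g[1] += 1
        # The same reward is credited to every variable's chosen value (naive assumption).
        for i, v in enumerate(arm):
            self.local[i][v][0] += reward
            self.local[i][v][1] += 1
```

Inside MCTS, one such sampler would sit at each tree node, with a macro-arm being the joint assignment of actions to all units, so the full set of joint actions is never enumerated.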
Google tells invisible army of 'quality raters' to flag Holocaust denial
Google is using a 10,000-strong army of independent contractors to flag "offensive or upsetting" content, in order to ensure that queries like "did the Holocaust happen" don't push users to misinformation, propaganda and hate speech. The review of search terms is being done by the company's "quality raters", a little-known corps of worldwide contractors that Google uses to assess the quality of its systems. The raters are given searches based on real queries to conduct, and are asked to score the results on whether they meet the needs of users. These contractors, introduced to the company's review process in 2013, work from a huge manual describing every potential problem they could find with a given search query: whether or not it meets the user's expectations, whether the result offered is low or high quality, and whether it's spam, porn or illegal. In a new update to the rating system, rolled out on Tuesday, Google introduced another flag raters could use: the "upsetting-offensive" mark.
Numerical Integration and Dynamic Discretization in Heuristic Search Planning over Hybrid Domains
Ramirez, Miquel, Scala, Enrico, Haslum, Patrik, Thiebaux, Sylvie
In this paper we look into the problem of planning over hybrid domains, where change can be both discrete and instantaneous, or continuous over time. In addition, it is required that each state on the trajectory induced by the execution of plans complies with a given set of global constraints. We approach the computation of plans for such domains as the problem of searching over a deterministic state model. In this model, some of the successor states are obtained by numerically solving the so-called initial value problem over a set of ordinary differential equations (ODE) given by the current plan prefix. These equations hold over time intervals whose duration is determined dynamically, according to whether zero crossing events take place for a set of invariant conditions. The resulting planner, FS+, incorporates these features together with effective heuristic guidance. FS+ does not impose any of the syntactic restrictions on process effects often found in the existing literature on Hybrid Planning. A key concept of our approach is that a clear separation is struck between planning and simulation time steps. The former is the time allowed to observe the evolution of a given dynamical system before committing to a future course of action, whilst the latter is part of the model of the environment. FS+ is shown to be a robust planner over a diverse set of hybrid domains, taken from the existing literature on hybrid planning and systems.
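The separation between a planning step and a simulation step, with an interval cut short at a zero crossing, can be sketched with fixed-step fourth-order Runge-Kutta integration and bisection. This is an illustrative assumption, not the integrator used by FS+: the hypothetical function `advance` integrates for up to `planning_step` time units in `sim_step` increments and returns early when the invariant changes sign.

```python
def rk4_step(f, x, h):
    """One classical Runge-Kutta step for the autonomous system dx/dt = f(x)."""
    k1 = f(x)
    k2 = f([xi + h / 2 * ki for xi, ki in zip(x, k1)])
    k3 = f([xi + h / 2 * ki for xi, ki in zip(x, k2)])
    k4 = f([xi + h * ki for xi, ki in zip(x, k3)])
    return [xi + h / 6 * (a + 2 * b + 2 * c + d) for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def advance(f, x0, invariant, planning_step, sim_step, tol=1e-9):
    """Integrate for up to planning_step time units using RK4 steps of size sim_step.

    Stops early at the first zero crossing of invariant(x) (from >= 0 to < 0),
    located by bisection. Returns (elapsed_time, state, crossed).
    """
    t, x = 0.0, list(x0)
    while t < planning_step - 1e-12:
        h = min(sim_step, planning_step - t)
        x_next = rk4_step(f, x, h)
        if invariant(x) >= 0 and invariant(x_next) < 0:
            lo, hi = 0.0, h  # invariant holds after lo, is violated after hi
            while hi - lo > tol:
                mid = (lo + hi) / 2
                if invariant(rk4_step(f, x, mid)) >= 0:
                    lo = mid
                else:
                    hi = mid
            return t + lo, rk4_step(f, x, lo), True
        t, x = t + h, x_next
    return t, x, False
```

A tank draining at unit rate from level 4.5, with the invariant "level ≥ 0", is stopped at t ≈ 4.5 instead of running for the full planning step of 10. A sign change that starts and ends within a single simulation step would be missed, which is one reason the choice of `sim_step` matters.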
Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?
Fu, Wei, Nair, Vivek, Menzies, Tim
Context: One of the black arts of data mining is learning the magic parameters which control the learners. In software analytics, at least for defect prediction, several methods, like grid search and differential evolution (DE), have been proposed to learn these parameters, and they have been shown to improve the performance scores of learners. Objective: We want to evaluate which method can find better parameters in terms of performance score and runtime cost. Methods: This paper compares grid search to differential evolution, which is an evolutionary algorithm that makes extensive use of stochastic jumps around the search space. Results: We find that the seemingly complete approach of grid search does no better than, and sometimes worse than, the stochastic search. When repeated 20 times to check for conclusion validity, DE was over 210 times faster than grid search to tune Random Forests on 17 testing data sets with F-Measure. Conclusions: These results are puzzling: why is a quick partial search just as effective as a much slower and much more extensive search? To answer that question, we turned to the theoretical optimization literature. Bergstra and Bengio conjecture that grid search is not more effective than more randomized searches if the underlying search space is inherently low dimensional. This is significant since recent results show that defect prediction exhibits very low intrinsic dimensionality, an observation that explains why a fast method like DE may work as well as a seemingly more thorough grid search. This suggests, as a future research direction, that it might be possible to peek at data sets before doing any optimization in order to match the optimization algorithm to the problem at hand.
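To make the comparison concrete, here is a sketch of DE/rand/1/bin next to plain grid search on a toy objective in which only one of four parameters matters, mimicking a search space with low intrinsic dimensionality. The objective, bounds and DE settings (population 10, F = 0.75, CR = 0.3, 10 generations) are assumptions chosen for illustration; they are not claimed to match the paper's configuration, and the toy does not reproduce its reported results.

```python
import itertools
import random


def differential_evolution(objective, bounds, pop_size=10, f=0.75, cr=0.3, generations=10, seed=0):
    """Maximize objective over a box with DE/rand/1/bin. Returns (best, score, evaluations)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    evals = pop_size
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: three distinct members other than i (requires pop_size >= 4).
            a, b, c = rng.sample([p for j, p in enumerate(pop) if j != i], 3)
            forced = rng.randrange(len(bounds))  # at least one coordinate comes from the mutant
            trial = []
            for d, (lo, hi) in enumerate(bounds):
                if d == forced or rng.random() < cr:
                    trial.append(min(max(a[d] + f * (b[d] - c[d]), lo), hi))
                else:
                    trial.append(pop[i][d])
            score = objective(trial)
            evals += 1
            if score >= scores[i]:  # greedy one-to-one replacement
                pop[i], scores[i] = trial, score
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best], evals


def grid_search(objective, grids):
    """Exhaustively evaluate the Cartesian product of per-parameter value lists."""
    best, best_score, evals = None, float("-inf"), 0
    for combo in itertools.product(*grids):
        score = objective(list(combo))
        evals += 1
        if score > best_score:
            best, best_score = list(combo), score
    return best, best_score, evals
```

In this toy setup DE spends 110 evaluations, while a 10-value grid over four parameters spends 10,000, nearly all of them varying parameters that do not affect the score.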