# Search

### Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search

We present a learning-based approach to computing solutions for certain NP-hard problems. Our approach combines deep learning techniques with useful algorithmic elements from classic heuristics. The central component is a graph convolutional network that is trained to estimate the likelihood, for each vertex in a graph, of whether this vertex is part of the optimal solution. The network is designed and trained to synthesize a diverse set of solutions, which enables rapid exploration of the solution space via tree search. The presented approach is evaluated on four canonical NP-hard problems and five datasets, which include benchmark satisfiability problems and real social network graphs with up to a hundred thousand nodes. Experimental results demonstrate that the presented approach substantially outperforms recent deep learning work, and performs on par with highly optimized state-of-the-art heuristic solvers for some NP-hard problems. Experiments indicate that our approach generalizes across datasets, and scales to graphs that are orders of magnitude larger than those used during training.

### Tic Tac Toe - Creating Unbeatable AI – Towards Data Science

In today's article, I am going to show you how to create an unbeatable AI agent that plays the classic Tic Tac Toe game. You will learn the concept of the Minimax algorithm that is widely and successfully used across the fields like Artificial Intelligence, Economics, Game Theory, Statistics or even Philosophy. Before we go into the AI part, let's make sure that we understand the game. I recommend you to play the game yourself, feel free to check out my iOS Tic Tac Toe app.

### Active Learning and CSI acquisition for mmWave Initial Alignment

Millimeter wave (mmWave) communication with large antenna arrays is a promising technique to enable extremely high data rates due to large available bandwidth. Given the knowledge of an optimal directional beamforming vector, large antenna arrays have been shown to overcome both the severe signal attenuation in mmWave. However, fundamental limits and achievable learning of an optimal beamforming vector remain. This paper considers the problem of adaptive and sequential optimization of the beamforming vectors during the initial access phase of communication. With a single-path channel model, the problem is reduced to actively learning the Angle-of-Arrival (AoA) of the signal sent from the user to the Base Station (BS). Drawing on the recent results in the design of a hierarchical beamforming codebook [1], sequential measurement dependent noisy search [2], and active learning from an imperfect labeler [3], an adaptive and sequential alignment algorithm is proposed. For any given resolution and error probability of the estimated AoA, an upper bound on the expected search time of the proposed algorithm is derived via the Extrinsic Jensen Shannon Divergence. The upper bound demonstrates that the search time of the proposed algorithm asymptotically matches the performance of the noiseless bisection search up to a constant factor characterizing the AoA acquisition rate. Furthermore, the acquired AoA error probability decays exponentially fast with the search time with an exponent that is a decreasing function of the acquisition rate.Numerically, the proposed algorithm is compared with prior work where a significant improvement of the system communication rate is observed. Most notably, in the relevant regime of low (- 10dB to 5dB) raw SNR, this establishes the first practically viable solution for initial access and, hence, the first demonstration of stand-alone mmWave communication.

### A Scalable Heuristic for Fastest-Path Computation on Very Large Road Maps

Fastest-path queries between two points in a very large road map is an increasingly important primitive in modern transportation and navigation systems, thus very efficient computation of these paths is critical for system performance and throughput. We present a method to compute an effective heuristic for the fastest path travel time between two points on a road map, which can be used to significantly accelerate the classical A* algorithm when computing fastest paths. Our method is based on two hierarchical sets of separators of the map represented by two binary trees. A preprocessing step computes a short vector of values per road junction based on the separator trees, which is then stored with the map and used to efficiently compute the heuristic at the online query stage. We demonstrate experimentally that this method scales well to any map size, providing a better quality heuristic, thus more efficient A* search, for fastest path queries between points at all distances - especially small and medium range - relative to other known heuristics.

### On the Network Visibility Problem

Social media is an attention economy where users are constantly competing for attention in their followers' feeds. Users are likely to elicit greater attention from their followers, their audience, if their posts remain visible at the top of their followers' feeds for a longer period of time. However, this depends on the rate at which their followers receive information in their feeds, which in turn depends on the users their followers follow. Then, who should follow whom to maximize the visibility each user achieve? In this paper, we represent users' posts and feeds using the framework of temporal point processes. Under this representation, the problem reduces to optimizing a non-submodular nondecreasing set function under matroid constraints. Then, we show that the set function satisfies a novel property, $\xi$-submodularity, which allows a simple and efficient greedy algorithm to enjoy theoretical guarantees. In particular, we prove that the greedy algorithm offers a $(1/\xi + 1)$ approximation factor, where $\xi$ is the strong submodularity ratio, a new measure of approximate submodularity that we are able to bound in our problem. Experiments on both synthetic and real data gathered from Twitter show that our greedy algorithm is able to consistently outperform several baselines.

### Low-rank semidefinite programming for the MAX2SAT problem

This paper proposes a new algorithm for solving MAX2SAT problems based on combining search methods with semidefinite programming approaches. Semidefinite programming techniques are well-known as a theoretical tool for approximating maximum satisfiability problems, but their application has traditionally been very limited by their speed and randomized nature. Our approach overcomes this difficult by using a recent approach to low-rank semidefinite programming, specialized to work in an incremental fashion suitable for use in an exact search algorithm. The method can be used both within complete or incomplete solver, and we demonstrate on a variety of problems from recent competitions. Our experiments show that the approach is faster (sometimes by orders of magnitude) than existing state-of-the-art complete and incomplete solvers, representing a substantial advance in search methods specialized for MAX2SAT problems.

### A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play

Computers can beat humans at increasingly complex games, including chess and Go. However, these programs are typically constructed for a particular game, exploiting its properties, such as the symmetries of the board on which it is played. Silver et al. developed a program called AlphaZero, which taught itself to play Go, chess, and shogi (a Japanese version of chess) (see the Editorial, and the Perspective by Campbell). AlphaZero managed to beat state-of-the-art programs specializing in these three games. The ability of AlphaZero to adapt to various game rules is a notable step toward achieving a general game-playing system.

### FoldingZero: Protein Folding from Scratch in Hydrophobic-Polar Model

De novo protein structure prediction from amino acid sequence is one of the most challenging problems in computational biology. As one of the extensively explored mathematical models for protein folding, Hydrophobic-Polar (HP) model enables thorough investigation of protein structure formation and evolution. Although HP model discretizes the conformational space and simplifies the folding energy function, it has been proven to be an NP-complete problem. In this paper, we propose a novel protein folding framework FoldingZero, self-folding a de novo protein 2D HP structure from scratch based on deep reinforcement learning. FoldingZero features the coupled approach of a two-head (policy and value heads) deep convolutional neural network (HPNet) and a regularized Upper Confidence Bounds for Trees (R-UCT). It is trained solely by a reinforcement learning algorithm, which improves HPNet and R-UCT iteratively through iterative policy optimization. Without any supervision and domain knowledge, FoldingZero not only achieves comparable results, but also learns the latent folding knowledge to stabilize the structure. Without exponential computation, FoldingZero shows promising potential to be adopted for real-world protein properties prediction.

### Game Tree Search in a Robust Multistage Optimization Framework: Exploiting Pruning Mechanisms

We investigate pruning in search trees of so-called quantified integer linear programs (QIPs). QIPs consist of a set of linear inequalities and a minimax objective function, where some variables are existentially and others are universally quantified. They can be interpreted as two-person zero-sum games between an existential and a universal player on the one hand, or multistage optimization problems under uncertainty on the other hand. Solutions are so-called winning strategies for the existential player that specify how to react on moves of the universal player - i.e. certain assignments of universally quantified variables - to certainly win the game. QIPs can be solved with the help of game tree search that is enhanced with non-chronological back-jumping. We develop and theoretically substantiate pruning techniques based upon (algebraic) properties similar to pruning mechanisms known from linear programming and quantified boolean formulas. The presented Strategic Copy-Pruning mechanism allows to \textit{implicitly} deduce the existence of a strategy in linear time (by static examination of the QIP-matrix) without explicitly traversing the strategy itself. We show that the implementation of our findings can massively speed up the search process.

### Single-Agent Policy Tree Search With Guarantees

We introduce two novel tree search algorithms that use a policy to guide search. The first algorithm is a best-first enumeration that uses a cost function that allows us to prove an upper bound on the number of nodes to be expanded before reaching a goal state. We show that this best-first algorithm is particularly well suited for `needle-in-a-haystack' problems. The second algorithm is based on sampling and we prove an upper bound on the expected number of nodes it expands before reaching a set of goal states. We show that this algorithm is better suited for problems where many paths lead to a goal. We validate these tree search algorithms on 1,000 computer-generated levels of Sokoban, where the policy used to guide the search comes from a neural network trained using A3C. Our results show that the policy tree search algorithms we introduce are competitive with a state-of-the-art domain-independent planner that uses heuristic search.