Principled Neural Architecture Learning - Intel AI
A neural architecture, which is the structure and connectivity of the network, is typically either hand-crafted or searched by optimizing some specific objective criterion (e.g., classification accuracy). Since the space of all neural architectures is huge, search methods are usually heuristic and do not guarantee finding the optimal architecture with respect to the objective criterion. In addition, these search methods might require a large number of supervised training iterations and large amounts of computational resources, rendering the solution infeasible for many applications. Moreover, optimizing for a specific criterion might result in a model that is suboptimal for other useful criteria such as model size, representation of uncertainty and robustness to adversarial attacks. Thus, the resulting architectures of most strategies used today, whether hand-crafting or heuristic search, are densely connected networks, which are not an optimal solution for the objective they were created to achieve, let alone other objectives.
Discovering Reliable Correlations in Categorical Data
Mandros, Panagiotis, Boley, Mario, Vreeken, Jilles
In many scientific tasks we are interested in discovering whether there exist any correlations in our data. This raises many questions, such as how to reliably and interpretably measure correlation between a multivariate set of attributes, how to do so without having to make assumptions on the distribution of the data or the type of correlation, and how to efficiently discover the top-most reliably correlated attribute sets from data. In this paper we answer these questions for discovery tasks in categorical data. In particular, we propose a corrected-for-chance, consistent, and efficient estimator for normalized total correlation, by which we obtain a reliable, naturally interpretable, non-parametric measure for correlation over multivariate sets. For the discovery of the top-k correlated sets, we derive an effective algorithmic framework based on a tight bounding function. This framework offers exact, approximate, and heuristic search. Empirical evaluation shows that already for small sample sizes the estimator leads to low-regret optimization outcomes, while the algorithms are shown to be highly effective for both large and high-dimensional data. Through two case studies we confirm that our discovery framework identifies interesting and meaningful correlations.
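For orientation, the quantity being estimated can be sketched in a few lines. The snippet below computes only the naive plug-in total correlation (the sum of the marginal entropies minus the joint entropy) from empirical frequencies. It does not reproduce the paper's normalization or its correction for chance, and the function names are illustrative assumptions rather than anything taken from the authors' code.

```python
from collections import Counter
import math


def entropy(values):
    """Plug-in Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def total_correlation(columns):
    """Naive plug-in total correlation of several categorical columns.

    columns: list of equal-length sequences, one per attribute.
    Returns sum_i H(X_i) - H(X_1, ..., X_d), with no chance correction.
    """
    joint = list(zip(*columns))  # one tuple per data row
    return sum(entropy(list(col)) for col in columns) - entropy(joint)


if __name__ == "__main__":
    a = [0, 1, 0, 1]
    b = [0, 0, 1, 1]
    print(total_correlation([a, a]))  # two identical attributes
    print(total_correlation([a, b]))  # two empirically independent attributes
```

On small samples this plug-in value tends to be biased upward, because the joint entropy is underestimated when many attribute combinations are unobserved; that is the kind of effect a corrected-for-chance estimator is meant to address.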
Automating Agential Reasoning: Proof-Calculi and Syntactic Decidability for STIT Logics
This work provides proof-search algorithms and automated counter-model extraction for a class of STIT logics. With this, we answer an open problem concerning syntactic decision procedures and cut-free calculi for STIT logics. A new class of cut-free complete labelled sequent calculi G3LdmL^m_n, for multi-agent STIT with at most n-many choices, is introduced. We refine the calculi G3LdmL^m_n through the use of propagation rules and demonstrate the admissibility of their structural rules, resulting in auxiliary calculi Ldm^m_nL. In the single-agent case, we show that the refined calculi Ldm^m_nL derive theorems within a restricted class of (forestlike) sequents, allowing us to provide proof-search algorithms that decide single-agent STIT logics. We prove that the proof-search algorithms are correct and terminate.
Improving a State-of-the-Art Heuristic for the Minimum Latency Problem with Data Mining
Recently, hybrid metaheuristics have become a trend in operations research. A successful example combines the Greedy Randomized Adaptive Search Procedures (GRASP) and data mining techniques, where frequent patterns found in high-quality solutions can lead to an efficient exploration of the search space, along with a significant reduction of computational time. In this work, a GRASP-based state-of-the-art heuristic for the Minimum Latency Problem (MLP) is improved by means of data mining techniques for two MLP variants. Computational experiments showed that the approaches with data mining were able to match or improve the solution quality for a large number of instances, together with a substantial reduction of running time. In addition, 88 new cost values of solutions are introduced into the literature. To support our results, tests of statistical significance, impact of using mined patterns, equal time comparisons and time-to-target plots are provided.
Feature Gradients: Scalable Feature Selection via Discrete Relaxation
In this paper we introduce Feature Gradients, a gradient-based search algorithm for feature selection. Our approach extends a recent result on the estimation of learnability in the sublinear data regime by showing that the calculation can be performed iteratively (i.e., in mini-batches) and in linear time and space with respect to both the number of features D and the sample size N. This, along with a discrete-to-continuous relaxation of the search domain, allows for an efficient, gradient-based search algorithm among feature subsets for very large datasets. Crucially, our algorithm is capable of finding higher-order correlations between features and targets for both the N > D and N < D regimes, as opposed to approaches that do not consider such interactions and/or only consider one regime. We provide experimental demonstration of the algorithm in small and large sample- and feature-size settings.
On the Minimax Optimality of Estimating the Wasserstein Metric
We study the minimax optimal rate for estimating the Wasserstein-$1$ metric between two unknown probability measures based on $n$ i.i.d. empirical samples from them. We show that estimating the Wasserstein metric itself between probability measures is not significantly easier than estimating the probability measures under the Wasserstein metric. We prove that the minimax optimal rates for these two problems are multiplicatively equivalent, up to a $\log \log (n)/\log (n)$ factor.
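As a concrete reference point, the simplest estimator of this quantity is the plug-in one: compute the Wasserstein-1 distance between the two empirical measures. The sketch below covers only the one-dimensional case with equal sample sizes, where that distance reduces to the mean absolute difference between the sorted samples. It is an illustration of the estimation target, not the estimator or the setting analyzed in the paper, and the function name is an assumption.

```python
def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    In one dimension the optimal coupling matches order statistics, so the
    distance is the average gap between the sorted samples.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


if __name__ == "__main__":
    # Shifting every point by 1 moves the empirical measure by exactly 1.
    print(wasserstein1_1d([0.0, 2.0, 1.0], [3.0, 1.0, 2.0]))
```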
On the Bounds of Function Approximations
Within machine learning, the subfield of Neural Architecture Search (NAS) has recently garnered research attention due to its ability to improve upon human-designed models. However, the computational requirements for finding an exact solution to this problem are often intractable, and the design of the search space still requires manual intervention. In this paper we attempt to establish a formalized framework from which we can better understand the computational bounds of NAS in relation to its search space. For this, we first reformulate the function approximation problem in terms of sequences of functions, and we call it the Function Approximation (FA) problem; then we show that it is computationally infeasible to devise a procedure that solves FA for all functions to zero error, regardless of the search space. We show also that such error will be minimal if a specific class of functions is present in the search space. Subsequently, we show that machine learning as a mathematical problem is a solution strategy for FA, albeit not an effective one, and further describe a stronger version of this approach: the Approximate Architectural Search Problem (a-ASP), which is the mathematical equivalent of NAS. We leverage the framework from this paper and results from the literature to describe the conditions under which a-ASP can potentially solve FA as well as an exhaustive search, but in polynomial time.
Nearest Neighbor Search-Based Bitwise Source Separation Using Discriminant Winner-Take-All Hashing
We propose an iteration-free source separation algorithm based on Winner-Take-All (WTA) hash codes, which is a faster, yet accurate alternative to a complex machine learning model for single-channel source separation in a resource-constrained environment. We first generate random permutations with WTA hashing to encode the shape of the multidimensional audio spectrum to a reduced bitstring representation. A nearest neighbor search on the hash codes of an incoming noisy spectrum as the query string results in the closest matches among the hashed mixture spectra. Using the indices of the matching frames, we obtain the corresponding ideal binary mask vectors for denoising. Since both the training data and the search operation are bitwise, the procedure can be done efficiently in hardware implementations. Experimental results show that the WTA hash codes are discriminant and provide an affordable dictionary search mechanism that leads to a competent performance compared to a comprehensive model and oracle masking.
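The pipeline described above (hash the spectrum, look up the nearest hashed frames, reuse their masks) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameters `num_hashes` and `k`, and the majority vote over the retrieved masks are all hypothetical choices made for the example. Each WTA code records which of the first `k` entries of a randomly permuted vector is largest, so it depends only on the ordering of the spectrum values and not on their scale.

```python
import numpy as np


def make_permutations(dim, num_hashes, k, seed=0):
    """Return an array of shape (num_hashes, k): the first k indices of
    num_hashes independent random permutations of range(dim)."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(dim)[:k] for _ in range(num_hashes)])


def wta_hash(spectrum, perms):
    """WTA code of one spectrum frame: for each permutation, the position
    (0..k-1) of the largest of the k selected entries."""
    return np.argmax(spectrum[perms], axis=1)


def nearest_frames(query_code, dictionary_codes, top=1):
    """Indices of the dictionary frames whose codes disagree with the
    query code in the fewest positions (a Hamming-style distance)."""
    mismatches = np.count_nonzero(dictionary_codes != query_code, axis=1)
    return np.argsort(mismatches, kind="stable")[:top]


def estimate_mask(noisy_spectrum, perms, dictionary_codes, dictionary_masks, top=3):
    """Estimate a binary mask by majority vote over the masks of the
    closest dictionary frames (the vote is an assumption of this sketch)."""
    code = wta_hash(noisy_spectrum, perms)
    idx = nearest_frames(code, dictionary_codes, top)
    return (dictionary_masks[idx].mean(axis=0) >= 0.5).astype(np.uint8)
```

In use, `dictionary_codes` would be built once by hashing every training mixture frame with the same `perms`, and `dictionary_masks` would hold the ideal binary mask of each of those frames; at test time only the hashing and the code comparison are needed.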
Artificial Intelligence Search, NLP & Automation
Another significant requirement is the need to find an efficient method for reducing the amount of computational searching for a match or a solution. Considerable work has been done on the problem of pruning a search space without affecting the result of the search. One technique is to compare the value of completing a particular branch versus another. Of course, the measurement of value is a problem. As real-time applications become more important, search methods must become even more efficient in order for an AI system to run in real-time. There has been an increasing amount of work on the problem of language understanding.
Exploring the Performance of Deep Residual Networks in Crazyhouse Chess
Crazyhouse is a chess variant that incorporates all of the classical chess rules, but allows players to drop pieces captured from the opponent as a normal move. Until 2018, all competitive computer engines for this board game made use of an alpha-beta pruning algorithm with a hand-crafted evaluation function for each position. Previous machine learning-based algorithms for regular chess, such as NeuroChess and Giraffe, took hand-crafted evaluation features as input rather than a raw board representation. More recent projects, such as AlphaZero, achieved great success but required massive computational resources to reach their final strength. This paper describes the development of SixtyFour, an engine designed to compete in the chess variant of Crazyhouse with limited hardware. This specific variant poses a multitude of significant challenges due to its large branching factor, state-space complexity, and the multiple move types a player can make. We propose a novel neural network-based evaluation function for Crazyhouse. More importantly, we evaluate the effectiveness of an ensemble model, which allows the training time and datasets to be easily distributed on commodity CPU hardware. Early versions of the network have attained a playing level comparable to a strong amateur on online servers.
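For readers unfamiliar with the alpha-beta pruning mentioned above, a minimal generic version is sketched below. It is not taken from SixtyFour or from any chess engine: the game tree is a hypothetical nested list whose leaves stand in for the scores an evaluation function would assign to positions, and the optional `visited` list exists only to show which leaves were actually evaluated.

```python
import math


def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True, visited=None):
    """Minimax value of a nested-list game tree with alpha-beta pruning.

    A leaf is any non-list value (a position score); an inner node is a
    list of child nodes. Players alternate between levels.
    """
    if not isinstance(node, list):
        if visited is not None:
            visited.append(node)  # record every leaf that gets evaluated
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, best)
            if alpha >= beta:
                break  # the minimizing player will never allow this line
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, best)
        if alpha >= beta:
            break  # the maximizing player already has a better option
    return best


if __name__ == "__main__":
    tree = [[3, 5], [2, 9], [0, 1]]
    seen = []
    print(alphabeta(tree, visited=seen), seen)
```

On this small tree the search returns the same value as plain minimax while skipping the leaves 9 and 1, which is the sense in which pruning reduces work without changing the result.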