Search
To Beam Or Not To Beam: That is a Question of Cooperation for Language GANs
Scialom, Thomas, Dray, Paul-Alexis, Lamprier, Sylvain, Piwowarski, Benjamin, Staiano, Jacopo
Due to the discrete nature of words, language GANs require to be optimized from rewards provided by discriminator networks, via reinforcement learning methods. This is a much harder setting than for continuous tasks, which enjoy gradient flows from discriminators to generators, usually leading to dramatic learning instabilities. However, we claim that this can be solved by making discriminator and generator networks cooperate to produce output sequences during training. These cooperative outputs, inherently built to obtain higher discrimination scores, not only provide denser rewards for training, but also form a more compact artificial set for discriminator training, hence improving its accuracy and stability. In this paper, we show that our SelfGAN framework, built on this cooperative principle, outperforms Teacher Forcing and obtains state-of-the-art results on two challenging tasks, Summarization and Question Generation.
Meta-Learning for Symbolic Hyperparameter Defaults
Gijsbers, Pieter, Pfisterer, Florian, van Rijn, Jan N., Bischl, Bernd, Vanschoren, Joaquin
Hyperparameter optimization in machine learning (ML) deals with the problem of empirically learning an optimal algorithm configuration from data, usually formulated as a black-box optimization problem. In this work, we propose a zero-shot method to meta-learn symbolic default hyperparameter configurations that are expressed in terms of the properties of the dataset. This enables a much faster, but still data-dependent, configuration of the ML algorithm, compared to standard hyperparameter optimization approaches. In the past, symbolic and static default values have usually been obtained as hand-crafted heuristics. We propose an approach of learning such symbolic configurations as formulas of dataset properties from a large set of prior evaluations on multiple datasets by optimizing over a grammar of expressions using an evolutionary algorithm. We evaluate our method on surrogate empirical performance models as well as on real data across 6 ML algorithms on more than 100 datasets and demonstrate that our method indeed finds viable symbolic defaults.
The Complexity of Sparse Tensor PCA
We study the problem of sparse tensor principal component analysis: given a tensor $\pmb Y = \pmb W + \lambda x^{\otimes p}$ with $\pmb W \in \otimes^p\mathbb{R}^n$ having i.i.d. Gaussian entries, the goal is to recover the $k$-sparse unit vector $x \in \mathbb{R}^n$. The model captures both sparse PCA (in its Wigner form) and tensor PCA. For the highly sparse regime of $k \leq \sqrt{n}$, we present a family of algorithms that smoothly interpolates between a simple polynomial-time algorithm and the exponential-time exhaustive search algorithm. For any $1 \leq t \leq k$, our algorithms recovers the sparse vector for signal-to-noise ratio $\lambda \geq \tilde{\mathcal{O}} (\sqrt{t} \cdot (k/t)^{p/2})$ in time $\tilde{\mathcal{O}}(n^{p+t})$, capturing the state-of-the-art guarantees for the matrix settings (in both the polynomial-time and sub-exponential time regimes). Our results naturally extend to the case of $r$ distinct $k$-sparse signals with disjoint supports, with guarantees that are independent of the number of spikes. Even in the restricted case of sparse PCA, known algorithms only recover the sparse vectors for $\lambda \geq \tilde{\mathcal{O}}(k \cdot r)$ while our algorithms require $\lambda \geq \tilde{\mathcal{O}}(k)$. Finally, by analyzing the low-degree likelihood ratio, we complement these algorithmic results with rigorous evidence illustrating the trade-offs between signal-to-noise ratio and running time. This lower bound captures the known lower bounds for both sparse PCA and tensor PCA. In this general model, we observe a more intricate three-way trade-off between the number of samples $n$, the sparsity $k$, and the tensor power $p$.
Google will update its search algorithm to discourage slander extortion schemes
Extortionists thrive on posting online slander that costs a fortune to take down, and Google wants to put a stop to it. The company told the New York Times that it's changing its search algorithm to prevent slander sites from surfacing when you look for a person's name. The company also recently implemented a "known victims" system that will suppress these sites if you report attacks. While some measures are already in place, others are slated to take effect in the months ahead. Google VP David Graff was quick to warn this wouldn't be a "perfect solution," but hoped that it would have a "significant and positive" effect.
Visual scoping operations for physical assembly
Binder, Felix J, Mattar, Marcelo M, Kirsh, David, Fan, Judith E
Planning is hard. The use of subgoals can make planning more tractable, but selecting these subgoals is computationally costly. What algorithms might enable us to reap the benefits of planning using subgoals while minimizing the computational overhead of selecting them? We propose visual scoping, a strategy that interleaves planning and acting by alternately defining a spatial region as the next subgoal and selecting actions to achieve it. We evaluated our visual scoping algorithm on a variety of physical assembly problems against two baselines: planning all subgoals in advance and planning without subgoals. We found that visual scoping achieves comparable task performance to the subgoal planner while requiring only a fraction of the total computational cost. Together, these results contribute to our understanding of how humans might make efficient use of cognitive resources to solve complex planning problems.
NeuralLog: Natural Language Inference with Joint Neural and Logical Reasoning
Chen, Zeming, Gao, Qiyue, Moss, Lawrence S.
Deep learning (DL) based language models achieve high performance on various benchmarks for Natural Language Inference (NLI). And at this time, symbolic approaches to NLI are receiving less attention. Both approaches (symbolic and DL) have their advantages and weaknesses. However, currently, no method combines them in a system to solve the task of NLI. To merge symbolic and deep learning methods, we propose an inference framework called NeuralLog, which utilizes both a monotonicity-based logical inference engine and a neural network language model for phrase alignment. Our framework models the NLI task as a classic search problem and uses the beam search algorithm to search for optimal inference paths. Experiments show that our joint logic and neural inference system improves accuracy on the NLI task and can achieve state-of-art accuracy on the SICK and MED datasets.
Fast and More Powerful Selective Inference for Sparse High-order Interaction Model
Das, Diptesh, Duy, Vo Nguyen Le, Hanada, Hiroyuki, Tsuda, Koji, Takeuchi, Ichiro
Automated high-stake decision-making such as medical diagnosis requires models with high interpretability and reliability. As one of the interpretable and reliable models with good prediction ability, we consider Sparse High-order Interaction Model (SHIM) in this study. However, finding statistically significant high-order interactions is challenging due to the intrinsic high dimensionality of the combinatorial effects. Another problem in data-driven modeling is the effect of "cherry-picking" a.k.a. selection bias. Our main contribution is to extend the recently developed parametric programming approach for selective inference to high-order interaction models. Exhaustive search over the cherry tree (all possible interactions) can be daunting and impractical even for a small-sized problem. We introduced an efficient pruning strategy and demonstrated the computational efficiency and statistical power of the proposed method using both synthetic and real data.
Vector Symbolic Architectures as a Computing Framework for Nanoscale Hardware
Kleyko, Denis, Davies, Mike, Frady, E. Paxon, Kanerva, Pentti, Kent, Spencer J., Olshausen, Bruno A., Osipov, Evgeny, Rabaey, Jan M., Rachkovskij, Dmitri A., Rahimi, Abbas, Sommer, Friedrich T.
This article reviews recent progress in the development of the computing framework Vector Symbolic Architectures (also known as Hyperdimensional Computing). This framework is well suited for implementation in stochastic, nanoscale hardware and it naturally expresses the types of cognitive operations required for Artificial Intelligence (AI). We demonstrate in this article that the ring-like algebraic structure of Vector Symbolic Architectures offers simple but powerful operations on high-dimensional vectors that can support all data structures and manipulations relevant in modern computing. In addition, we illustrate the distinguishing feature of Vector Symbolic Architectures, "computing in superposition," which sets it apart from conventional computing. This latter property opens the door to efficient solutions to the difficult combinatorial search problems inherent in AI applications. Vector Symbolic Architectures are Turing complete, as we show, and we see them acting as a framework for computing with distributed representations in myriad AI settings. This paper serves as a reference for computer architects by illustrating techniques and philosophy of VSAs for distributed computing and relevance to emerging computing hardware, such as neuromorphic computing.
Efficient Active Search for Combinatorial Optimization Problems
Hottung, André, Kwon, Yeong-Dae, Tierney, Kevin
Recently numerous machine learning based methods for combinatorial optimization problems have been proposed that learn to construct solutions in a sequential decision process via reinforcement learning. While these methods can be easily combined with search strategies like sampling and beam search, it is not straightforward to integrate them into a high-level search procedure offering strong search guidance. Bello et al. (2016) propose active search, which adjusts the weights of a (trained) model with respect to a single instance at test time using reinforcement learning. While active search is simple to implement, it is not competitive with state-of-the-art methods because adjusting all model weights for each test instance is very time and memory intensive. Instead of updating all model weights, we propose and evaluate three efficient active search strategies that only update a subset of parameters during the search. The proposed methods offer a simple way to significantly improve the search performance of a given model and outperform state-of-the-art machine learning based methods on combinatorial problems, even surpassing the well-known heuristic solver LKH3 on the capacitated vehicle routing problem. Finally, we show that (efficient) active search enables learned models to effectively solve instances that are much larger than those seen during training.
Planning for Novelty: Width-Based Algorithms for Common Problems in Control, Planning and Reinforcement Learning
Width-based algorithms search for solutions through a general definition of state novelty. These algorithms have been shown to result in state-of-the-art performance in classical planning, and have been successfully applied to model-based and model-free settings where the dynamics of the problem are given through simulation engines. Width-based algorithms performance is understood theoretically through the notion of planning width, providing polynomial guarantees on their runtime and memory consumption. To facilitate synergies across research communities, this paper summarizes the area of width-based planning, and surveys current and future research directions.