Problem Solving
On the Relationship Between Variational Inference and Auto-Associative Memory
Annabi, Louis, Pitti, Alexandre, Quoy, Mathias
In this article, we propose a variational inference formulation of auto-associative memories, allowing us to combine perceptual inference and memory retrieval in the same mathematical framework. In this formulation, the prior probability distribution over latent representations is made memory-dependent, pulling the inference process towards previously stored representations. We then study how different neural network approaches to variational inference can be applied in this framework. We compare methods relying on amortized inference, such as Variational Autoencoders, with methods relying on iterative inference, such as Predictive Coding, and suggest combining both approaches to design new auto-associative memory models. We evaluate the resulting algorithms on the CIFAR10 and CLEVR image datasets and compare them with other associative memory models such as Hopfield Networks, End-to-End Memory Networks and Neural Turing Machines.
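The mechanism in the abstract is that a memory-dependent prior pulls iterative inference toward stored patterns. The sketch below illustrates this under simplifying assumptions of our own, which are not the authors' model:

- The decoder is the identity plus isotropic Gaussian noise.
- The prior is an equal-weight Gaussian mixture centred on the stored memories.
- Inference is plain gradient ascent on the log joint, in a predictive-coding-like style.

All function names and noise scales are hypothetical.

```python
import math


def log_sum_exp(values):
    # Numerically stable log(sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def grad_log_joint(z, x, memories, sigma_x=1.0, sigma_p=0.5):
    """Gradient of log p(x | z) + log p(z) with respect to z.

    Likelihood (assumed): x ~ N(z, sigma_x^2 I), i.e. an identity decoder.
    Prior (assumed): equal-weight mixture of N(m_k, sigma_p^2 I) over memories m_k.
    """
    # Likelihood term: pulls z toward the observation x.
    grad = [(xi - zi) / sigma_x**2 for xi, zi in zip(x, z)]
    # Responsibility of each stored memory for the current z.
    logits = [-sum((zi - mi) ** 2 for zi, mi in zip(z, m)) / (2 * sigma_p**2) for m in memories]
    norm = log_sum_exp(logits)
    resp = [math.exp(l - norm) for l in logits]
    # Prior term: pulls z toward a responsibility-weighted combination of memories.
    for r, m in zip(resp, memories):
        for i in range(len(z)):
            grad[i] += r * (m[i] - z[i]) / sigma_p**2
    return grad


def infer(x, memories, steps=200, lr=0.05):
    """Iterative (non-amortized) inference: gradient ascent on the log joint."""
    z = list(x)  # start from the observation itself
    for _ in range(steps):
        g = grad_log_joint(z, x, memories)
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


memories = [[1.0, 1.0, -1.0], [-1.0, 1.0, 1.0]]
noisy = [0.8, 0.6, -0.7]  # a corrupted version of the first memory
print(infer(noisy, memories))  # approximately [0.96, 0.92, -0.94]
```

The retrieved latent settles between the observation and the nearest memory, at `(x + 4m) / 5` with these scales. The ratio `sigma_x / sigma_p` sets how strongly memory dominates perception. An amortized variant would replace the loop with an encoder that predicts `z` directly.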
MICO: A Multi-alternative Contrastive Learning Framework for Commonsense Knowledge Representation
Su, Ying, Wang, Zihao, Fang, Tianqing, Zhang, Hongming, Song, Yangqiu, Zhang, Tong
Commonsense reasoning tasks such as commonsense knowledge graph completion and commonsense question answering require powerful representation learning. In this paper, we propose to learn commonsense knowledge representations with MICO, a Multi-alternative contrastive learning framework on COmmonsense knowledge graphs. MICO generates commonsense knowledge representations through contextual interaction between entity nodes and relations with multi-alternative contrastive learning. In MICO, the head and tail entities in an $(h,r,t)$ knowledge triple are converted into two relation-aware sequence pairs (a premise and an alternative) in natural language. Semantic representations generated by MICO benefit two tasks simply by comparing the distance scores between representations: 1) zero-shot commonsense question answering; 2) inductive commonsense knowledge graph completion. Extensive experiments show the effectiveness of our method.
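The abstract covers two steps: contrastive training over several alternatives, and zero-shot answering by comparing distance scores. The sketch below shows both. It assumes precomputed embeddings rather than MICO's actual encoder, uses cosine similarity, and uses an InfoNCE-style softmax over one positive and several negative alternatives. The function names and the temperature value are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multi_alternative_loss(premise, alternatives, positive_index, temperature=0.1):
    """Softmax cross-entropy over all alternatives, with one of them positive."""
    logits = [cosine(premise, alt) / temperature for alt in alternatives]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[positive_index]


def answer_zero_shot(question, candidates):
    """Pick the candidate whose embedding is closest to the question embedding."""
    scores = [cosine(question, c) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


premise = [1.0, 0.0]
alternatives = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
print(multi_alternative_loss(premise, alternatives, positive_index=0))  # close to 0
print(answer_zero_shot([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0]]))  # 1
```

Training lowers the loss by pulling a premise toward its true alternative and pushing it away from the others. At test time no further training is needed: the answer is simply the nearest candidate.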
Multi-step Planning for Automated Hyperparameter Optimization with OptFormer
Unlike myopic HPO methods, planning-based approaches fundamentally require building models of the future to assess the impact of a current decision on later timesteps. Though these methods also rely on a GP as a surrogate model, each point in multi-step planning involves fantasizing/imagining an updated GP posterior $(f_{t+1} \mid \tilde{x}_t), \ldots, (f_{t+h} \mid \tilde{x}_t, \tilde{x}_{t+1}, \ldots, \tilde{x}_{t+h-1})$ based on simulated choices from lookaheads $\{(\tilde{x}_t, \tilde{y}_t), \ldots, (\tilde{x}_{t+h-1}, \tilde{y}_{t+h-1})\}$ (Lam et al., 2016; Jiang et al., 2020). Note that we use $\tilde{x}_t$ to represent a fantasized decision, while $x_t$ is the actual choice made at timestep $t$. While multi-step planning is promising, constructing the posterior of a GP model requires matrix inversion, which is a compute-intensive operation (Cormen et al., 2022). Even setting this limitation aside, traditional planning-based approaches are compute-intensive for two reasons. The first is the poor scaling of the search tree, $O(q^h)$ where $q$ is the number of choices at each decision point for each lookahead step (Lam et al., 2016; Lam and Willcox, 2017), which forces most methods to explore short horizons, typically $h \in \{1, 2\}$. The second is nested expectation and maximization: marginalizing future observations $\tilde{y}_{t+j}$, $j < h$, and running a global search on the acquisition function to obtain the query $\tilde{x}_{t+j}$ at every lookahead step.
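The $O(q^h)$ growth of the lookahead tree is easy to see by counting fantasized updates in an exhaustive planner. The toy sketch below makes several simplifying assumptions:

- `fantasize` is a hypothetical stand-in for drawing $\tilde{y}$ from an updated GP posterior.
- Each expectation over $\tilde{y}$ is replaced by a single deterministic fantasy. Real methods average over several samples, which multiplies the cost further.
- The candidate set is a fixed list of $q$ points.

```python
def lookahead(history, candidates, horizon, fantasize):
    """Exhaustive h-step lookahead over a fixed candidate set.

    history: list of (x, y) pairs observed or fantasized so far.
    candidates: the q choices available at every decision point.
    fantasize(history, x): imagined outcome y for choosing x (hypothetical).
    Returns (best value reachable along any imagined trajectory,
             number of fantasized posterior updates performed).
    """
    if horizon == 0:
        return max(y for _, y in history), 0
    best, calls = float("-inf"), 0
    for x in candidates:
        y = fantasize(history, x)  # one imagined posterior update
        value, sub_calls = lookahead(history + [(x, y)], candidates, horizon - 1, fantasize)
        best = max(best, value)
        calls += 1 + sub_calls
    return best, calls


def toy_fantasy(history, x):
    # Hypothetical surrogate mean with its optimum at x = 0.3.
    return -(x - 0.3) ** 2


start = [(0.0, -1.0)]
for h in (1, 2, 3):
    _, calls = lookahead(start, [0.0, 0.5, 1.0], h, toy_fantasy)
    print(h, calls)  # 1 3, then 2 12, then 3 39
```

With $q = 3$ the number of updates is $q + q^2 + \dots + q^h$, which already reaches 39 at $h = 3$. This is why exhaustive planners rarely go beyond $h \in \{1, 2\}$.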
BLOX: Macro Neural Architecture Search Benchmark and Algorithms
Chau, Thomas Chun Pong, Dudziak, Łukasz, Wen, Hongkai, Lane, Nicholas Donald, Abdelfattah, Mohamed S
Neural architecture search (NAS) has been successfully used to design numerous high-performance neural networks. However, NAS is typically compute-intensive, so most existing approaches restrict the search to the operations and topological structure of a single block, which is then stacked repeatedly to form an end-to-end model. Although such an approach reduces the size of the search space, recent studies show that a macro search space, which allows the blocks in a model to differ, can lead to better performance. To provide a systematic study of the performance of NAS algorithms on a macro search space, we release Blox, a benchmark that consists of 91k unique models trained on the CIFAR-100 dataset. The dataset also includes runtime measurements of all the models on a diverse set of hardware platforms. We perform extensive experiments to compare existing algorithms that are well studied on cell-based search spaces with the emerging blockwise approaches that aim to make NAS scalable to much larger macro search spaces.
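The difference between a cell-based and a macro search space comes down to counting. Stacking one block in every stage gives one architecture per block. Letting each stage choose its own block gives $B^n$ architectures for $B$ candidate blocks and $n$ stages.

For illustration, the sketch below assumes 45 candidate blocks and 3 stages. The resulting $45^3 = 91{,}125$ is of the same order as the 91k models quoted above, but the actual Blox configuration should be checked in the paper. Block names are hypothetical.

```python
from itertools import product


def cell_based_space(blocks, num_stages):
    # The same block is repeated in every stage: one architecture per block.
    return [tuple([b] * num_stages) for b in blocks]


def macro_space(blocks, num_stages):
    # Each stage picks its own block: the Cartesian product of per-stage choices.
    return list(product(blocks, repeat=num_stages))


blocks = [f"block{i}" for i in range(45)]  # hypothetical block identifiers
print(len(cell_based_space(blocks, 3)))  # 45
print(len(macro_space(blocks, 3)))       # 91125
```

Every cell-based architecture is also a member of the macro space. A search restricted to repeated blocks therefore explores a tiny diagonal of the larger space.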
Policy Gradient With Serial Markov Chain Reasoning
Cetin, Edoardo, Celiktutan, Oya
We introduce a new framework that performs decision-making in reinforcement learning (RL) as an iterative reasoning process. We model agent behavior as the steady-state distribution of a parameterized reasoning Markov chain (RMC), optimized with a new tractable estimate of the policy gradient. We perform action selection by simulating the RMC for enough reasoning steps to approach its steady-state distribution. We show that our framework has several useful properties that are inherently missing from traditional RL. For instance, it allows agent behavior to approximate any continuous distribution over actions by parameterizing the RMC with a simple Gaussian transition function. Moreover, the number of reasoning steps needed to reach convergence can scale adaptively with the difficulty of each action selection decision and can be reduced by re-using past solutions. Our resulting algorithm achieves state-of-the-art performance on the popular MuJoCo and DeepMind Control benchmarks, for both proprioceptive and pixel-based tasks.
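The sketch below illustrates two claims: the number of reasoning steps adapts to the decision, and re-using a past solution speeds up convergence. It relies on these assumptions:

- The transition mean is a hypothetical contraction toward a state-dependent target, standing in for a learned network.
- The Gaussian noise of a real RMC is dropped, so convergence can be detected with a simple tolerance on successive actions. With noise, the chain converges to a distribution rather than a point.

All names are hypothetical.

```python
def rmc_mean_step(state, action):
    """Mean of one reasoning transition (hypothetical contraction mapping)."""
    target = 2.0 * state - 1.0  # hypothetical "ideal action" for this state
    return action + 0.5 * (target - action)


def select_action(state, initial_action=0.0, tol=1e-6, max_steps=1000):
    """Iterate the chain until successive actions differ by less than tol.

    Returns (action, number of reasoning steps used).
    """
    action = initial_action
    for step in range(1, max_steps + 1):
        new_action = rmc_mean_step(state, action)
        if abs(new_action - action) < tol:
            return new_action, step
        action = new_action
    return action, max_steps


cold_action, cold_steps = select_action(1.0)                        # 20 steps
warm_action, warm_steps = select_action(1.0, initial_action=0.999)  # fewer steps
_, hard_steps = select_action(5.0)  # target far from the start: more steps
print(cold_steps, warm_steps, hard_steps)
```

Starting near the previous solution cuts the step count roughly in half here. A state whose target lies further from the starting action needs more iterations, which mirrors the adaptive-compute property described above.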
Non-Axiomatic Term Logic: A Computational Theory of Cognitive Symbolic Reasoning
This paper presents Non-Axiomatic Term Logic (NATL) as a theoretical computational framework for humanlike symbolic reasoning in artificial intelligence. NATL unites a discrete syntactic system inspired by Aristotle's term logic with a continuous semantic system based on the modern idea of distributed representations, or embeddings. This paper positions the proposed approach in the phylogeny and the literature of logic, and explains the framework. Because NATL is still only a theory and requires much further elaboration before it can be implemented, no quantitative evaluation is presented. Instead, the paper discusses qualitative analyses of arguments using NATL, some possible applications to cognitive science and robotics research, and the remaining issues towards a machine implementation.
A Rotated Hyperbolic Wrapped Normal Distribution for Hierarchical Representation Learning
Cho, Seunghyuk, Lee, Juyong, Park, Jaesik, Kim, Dongwoo
We present a rotated hyperbolic wrapped normal distribution (RoWN), a simple yet effective alteration of the hyperbolic wrapped normal distribution (HWN). The HWN expands the domain of probabilistic modeling from Euclidean to hyperbolic space, where, in theory, a tree can be embedded with arbitrarily low distortion. In this work, we analyze the geometric properties of the diagonal HWN, a standard choice of distribution in probabilistic modeling. The analysis shows that this distribution is ill-suited to representing data points at the same hierarchy level, i.e., points that share the same norm in the Poincaré disk model and are distinguished only by their angular distance. We then empirically verify these limitations of the HWN and show how RoWN, the proposed distribution, alleviates them on various hierarchical datasets, including a noisy synthetic binary tree, WordNet, and Atari 2600 Breakout. The code is available at https://github.com/ml-postech/RoWN.
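The sketch below shows sampling from a rotated wrapped normal in two dimensions. It uses the Lorentz (hyperboloid) model, which is isometric to the Poincaré disk. The wrapped-normal construction has three steps:

1. Sample a vector in the tangent space at the origin.
2. Parallel-transport it to the mean.
3. Apply the exponential map.

The rotation follows our reading of RoWN: the diagonal covariance is rotated so that its first axis points along the direction of the mean. This sketch is not the authors' implementation, which is at the repository linked above, and all function names are ours.

```python
import math
import random


def lorentz_inner(x, y):
    # Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi.
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lift(spatial):
    # Point on the hyperboloid <z, z>_L = -1 with the given spatial coordinates.
    return [math.sqrt(1.0 + sum(c * c for c in spatial))] + list(spatial)


def rotated_tangent_sample(mean_spatial, variances, rng):
    """Sample v ~ N(0, R diag(variances) R^T) where R maps the first axis onto
    the mean direction (RoWN-style rotation, 2-D only; mean must be nonzero)."""
    norm = math.hypot(mean_spatial[0], mean_spatial[1])
    c, s = mean_spatial[0] / norm, mean_spatial[1] / norm  # cos/sin of mean angle
    e = [math.sqrt(var) * rng.gauss(0.0, 1.0) for var in variances]
    return [c * e[0] - s * e[1], s * e[0] + c * e[1]]


def sample_rown(mean_spatial, variances, rng):
    mu = lift(mean_spatial)
    origin = [1.0, 0.0, 0.0]
    v = [0.0] + rotated_tangent_sample(mean_spatial, variances, rng)  # tangent at origin
    # Parallel transport from the origin to mu in the Lorentz model.
    alpha = -lorentz_inner(origin, mu)
    shifted = [m - alpha * o for m, o in zip(mu, origin)]
    coef = lorentz_inner(shifted, v) / (alpha + 1.0)
    u = [vi + coef * (oi + mi) for vi, oi, mi in zip(v, origin, mu)]
    # Exponential map at mu.
    u_norm = math.sqrt(max(lorentz_inner(u, u), 0.0))
    if u_norm == 0.0:
        return mu
    return [math.cosh(u_norm) * mi + math.sinh(u_norm) * ui / u_norm for mi, ui in zip(mu, u)]


rng = random.Random(0)
z = sample_rown([1.5, 0.5], [0.3, 0.05], rng)
print(lorentz_inner(z, z))  # close to -1: the sample lies on the hyperboloid
```

Under our reading of RoWN, a large first variance spreads samples radially, i.e. along the hierarchy. A small second variance keeps them near the mean's angular position. A diagonal HWN with fixed axes cannot express this alignment for arbitrary mean directions.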
How Well Do Multi-hop Reading Comprehension Models Understand Date Information?
Ho, Xanh, Sugawara, Saku, Aizawa, Akiko
Several multi-hop reading comprehension datasets have been proposed to resolve the issue of reasoning shortcuts, by which questions can be answered without performing multi-hop reasoning. However, the ability of multi-hop models to perform step-by-step reasoning when answering a comparison question remains unclear. It is also unclear how questions about the internal reasoning process are useful for training and evaluating question-answering (QA) systems. To evaluate models precisely and hierarchically, we first propose a dataset, HieraDate, with three probing tasks in addition to the main question: extraction, reasoning, and robustness. Our dataset is created by enhancing two previous multi-hop datasets, HotpotQA and 2WikiMultiHopQA, focusing on multi-hop questions about date information that involve both comparison and numerical reasoning. We then evaluate the ability of existing models to understand date information. Our experimental results reveal that the multi-hop models cannot subtract two dates even when they perform well on date comparison and number subtraction tasks. Other results reveal that our probing questions can help improve the performance of the models on the main QA task (e.g., by +10.3 F1) and that our dataset can be used for data augmentation to improve the robustness of the models.
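The sketch below derives the gold answers for two of the probing tasks (extraction and reasoning) from a single date-comparison question. It shows concretely what "subtracting two dates" involves. The question templates, the whole-year difference and the function name are hypothetical, and they do not reproduce the HieraDate format. The robustness probe is not covered.

```python
from datetime import date


def probe_dates(name_a, date_a, name_b, date_b):
    """Build a main comparison question plus extraction and reasoning probes.

    Returns a dict mapping probe type to (question, gold answer).
    """
    earlier = name_a if date_a < date_b else name_b
    later_d, earlier_d = max(date_a, date_b), min(date_a, date_b)
    # Completed years between the two dates (subtract one if the later
    # date's month/day falls before the earlier date's month/day).
    years = later_d.year - earlier_d.year - (
        (later_d.month, later_d.day) < (earlier_d.month, earlier_d.day)
    )
    return {
        "main": (f"Who was born first, {name_a} or {name_b}?", earlier),
        "extraction": (f"When was {name_a} born?", date_a.isoformat()),
        "reasoning": (f"How many years apart were {name_a} and {name_b} born?", years),
    }


probes = probe_dates("Alice", date(1815, 12, 10), "Bob", date(1912, 6, 23))
for kind, (question, answer) in probes.items():
    print(kind, question, answer)
```

A model can answer the main question correctly by comparing only the years, yet still fail the reasoning probe. Probing the intermediate steps separately exposes exactly that gap.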
TropeTwist: Trope-based Narrative Structure Generation
Games are complex, multi-faceted systems that share common elements and underlying narratives, such as the conflict between a hero and a big bad enemy or pursuing a goal that requires overcoming further challenges. However, identifying and describing these elements together is non-trivial, as they might differ in certain properties and in how players encounter the narratives. Likewise, generating narratives also poses difficulties in encoding, interpreting, and evaluating them. To address this, we present TropeTwist, a trope-based system that can describe narrative structures in games at a more abstract and generic level, allowing the definition of games' narrative structures and their generation using interconnected tropes, called narrative graphs. To demonstrate the system, we represent the narrative structure of three different games. We use MAP-Elites to generate and evaluate novel quality-diverse narrative graphs encoded as graph grammars, using these three [...]
This paper presents TropeTwist, a preliminary system that uses tropes [21, 54] extracted from TvTropes [26, 46] as patterns and fundamental units, which, when combined, can compose structures representing other composed tropes. Common narrative structures can be identified and defined using TropeTwist. TropeTwist can define generic aspects of a story, leading to the identification of events, roles, and narrative elements, as well as a novel way to form narratives. As a proof of concept, we built, analyzed, and structurally described three game examples, shown in Figure 1 (top row). We propose graph grammars as an indirect encoding of narrative graphs and the use of the Multi-dimensional Archive of Phenotypic Elites (MAP-Elites) [40] to generate novel variations (shown in Figure 1, bottom row) using the proof-of-concept examples as roots. Simultaneously, we propose metrics to evaluate the resulting narrative graphs' coherence, cohesion, and interestingness.
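MAP-Elites keeps one elite per cell of a behaviour-descriptor grid and repeatedly mutates randomly chosen elites. The sketch below shows that loop seeded from a root narrative, with several deliberate simplifications:

- A narrative is a flat list of trope labels, not a narrative graph encoded as a graph grammar.
- The trope vocabulary is hypothetical.
- The fitness is a toy score, not the paper's coherence, cohesion, or interestingness metrics.
- The descriptor (length, number of plot twists) is hypothetical.

```python
import random

TROPES = ["HERO", "ENEMY", "QUEST", "CONFLICT", "PLOT_TWIST"]  # hypothetical vocabulary


def fitness(genome):
    # Toy score: reward a hero-versus-enemy core, penalize repeated tropes.
    core = ("HERO" in genome) + ("ENEMY" in genome)
    repeats = len(genome) - len(set(genome))
    return core - 0.5 * repeats


def descriptor(genome):
    # Behaviour descriptor: (length, number of plot twists), each capped at 5.
    return (min(len(genome), 5), min(genome.count("PLOT_TWIST"), 5))


def mutate(genome, rng):
    # Copy, then add, remove, or replace one trope (never empties the genome).
    child = list(genome)
    op = rng.choice(["add", "remove", "replace"])
    if op == "add" or len(child) <= 1:
        child.insert(rng.randrange(len(child) + 1), rng.choice(TROPES))
    elif op == "remove":
        child.pop(rng.randrange(len(child)))
    else:
        child[rng.randrange(len(child))] = rng.choice(TROPES)
    return child


def map_elites(roots, iterations=2000, seed=0):
    rng = random.Random(seed)
    archive = {}  # descriptor cell -> (fitness, genome)

    def try_insert(genome):
        # Keep the genome if its cell is empty or it beats the current elite.
        cell, score = descriptor(genome), fitness(genome)
        if cell not in archive or score > archive[cell][0]:
            archive[cell] = (score, genome)

    for root in roots:
        try_insert(root)
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, rng))
    return archive


archive = map_elites([["HERO", "QUEST", "ENEMY"]])
for cell, (score, genome) in sorted(archive.items()):
    print(cell, score, genome)
```

The archive returns a spread of variations across the descriptor grid instead of a single best narrative. This matches the quality-diversity goal of generating novel variations from the proof-of-concept roots.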
Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems
Zhou, Fan, Dong, Haoyu, Liu, Qian, Cheng, Zhoujun, Han, Shi, Zhang, Dongmei
Numerical reasoning over natural language has been a long-standing goal for the research community. However, although cutting-edge language models are proficient at reasoning over common and simple numbers, they have proven unable to generalize reliably to a broad range of numbers. In this paper, we propose a novel method to elicit and exploit the numerical reasoning knowledge hidden in pre-trained language models using simple anchor numbers. Concretely, we first use simple numbers as anchors to probe the arithmetic expressions that language models implicitly infer, and then explicitly apply these expressions to complex numbers to obtain the corresponding answers. To inversely elicit arithmetic expressions, we formulate the task as an analytically solvable linear system. Experimental results on several numerical reasoning benchmarks demonstrate that our approach significantly improves the numerical reasoning capabilities of existing LMs. More importantly, our approach is training-free and works purely at inference time. This makes it highly portable and yields consistent performance benefits across a variety of language models (GPT-3, T5, BART, etc.) in zero-shot, few-shot, and fine-tuning scenarios.
Language Models (LMs) have demonstrated great success on a wide range of natural language tasks (Devlin et al., 2018; Brown et al., 2020b; Chowdhery et al., 2022), and recent works even explore using LMs as a general-purpose interface for diverse modalities (Hao et al., 2022; Xie et al., 2022; He et al., 2022).
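The inverse-elicitation idea works in three stages. First, query the model only with easy anchor numbers. Second, recover the implicit expression by solving a linear system. Third, apply that expression to the hard numbers.

The sketch below assumes the implicit expression is linear in two operands plus a bias term. `oracle` is a hypothetical stand-in for prompting an LM with anchor-substituted questions. The paper's actual expression family and prompting are not reproduced here.

```python
from fractions import Fraction


def solve_linear_system(rows, rhs):
    """Gauss-Jordan elimination with exact fractions (assumes a unique solution)."""
    n = len(rows)
    a = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(rows, rhs)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if a[i][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(n):
            if i != col and a[i][col] != 0:
                factor = a[i][col] / a[col][col]
                a[i] = [x - factor * y for x, y in zip(a[i], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]


def elicit_linear_expression(oracle, anchors):
    """Fit oracle(x, y) = w_x * x + w_y * y + w_0 from answers on simple anchors."""
    rows = [[x, y, 1] for x, y in anchors]
    rhs = [oracle(x, y) for x, y in anchors]
    return solve_linear_system(rows, rhs)


def apply_expression(weights, x, y):
    w_x, w_y, w_0 = weights
    return w_x * x + w_y * y + w_0


# Hypothetical oracle: an LM that answers small-number versions correctly.
oracle = lambda x, y: 3 * x - y
weights = elicit_linear_expression(oracle, [(1, 2), (2, 3), (4, 1)])
print(weights)                                  # weights 3, -1, 0 (as Fractions)
print(apply_expression(weights, 123456, 789))   # 369579
```

The anchors must give linearly independent equations; here the determinant is -4, so the system is solvable. The LM is only ever asked about small numbers, and the large-number answer is computed exactly from the recovered expression.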