Problem Solving
Kinodynamic FMT* with Dimensionality Reduction Heuristics and Neural Network Controllers
Zheng, Dongliang, Tsiotras, Panagiotis
This paper proposes a new sampling-based kinodynamic motion planning algorithm, called FMT*PFF, for nonlinear systems. It exploits the novel idea of dimensionality reduction using partial-final-state-free (PFF) optimal controllers. With the proposed dimensionality reduction heuristic, the search space is restricted to a subspace, so convergence is faster than with a regular kinodynamic FMT*. The dimensionality reduction heuristic can be viewed as a sampling strategy, and asymptotic optimality is preserved when it is combined with uniform full-state sampling. Another feature of FMT*PFF is the ability to deal with a steering function with inexact steering, which is vital when using learning-based steering functions. Learning-based methods allow us to solve the steering problem for nonlinear systems efficiently. However, learning-based methods often fail to reach the exact goal state. For nonlinear systems, we train a neural network controller using supervised learning to generate the steering commands. We show that FMT*PFF with a learning-based steering function is efficient and generates dynamically feasible motion plans. We compare our algorithm with previous algorithms and show superior performance in various simulations.
Multi-Predict: Few Shot Predictors For Efficient Neural Architecture Search
Akhauri, Yash, Abdelfattah, Mohamed S.
Many hardware-aware neural architecture search (NAS) methods have been developed to optimize the topology of neural networks (NN) with the joint objectives of higher accuracy and lower latency. Recently, both accuracy and latency predictors have been used in NAS with great success, achieving high sample efficiency and accurate modeling of hardware (HW) device latency respectively. However, a new accuracy predictor needs to be trained for every new NAS search space or NN task, and a new latency predictor needs to be additionally trained for every new HW device. In this paper, we explore methods to enable multi-task, multi-search-space, and multi-HW adaptation of accuracy and latency predictors to reduce the cost of NAS. We introduce a novel search-space independent NN encoding based on zero-cost proxies that achieves sample-efficient prediction on multiple tasks and NAS search spaces, improving the end-to-end sample efficiency of latency and accuracy predictors by over an order of magnitude in multiple scenarios. For example, our NN encoding enables multi-search-space transfer of latency predictors from NASBench-201 to FBNet (and vice-versa) in under 85 HW measurements, a 400$\times$ improvement in sample efficiency compared to a recent meta-learning approach. Our method also improves the total sample efficiency of accuracy predictors by over an order of magnitude. Finally, we demonstrate the effectiveness of our method for multi-search-space and multi-task accuracy prediction on 28 NAS search spaces and tasks.
Visual Question Answering: A Survey on Techniques and Common Trends in Recent Literature
de Faria, Ana Cláudia Akemi Matsuki, Bastos, Felype de Castro, da Silva, José Victor Nogueira Alves, Fabris, Vitor Lopes, Uchoa, Valeska de Sousa, Neto, Décio Gonçalves de Aguiar, Santos, Claudio Filipi Goncalves dos
Visual Question Answering (VQA) is a multi-disciplinary artificial intelligence research problem that has attracted the attention of researchers from computer vision, natural language processing, knowledge representation, and other machine learning communities. The VQA task is to generate a natural language answer to a natural language question asked about an image. In recent years, as the field has flourished, datasets, metrics, and models have been proposed, and the scope of research has expanded. Although artificial intelligence has solved several different problems, such as image classification and natural language processing (NLP), it is hard to model a problem that needs different types of data. For instance, combining computer vision with NLP to retrieve information about an image from a question has challenged researchers for several years.
A Survey on Machine Learning Solutions for Graph Pattern Extraction
Yow, Kai Siong, Liao, Ningyi, Luo, Siqiang, Cheng, Reynold, Ma, Chenhao, Han, Xiaolin
A subgraph is constructed from a subset of the vertices and edges of a given graph. Many graph properties are hereditary for subgraphs. Hence, researchers from different communities have paid a great deal of attention to numerous subgraph problems, on top of the ordinary graph problems. Many algorithms have been proposed for subgraph problems, and one common approach is to extract the patterns and structures of a given graph. Because certain types of graphs have complex structures, and to improve the overall performance of existing frameworks, machine learning techniques have recently been employed to deal with various subgraph problems. In this article, we present a comprehensive review of five well-known subgraph problems that have been tackled using machine learning methods. They are subgraph isomorphism (both counting and matching), maximum common subgraph, community detection and community search problems. We provide an outline of each proposed method, and examine its design and performance. We also explore non-learning-based algorithms for each problem and give a brief discussion. We then suggest some promising research directions in this area, hoping that relevant subgraph problems can be tackled using a similar strategy. Since the use of machine learning techniques has grown enormously in recent years, we believe that this survey will serve as a good reference point for relevant research communities.
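The subgraph definition that opens this abstract can be made concrete with a small sketch. It is illustrative only and not taken from the survey: the adjacency-set representation and the helper names `induced_subgraph` and `is_subgraph` are assumptions chosen for the example.

```python
from typing import Dict, Hashable, Iterable, Set

# Assumed representation: an undirected graph as vertex -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def induced_subgraph(graph: Graph, keep: Iterable[Hashable]) -> Graph:
    """Keep the chosen vertices and every edge whose endpoints both survive."""
    kept = set(keep) & set(graph)
    return {v: graph[v] & kept for v in kept}


def is_subgraph(small: Graph, big: Graph) -> bool:
    """True if every vertex and every edge of `small` also appears in `big`."""
    return all(v in big and nbrs <= big[v] for v, nbrs in small.items())


# A 4-cycle a-b-c-d-a; dropping vertex d leaves the path a-b-c.
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
path = induced_subgraph(cycle, ["a", "b", "c"])
```

Here `is_subgraph(path, cycle)` holds while the reverse does not, since the cycle contains vertex `d` and two edges that the path lacks.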
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Wang, Boshi, Min, Sewon, Deng, Xiang, Shen, Jiaming, Wu, You, Zettlemoyer, Luke, Sun, Huan
Chain-of-Thought (CoT) prompting can dramatically improve the multi-step reasoning abilities of large language models (LLMs). CoT explicitly encourages the LLM to generate intermediate rationales for solving a problem, by providing a series of reasoning steps in the demonstrations. Despite its success, there is still little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that CoT reasoning is possible even with invalid demonstrations - prompting with invalid reasoning steps can achieve over 80-90% of the performance obtained using CoT under various metrics, while still generating coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are much more important for effective CoT reasoning. Overall, these findings both deepen our understanding of CoT prompting, and open up new questions regarding LLMs' capability to learn to reason in context.
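As a rough illustration of what "providing a series of reasoning steps in the demonstrations" looks like in practice, the sketch below assembles a few-shot CoT prompt from plain strings. The `Demo` class, the `build_cot_prompt` helper, and the "Q:/A:" template are assumptions made for this example, not the prompt format used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demo:
    question: str
    rationale: List[str]  # ordered intermediate reasoning steps
    answer: str


def build_cot_prompt(demos: List[Demo], query: str) -> str:
    """Concatenate worked demonstrations, then the unanswered query."""
    blocks = []
    for d in demos:
        steps = " ".join(d.rationale)
        blocks.append(f"Q: {d.question}\nA: {steps} The answer is {d.answer}.")
    # The final block is left open so the model continues with its own rationale.
    blocks.append(f"Q: {query}\nA:")
    return "\n\n".join(blocks)
```

An ablation of the kind the abstract describes would alter the `rationale` entries (for example, replacing them with invalid steps or reordering them) while leaving the rest of the prompt unchanged.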
ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance
Roy, Kaushik, Gaur, Manas, Soltani, Misagh, Rawte, Vipula, Kalyan, Ashwin, Sheth, Amit
Current Virtual Mental Health Assistants (VMHAs) provide counseling and suggestive care. They refrain from patient diagnostic assistance because they lack training in safety-constrained and specialized clinical process knowledge. In this work, we define ProKnow as an ordered set of information that maps to evidence-based guidelines or categories of conceptual understanding to experts in a domain. We also introduce a new dataset of diagnostic conversations guided by safety constraints and ProKnow that healthcare professionals use. We develop a method for natural language question generation (NLG) that collects diagnostic information from the patient interactively. We demonstrate the limitations of using state-of-the-art large-scale language models (LMs) on this dataset. Our algorithm models the process knowledge through explicitly modeling safety, knowledge capture, and explainability. LMs augmented with the ProKnow-guided method generated 89% safer questions in the depression and anxiety domain. The explainability of the generated questions is assessed by computing similarity with concepts in depression and anxiety knowledge bases. Overall, irrespective of the type of LMs augmented with our ProKnow, we achieved an average 82% improvement over simple pre-trained LMs on safety, explainability, and process-guided question generation. We qualitatively and quantitatively evaluate the efficacy of the proposed ProKnow-guided methods by introducing three new evaluation metrics for safety, explainability, and process knowledge adherence.
Teaching Small Language Models to Reason
Magister, Lucie Charlotte, Mallinson, Jonathan, Adamek, Jakub, Malmi, Eric, Severyn, Aliaksei
Chain of thought prompting successfully improves the reasoning capabilities of large language models, achieving state-of-the-art results on a range of datasets. However, these reasoning capabilities only appear to emerge in models with over 100 billion parameters. In this paper, we explore the transfer of such reasoning capabilities to models with fewer than 100 billion parameters via knowledge distillation. Specifically, we finetune a student model on the chain of thought outputs generated by a larger teacher model. Our experiments show that the proposed method improves task performance across arithmetic, commonsense and symbolic reasoning datasets. For example, the accuracy of T5 XXL on GSM8K improves from 8.11% to 21.99% when finetuned on PaLM-540B generated chains of thought.
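The data-preparation side of this distillation recipe can be sketched as follows: teacher-generated chains of thought are turned into (input, target) pairs on which a student could be finetuned. The record field names, the target template, and the optional filter that drops chains ending in a wrong answer are all assumptions for illustration; the abstract does not specify them.

```python
from typing import Dict, List, Tuple


def build_student_examples(
    records: List[Dict[str, str]],
    keep_only_correct: bool = True,
) -> List[Tuple[str, str]]:
    """Turn teacher outputs into (question, target text) finetuning pairs."""
    examples = []
    for r in records:
        answer = r["teacher_answer"].strip()
        # Assumed filter: skip chains whose final answer disagrees with the label.
        if keep_only_correct and answer != r["gold_answer"].strip():
            continue
        target = f'{r["teacher_rationale"].strip()} The answer is {answer}.'
        examples.append((r["question"], target))
    return examples
```

The resulting pairs would then be fed to whatever sequence-to-sequence finetuning loop the student model uses; that step is outside the scope of this sketch.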
Explanation Graph Generation via Generative Pre-training over Synthetic Graphs
Cui, Han, Li, Shangzhan, Zhang, Yu, Shi, Qi
The generation of explanation graphs is a significant task that aims to produce explanation graphs in response to user input, revealing the internal reasoning process. This task is challenging due to the significant discrepancy between unstructured user queries and structured explanation graphs. Current research commonly fine-tunes a text-based pre-trained language model on a small downstream dataset that is annotated with labeled graphs. However, due to the limited scale of available datasets, this approach may prove to be insufficient in bridging the gap between natural language text and structured graphs. In this paper, to alleviate the above limitations, we propose a novel pre-trained framework EG3P (for Explanation Graph Generation via Generative Pre-training over synthetic graphs) for the explanation graph generation task. Specifically, we first propose a text-to-graph generative task to pre-train the model with the goal of bridging the text-graph gap. Additionally, we propose an automatic corpus synthesis strategy for synthesizing a large-scale, high-quality corpus, reducing the reliance on costly manual annotation methods. Experimental results on ExplaGraphs show the effectiveness of EG3P: our model surpasses all baseline systems by remarkable margins. Further analysis demonstrates that EG3P is able to generate better explanation graphs on actual reasoning tasks such as CommonsenseQA and OpenbookQA.
AQE: Argument Quadruplet Extraction via a Quad-Tagging Augmented Generative Approach
Guo, Jia, Cheng, Liying, Zhang, Wenxuan, Kok, Stanley, Li, Xin, Bing, Lidong
Argument mining involves multiple sub-tasks that automatically identify argumentative elements, such as claim detection, evidence extraction, and stance classification. However, each sub-task alone is insufficient for a thorough understanding of the argumentative structure and reasoning process. To learn a complete view of an argument essay and capture the interdependence among argumentative components, we need to know what opinions people hold (i.e., claims), why those opinions are valid (i.e., supporting evidence), which source the evidence comes from (i.e., evidence type), and how those claims react to the debating topic (i.e., stance). In this work, we propose, for the first time, a challenging argument quadruplet extraction task (AQE), which can provide an all-in-one extraction of four argumentative components, i.e., claims, evidence, evidence types, and stances. To support this task, we construct a large-scale and challenging dataset. However, no existing method can solve the argument quadruplet extraction task. To fill this gap, we propose a novel quad-tagging augmented generative approach, which leverages a quadruplet tagging module to augment the training of the generative framework. The experimental results on our dataset demonstrate the empirical superiority of our proposed approach over several strong baselines.
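To make the four-component output concrete, the sketch below models one extracted quadruplet as a small record and groups evidence by the claim it supports. The class name `ArgumentQuad`, the helper `evidence_by_claim`, and any label values passed in are hypothetical; the abstract does not describe the dataset's actual schema or label set.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ArgumentQuad:
    claim: str          # what opinion is held
    evidence: str       # why the opinion is valid
    evidence_type: str  # which kind of source the evidence comes from
    stance: str         # how the claim relates to the debate topic


def evidence_by_claim(
    quads: Iterable[ArgumentQuad],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group (evidence, evidence_type) pairs under the claim they support."""
    grouped: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for q in quads:
        grouped[q.claim].append((q.evidence, q.evidence_type))
    return dict(grouped)
```

Keeping all four fields in one record reflects the interdependence the abstract emphasises: a piece of evidence is only meaningful relative to its claim, and the stance attaches to the claim rather than to the evidence.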
Multi-View Masked World Models for Visual Robotic Manipulation
Seo, Younggyo, Kim, Junsu, James, Stephen, Lee, Kimin, Shin, Jinwoo, Abbeel, Pieter
Visual robotic manipulation research and applications often use multiple cameras, or views, to better perceive the world. How else can we utilize the richness of multi-view data? In this paper, we investigate how to learn good representations with multi-view data and utilize them for visual robotic manipulation. Specifically, we train a multi-view masked autoencoder which reconstructs pixels of randomly masked viewpoints and then learn a world model operating on the representations from the autoencoder. We demonstrate the effectiveness of our method in a range of scenarios, including multi-view control and single-view control with auxiliary cameras for representation learning. We also show that the multi-view masked autoencoder trained with multiple randomized viewpoints enables training a policy with strong viewpoint randomization and transferring the policy to solve real-robot tasks without camera calibration and an adaptation procedure. Video demonstrations are available at: https://sites.google.com/view/mv-mwm.
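The view-level masking step mentioned above can be sketched in a few lines: some camera views are hidden entirely, and an autoencoder would then be trained to reconstruct their pixels from the remaining ones. This is a simplified illustration under stated assumptions, not the paper's implementation. The function name `mask_views` and the use of `None` as a placeholder for a hidden view are choices made for the example, and the autoencoder and world model themselves are omitted.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

View = TypeVar("View")  # e.g. an image array for one camera


def mask_views(
    views: Sequence[View],
    num_masked: int,
    rng: random.Random,
) -> Tuple[List[Optional[View]], List[int]]:
    """Hide `num_masked` randomly chosen views; return visible views and masked indices."""
    if not 0 <= num_masked <= len(views):
        raise ValueError("num_masked must be between 0 and the number of views")
    masked_ids = set(rng.sample(range(len(views)), num_masked))
    # Masked views become None; a reconstruction loss would target exactly these.
    visible = [None if i in masked_ids else v for i, v in enumerate(views)]
    return visible, sorted(masked_ids)
```

For example, `mask_views(["front", "wrist", "side"], 1, random.Random(0))` hides one of the three views and reports which index was hidden, so the training loop knows which reconstruction targets to score.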