Instructional Material
Automatic and Human-AI Interactive Text Generation
Dou, Yao, Laban, Philippe, Gardent, Claire, Xu, Wei
In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks, that takes a piece of text as input and then generates a revision that is improved according to some specific criteria (e.g., readability or linguistic styles), while largely retaining the original meaning and the length of the text. This includes many useful applications, such as text simplification, paraphrase generation, style transfer, etc. In contrast to text summarization and open-ended text completion (e.g., story), the text-to-text generation tasks we discuss in this tutorial are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactical transformations, stylistic control, and adherence to factual knowledge, -- all at once. With a special focus on text simplification and revision, this tutorial aims to provide an overview of the state-of-the-art natural language generation research from four major aspects -- Data, Models, Human-AI Collaboration, and Evaluation -- and to discuss and showcase a few significant and recent advances: (1) the use of non-retrogressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of new learnable metric and fine-grained human evaluation framework; (4) a growing body of studies and datasets on non-English languages; (5) the rise of HCI+NLP+Accessibility interdisciplinary research to create real-world writing assistant systems.
Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning
Michel, Nicolas, Chierchia, Giovanni, Negrel, Romain, Bercher, Jean-François
We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit-sphere and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian in the unit-sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians; allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.
Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models
Yan, An, Wang, Yu, Zhong, Yiwu, He, Zexue, Karypis, Petros, Wang, Zihan, Dong, Chengyu, Gentili, Amilcare, Hsu, Chun-Nan, Shang, Jingbo, McAuley, Julian
Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision for trustworthy and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations thus substantially outperform standard visual encoders and other baselines. Finally, we show how classification with a small number of concepts brings a level of interpretability for understanding model decisions through case studies in real medical data.
CITING: Large Language Models Create Curriculum for Instruction Tuning
Feng, Tao, Wang, Zifeng, Sun, Jimeng
The recent advancement of large language models (LLMs) has been achieved through a combo of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revision made by the teacher. We further iteratively carry out it to embody the procedure of CITING. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvement in terms of articulate, in-depth, and comprehensive by GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT, respectively.
Transforming Transformers for Resilient Lifelong Learning
Savadikar, Chinmay, Dai, Michelle, Wu, Tianfu
Lifelong learning without catastrophic forgetting (i.e., resiliency) remains an open problem for deep neural networks. The prior art mostly focuses on convolutional neural networks. With the increasing dominance of Transformers in deep learning, it is a pressing need to study lifelong learning with Transformers. Due to the complexity of training Transformers in practice, for lifelong learning, a question naturally arises: Can Transformers be learned to grow in a task aware way, that is to be dynamically transformed by introducing lightweight learnable plastic components to the architecture, while retaining the parameter-heavy, but stable components at streaming tasks? To that end, motivated by the lifelong learning capability maintained by the functionality of Hippocampi in human brain, we explore what would be, and how to implement, Artificial Hippocampi (ArtiHippo) in Transformers. We present a method to identify, and learn to grow, ArtiHippo in Vision Transformers (ViTs) for resilient lifelong learning in four aspects: (i) Where to place ArtiHippo to enable plasticity while preserving the core function of ViTs at streaming tasks? (ii) How to represent and realize ArtiHippo to ensure expressivity and adaptivity for tackling tasks of different nature in lifelong learning? (iii) How to learn to grow ArtiHippo to exploit task synergies (i.e., the learned knowledge) and overcome catastrophic forgetting? (iv) How to harness the best of our proposed ArtiHippo and prompting-based approaches? In experiments, we test the proposed method on the challenging Visual Domain Decathlon (VDD) benchmark and the 5-Dataset benchmark under the task-incremental lifelong learning setting. It obtains consistently better performance than the prior art with sensible ArtiHippo learned continually. To our knowledge, it is the first attempt of lifelong learning with ViTs on the challenging VDD benchmark.
A Review of Digital Learning Environments for Teaching Natural Language Processing in K-12 Education
Tian, Xiaoyi, Boyer, Kristy Elizabeth
To ensure that citizens, including young people, become responsible users and creators of intelligent solutions, there has been increasing attention towards incorporating artificial intelligence (AI) curricula, including NLP, into 21st century computing education [11, 46, 54]. Children are growing up with NLP-powered applications, making them ideal learning platforms. For example, conversational agents have been demonstrated to enhance engagement in reading [57], foster language learning [16, 56] and promote story comprehension and engagement [55]. Despite the prevalence of NLP in young people's lives, there is limited research on teaching them NLP concepts. Traditionally, AI and NLP concepts have been taught primarily in higher education [3, 32, 40]. However, research indicates that children are capable of grasping AI and NLP concepts from a young age [10, 20].
Grasping AI: experiential exercises for designers
Murray-Rust, Dave, Lupetti, Maria Luce, Nicenboim, Iohanna, van der Hoog, Wouter
Artificial intelligence (AI) and machine learning (ML) are increasingly integrated into the functioning of physical and digital products, creating unprecedented opportunities for interaction and functionality. However, there is a challenge for designers to ideate within this creative landscape, balancing the possibilities of technology with human interactional concerns. We investigate techniques for exploring and reflecting on the interactional affordances, the unique relational possibilities, and the wider social implications of AI systems. We introduced into an interaction design course (n=100) nine 'AI exercises' that draw on more than human design, responsible AI, and speculative enactment to create experiential engagements around AI interaction design. We find that exercises around metaphors and enactments make questions of training and learning, privacy and consent, autonomy and agency more tangible, and thereby help students be more reflective and responsible on how to design with AI and its complex properties in both their design process and outcomes.
BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks
Styrud, Jonathan, Mayr, Matthias, Hellsten, Erik, Krueger, Volker, Smith, Christian
Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past, robot programs were often written statically and tuned manually, the current, faster transition times call for robust, modular and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose the method Behavior-based Bayesian Optimization and Planning (BeBOP) that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the random forest surrogate models that drastically improves the results.
The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice
Delgado, Fernando, Yang, Stephen, Madaio, Michael, Yang, Qian
Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.
Energy-guided Entropic Neural Optimal Transport
Mokrov, Petr, Korotin, Alexander, Kolesov, Alexander, Gushchin, Nikita, Burnaev, Evgeny
Energy-based models (EBMs) are known in the Machine Learning community for decades. Since the seminal works devoted to EBMs dating back to the noughties, there have been a lot of efficient methods which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited by few recent works (excluding WGAN-based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. From the theoretical perspective, we prove generalization bounds for our technique. In practice, we validate its applicability in toy 2D and image domains. To showcase the scalability, we empower our method with a pre-trained StyleGAN and apply it to high-res AFHQ $512\times 512$ unpaired I2I translation. For simplicity, we choose simple short- and long-run EBMs as a backbone of our Energy-guided Entropic OT approach, leaving the application of more sophisticated EBMs for future research. Our code is publicly available.