AITopics

2310.03878

Genre:

Overview (0.73)
Instructional Material > Course Syllabus & Notes (0.53)

Technology: Information Technology > Artificial Intelligence > Natural Language > Generation (0.93)

Michel, Nicolas, Chierchia, Giovanni, Negrel, Romain, Bercher, Jean-François

Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning

arXiv.org Artificial Intelligence Oct-5-2023

We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected onto the unit sphere, and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is a Gaussian conditioned to lie on the unit sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians, allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.
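As a rough illustration of what "pushing representations toward fixed directions" can look like, the sketch below uses the von Mises-Fisher log-density, which equals kappa * mu^T z plus a normalizer that depends only on kappa and the dimension. It assumes a single fixed concentration `kappa` and one fixed unit direction per class; the function names and the choice of directions are hypothetical, and this is not the loss derived in the paper (the MAP formulation and the angular Gaussian variant are not reproduced here).

```python
import numpy as np


def normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit sphere (L2 normalization)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def vmf_direction_loss(features, labels, directions, kappa=10.0):
    """Mean negative vMF log-likelihood, up to an additive constant.

    features:   (n, d) raw network outputs (assumed, not from the paper)
    labels:     (n,) integer class labels
    directions: (c, d) fixed class directions, one per class (assumed)
    kappa:      shared concentration; with kappa fixed, the vMF normalizer
                is constant and is dropped here
    """
    z = normalize(np.asarray(features, dtype=float))
    mu = normalize(np.asarray(directions, dtype=float))[np.asarray(labels)]
    cosine = np.sum(z * mu, axis=1)  # mu^T z for each sample
    return float(np.mean(-kappa * cosine))


if __name__ == "__main__":
    directions = np.eye(3)                 # three fixed, orthogonal directions
    aligned = 2.0 * np.eye(3)              # features pointing at their targets
    labels = np.array([0, 1, 2])
    print(vmf_direction_loss(aligned, labels, directions))
    print(vmf_direction_loss(-aligned, labels, directions))
```

Because the target directions never change, the loss for one class does not depend on samples from any other class, which is consistent with the abstract's remark that no negative data is needed.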

boundary, continual learning, learning, (15 more...)

2306.03364

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > France (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre:

Research Report (1.00)
Instructional Material > Online (0.71)

Industry: Education > Educational Setting (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial Intelligence Oct-4-2023

Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models

Yan, An, Wang, Yu, Zhong, Yiwu, He, Zexue, Karypis, Petros, Wang, Zihan, Dong, Chengyu, Gentili, Amilcare, Hsu, Chun-Nan, Shang, Jingbo, McAuley, Julian

Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients of different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision, for reasons of trustworthiness and safety. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations and thus substantially outperforms standard visual encoders and other baselines. Finally, we show through case studies on real medical data how classification with a small number of concepts brings a level of interpretability for understanding model decisions.

concept bottleneck model, interpretable medical image classifier

2310.03182

Genre:

Instructional Material > Online (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial Intelligence Oct-3-2023

CITING: Large Language Models Create Curriculum for Instruction Tuning

Feng, Tao, Wang, Zifeng, Sun, Jimeng

The recent advancement of large language models (LLMs) has been achieved through a combination of instruction tuning and human alignment. However, building manually crafted instruction datasets and performing human alignment have become the bottleneck for scaling the development of LLMs. In this paper, we exploit the idea of leveraging AI models in lieu of humans as the teacher to train student LLMs. Our method is inspired by how human students refine their writing skills by following the rubrics and learning from the revisions offered by their tutors. Specifically, we employ a teacher LLM to create a curriculum for instruction tuning of the student LLM, namely Curriculum Instruction TunING (CITING). It encompasses two main steps: (1) the teacher LLM crafts the rubrics for evaluating the answers corresponding to various types of questions, and (2) the student LLM learns to follow the rubrics and perform self-correction from the revision made by the teacher. We carry out these two steps iteratively, which constitutes the CITING procedure. We compare CITING to a series of state-of-the-art baselines on four datasets. Our method demonstrates strong improvements in articulateness, depth, and comprehensiveness as judged by GPT-4 evaluation. Specifically, it achieves an average winning rate of 79.4% over SFT, 73.4% over RLHF, 78.1% over RRHF, and 76.3% over RAFT, respectively.

citing, instruction tuning, language model create curriculum

2310.02527

Genre:

Instructional Material > Course Syllabus & Notes (0.60)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.53)

Savadikar, Chinmay, Dai, Michelle, Wu, Tianfu

Transforming Transformers for Resilient Lifelong Learning

arXiv.org Artificial Intelligence Oct-3-2023

Lifelong learning without catastrophic forgetting (i.e., resiliency) remains an open problem for deep neural networks. The prior art mostly focuses on convolutional neural networks. With the increasing dominance of Transformers in deep learning, there is a pressing need to study lifelong learning with Transformers. Due to the complexity of training Transformers in practice, for lifelong learning, a question naturally arises: Can Transformers be trained to grow in a task-aware way, that is, to be dynamically transformed by introducing lightweight learnable plastic components to the architecture, while retaining the parameter-heavy but stable components at streaming tasks? To that end, motivated by the lifelong learning capability maintained by the functionality of the hippocampi in the human brain, we explore what would be, and how to implement, Artificial Hippocampi (ArtiHippo) in Transformers. We present a method to identify, and learn to grow, ArtiHippo in Vision Transformers (ViTs) for resilient lifelong learning in four aspects: (i) Where to place ArtiHippo to enable plasticity while preserving the core function of ViTs at streaming tasks? (ii) How to represent and realize ArtiHippo to ensure expressivity and adaptivity for tackling tasks of different nature in lifelong learning? (iii) How to learn to grow ArtiHippo to exploit task synergies (i.e., the learned knowledge) and overcome catastrophic forgetting? (iv) How to harness the best of our proposed ArtiHippo and prompting-based approaches? In experiments, we test the proposed method on the challenging Visual Domain Decathlon (VDD) benchmark and the 5-Dataset benchmark under the task-incremental lifelong learning setting. It obtains consistently better performance than the prior art with sensible ArtiHippo learned continually. To our knowledge, it is the first attempt at lifelong learning with ViTs on the challenging VDD benchmark.

artihippo, learning, lifelong learning, (15 more...)

2303.0825

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > North Carolina (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Instructional Material (1.00)

Industry: Education > Educational Setting > Continuing Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tian, Xiaoyi, Boyer, Kristy Elizabeth

A Review of Digital Learning Environments for Teaching Natural Language Processing in K-12 Education

To ensure that citizens, including young people, become responsible users and creators of intelligent solutions, there has been increasing attention towards incorporating artificial intelligence (AI) curricula, including NLP, into 21st century computing education [11, 46, 54]. Children are growing up with NLP-powered applications, making them ideal learning platforms. For example, conversational agents have been demonstrated to enhance engagement in reading [57], foster language learning [16, 56] and promote story comprehension and engagement [55]. Despite the prevalence of NLP in young people's lives, there is limited research on teaching them NLP concepts. Traditionally, AI and NLP concepts have been taught primarily in higher education [3, 32, 40]. However, research indicates that children are capable of grasping AI and NLP concepts from a young age [10, 20].

digital learning environment, nlp, student, (14 more...)

2310.01603

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Denmark (0.04)
Asia > Indonesia (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Education > Educational Setting > K-12 Education (1.00)
Education > Educational Setting > Higher Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Murray-Rust, Dave, Lupetti, Maria Luce, Nicenboim, Iohanna, van der Hoog, Wouter

Grasping AI: experiential exercises for designers

Artificial intelligence (AI) and machine learning (ML) are increasingly integrated into the functioning of physical and digital products, creating unprecedented opportunities for interaction and functionality. However, there is a challenge for designers to ideate within this creative landscape, balancing the possibilities of technology with human interactional concerns. We investigate techniques for exploring and reflecting on the interactional affordances, the unique relational possibilities, and the wider social implications of AI systems. We introduced into an interaction design course (n=100) nine 'AI exercises' that draw on more-than-human design, responsible AI, and speculative enactment to create experiential engagements around AI interaction design. We find that exercises around metaphors and enactments make questions of training and learning, privacy and consent, autonomy and agency more tangible, and thereby help students be more reflective and responsible about how to design with AI and its complex properties in both their design process and outcomes.

computing machinery, interaction, student, (15 more...)

2310.01282

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
(8 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (1.00)
Energy (0.92)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Styrud, Jonathan, Mayr, Matthias, Hellsten, Erik, Krueger, Volker, Smith, Christian

BeBOP -- Combining Reactive Planning and Bayesian Optimization to Solve Robotic Manipulation Tasks

Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past, robot programs were often written statically and tuned manually, the current, faster transition times call for robust, modular and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose the method Behavior-based Bayesian Optimization and Planning (BeBOP) that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the random forest surrogate models that drastically improves the results.

bebop, international conference, optimization, (11 more...)

2310.00971

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Delgado, Fernando, Yang, Stephen, Madaio, Michael, Yang, Qian

The Participatory Turn in AI Design: Theoretical Foundations and the Current State of Practice

Despite the growing consensus that stakeholders affected by AI systems should participate in their design, enormous variation and implicit disagreements exist among current approaches. For researchers and practitioners who are interested in taking a participatory approach to AI design and development, it remains challenging to assess the extent to which any participatory approach grants substantive agency to stakeholders. This article thus aims to ground what we dub the "participatory turn" in AI design by synthesizing existing theoretical literature on participation and through empirical investigation and critique of its current practices. Specifically, we derive a conceptual framework through synthesis of literature across technology design, political theory, and the social sciences that researchers and practitioners can leverage to evaluate approaches to participation in AI design. Additionally, we articulate empirical findings concerning the current state of participatory practice in AI design based on an analysis of recently published research and semi-structured interviews with 12 AI researchers and practitioners. We use these empirical findings to understand the current state of participatory practice and subsequently provide guidance to better align participatory goals and methods in a way that accounts for practical constraints.

participation, proceedings, stakeholder, (14 more...)

2310.00907

Country:

North America > United States > New York > New York County > New York City (0.15)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.05)
(21 more...)

Genre:

Research Report > Experimental Study (1.00)
Questionnaire & Opinion Survey (1.00)
Instructional Material > Course Syllabus & Notes (0.67)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Education (1.00)
Government (0.93)
(3 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Collaboration (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(4 more...)

Mokrov, Petr, Korotin, Alexander, Kolesov, Alexander, Gushchin, Nikita, Burnaev, Evgeny

Energy-guided Entropic Neural Optimal Transport

arXiv.org Machine Learning Oct-2-2023

Energy-based models (EBMs) have been known in the Machine Learning community for decades. Since the seminal works devoted to EBMs dating back to the noughties, there have been many efficient methods which solve the generative modelling problem by means of energy potentials (unnormalized likelihood functions). In contrast, the realm of Optimal Transport (OT) and, in particular, neural OT solvers is much less explored and limited to a few recent works (excluding WGAN-based approaches which utilize OT as a loss function and do not model OT maps themselves). In our work, we bridge the gap between EBMs and Entropy-regularized OT. We present a novel methodology which allows utilizing the recent developments and technical improvements of the former in order to enrich the latter. From the theoretical perspective, we prove generalization bounds for our technique. In practice, we validate its applicability in toy 2D and image domains. To showcase the scalability, we empower our method with a pre-trained StyleGAN and apply it to high-res AFHQ $512\times 512$ unpaired I2I translation. For simplicity, we choose simple short- and long-run EBMs as a backbone of our Energy-guided Entropic OT approach, leaving the application of more sophisticated EBMs for future research. Our code is publicly available.
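For readers unfamiliar with entropy-regularized OT, the sketch below shows the classical Sinkhorn iteration for the discrete case. It is a different, much simpler technique than the energy-guided neural solver the abstract describes, and is included only to make the regularized problem concrete. The function name, the toy marginals, and the values of `eps` and `n_iters` are assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_plan(a, b, cost, eps=1.0, n_iters=500):
    """Discrete entropic OT via Sinkhorn scaling (not the paper's method).

    a:    (n,) source marginal, non-negative, sums to 1
    b:    (m,) target marginal, non-negative, sums to 1
    cost: (n, m) pairwise transport cost
    eps:  entropic regularization strength
    Returns the (n, m) transport plan diag(u) @ K @ diag(v).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    kernel = np.exp(-np.asarray(cost, dtype=float) / eps)  # Gibbs kernel K
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (kernel @ v)      # rescale rows to match a
        v = b / (kernel.T @ u)    # rescale columns to match b
    return u[:, None] * kernel * v[None, :]


if __name__ == "__main__":
    a = np.array([0.5, 0.5])
    b = np.array([0.25, 0.75])
    cost = np.array([[0.0, 1.0], [1.0, 0.0]])
    plan = sinkhorn_plan(a, b, cost)
    print(plan)
    print(plan.sum(axis=1), plan.sum(axis=0))  # should approximate a and b
```

Larger `eps` gives a more diffuse plan and faster convergence; smaller `eps` approaches the unregularized plan but can underflow in this naive form, where a log-domain variant is usually preferred.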

artificial intelligence, eot, machine learning, (17 more...)

2304.06094

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.46)
Instructional Material > Course Syllabus & Notes (0.46)

Industry: Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)