 Instructional Material


Just Say the Name: Online Continual Learning with Category Names Only via Data Generation

arXiv.org Artificial Intelligence

In real-world scenarios, extensive manual annotation for continual learning is impractical due to prohibitive costs. Although prior work, influenced by large-scale webly supervised training, suggests leveraging web-scraped data in continual learning, this poses challenges such as data imbalance, usage restrictions, and privacy concerns. Addressing the risks of continual webly supervised training, we present an online continual learning framework - Generative Name only Continual Learning (G-NoCL). The proposed G-NoCL uses a set of generators G along with the learner. When encountering new concepts (i.e., classes), G-NoCL employs the novel sample complexity-guided data ensembling technique DIverSity and COmplexity enhancing ensemBlER (DISCOBER) to optimally sample training data from generated data. Through extensive experimentation, we demonstrate superior performance of DISCOBER in G-NoCL online CL benchmarks, covering both In-Distribution (ID) and Out-of-Distribution (OOD) generalization evaluations, compared to naive generator-ensembling, web-supervised, and manually annotated data.


ChatGPT in Data Visualization Education: A Student Perspective

arXiv.org Artificial Intelligence

Unlike traditional educational chatbots that rely on pre-programmed responses, large-language model-driven chatbots, such as ChatGPT, demonstrate remarkable versatility and have the potential to serve as a dynamic resource for addressing student needs from understanding advanced concepts to solving complex problems. This work explores the impact of such technology on student learning in an interdisciplinary, project-oriented data visualization course. Throughout the semester, students engaged with ChatGPT across four distinct projects, including data visualizations and implementing them using a variety of tools including Tableau, D3, and Vega-lite. We collected conversation logs and reflection surveys from the students after each assignment. In addition, we conducted interviews with selected students to gain deeper insights into their overall experiences with ChatGPT. Our analysis examined the advantages and barriers of using ChatGPT, students' querying behavior, the types of assistance sought, and its impact on assignment outcomes and engagement. Based on the findings, we discuss design considerations for an educational solution that goes beyond the basic interface of ChatGPT, specifically tailored for data visualization education.


The Machine Ethics podcast: Good tech with Eleanor Drage and Kerry McInerney

AIHub

Hosted by Ben Byford, The Machine Ethics Podcast brings together interviews with academics, authors, business leaders, designers and engineers on the subject of autonomous algorithms, artificial intelligence, machine learning, and technology's impact on society. This episode we're chatting with Eleanor and Kerry on good technology and if it's even possible, that technology is political, watering down regulation, the magic of AI, the value of human creativity, how Feminism, Aboriginal, and mixed race studies can help AI development, the performative nature of tech, and more… Dr Kerry McInerney (née Mackereth) is a Research Fellow at the Leverhulme Centre for the Future of Intelligence at the University of Cambridge, where she co-leads the Global Politics of AI project on how AI is impacting international relations. She is also a Research Fellow at the AI Now Institute (a leading AI policy thinktank in New York), an AHRC/BBC New Generation Thinker (2023), one of the 100 Brilliant Women in AI Ethics (2022), and one of Computing's Rising Stars 30 (2023). Kerry is the co-editor of the collection Feminist AI: Critical Perspectives on Algorithms, Data, and Intelligent Machines (2023, Oxford University Press), the collection The Good Robot: Why Technology Needs Feminism (2024, Bloomsbury Academic), and the co-author of the forthcoming book Reprogram: Why Big Tech is Broken and How Feminism Can Fix It (2026, Princeton University Press). Dr Eleanor Drage is a Senior Research Fellow at the University of Cambridge Centre for the Future of Intelligence, and teaches AI professionals about AI ethics on a Masters course at Cambridge.


Overcoming Knowledge Barriers: Online Imitation Learning from Observation with Pretrained World Models

arXiv.org Artificial Intelligence

Incorporating the successful paradigm of pretraining and finetuning from Computer Vision and Natural Language Processing into decision-making has become increasingly popular in recent years. In this paper, we study Imitation Learning from Observation with pretrained models and find existing approaches such as BCO and AIME face knowledge barriers, specifically the Embodiment Knowledge Barrier (EKB) and the Demonstration Knowledge Barrier (DKB), greatly limiting their performance. The EKB arises when pretrained models lack knowledge about unseen observations, leading to errors in action inference. The DKB results from policies trained on limited demonstrations, hindering adaptability to diverse scenarios. We thoroughly analyse the underlying mechanism of these barriers and propose AIME-v2, built upon AIME, as a solution. AIME-v2 uses online interactions with a data-driven regulariser to alleviate the EKB and mitigates the DKB by introducing a surrogate reward function to enhance policy training. Experimental results on tasks from the DeepMind Control Suite and Meta-World benchmarks demonstrate the effectiveness of these modifications in improving both sample-efficiency and converged performance. The study contributes valuable insights into resolving knowledge barriers for enhanced decision-making in pretraining-based approaches. Code will be available at https://github.com/argmax-ai/aime-v2.


Foundations of Multisensory Artificial Intelligence

arXiv.org Artificial Intelligence

Building multisensory AI systems that learn from multiple sensory inputs such as text, speech, video, real-world sensors, wearable devices, and medical data holds great promise for impact in many scientific areas with practical benefits, such as in supporting human health and well-being, enabling multimedia content processing, and enhancing real-world autonomous agents. By synthesizing a range of theoretical frameworks and application domains, this thesis aims to advance the machine learning foundations of multisensory AI. In the first part, we present a theoretical framework formalizing how modalities interact with each other to give rise to new information for a task. These interactions are the basic building blocks in all multimodal problems, and their quantification enables users to understand their multimodal datasets, design principled approaches to learn these interactions, and analyze whether their model has succeeded in learning. In the second part, we study the design of practical multimodal foundation models that generalize over many modalities and tasks, which presents a step toward grounding large language models to real-world sensory modalities. We introduce MultiBench, a unified large-scale benchmark across a wide range of modalities, tasks, and research areas, followed by the cross-modal attention and multimodal transformer architectures that now underpin many of today's multimodal foundation models. Scaling these architectures on MultiBench enables the creation of general-purpose multisensory AI systems, and we discuss our collaborative efforts in applying these models for real-world impact in affective computing, mental health, cancer prognosis, and robotics. Finally, we conclude this thesis by discussing how future work can leverage these ideas toward more general, interactive, and safe multisensory AI.


Machine Learning for Quantum Computing Specialists

arXiv.org Artificial Intelligence

Quantum machine learning (QML) is a promising early use case for quantum computing. Over the last five years, the field has progressed from theoretical studies and numerical simulations to proofs of concept. Use cases demonstrated on contemporary quantum devices include classifying medical images and items from the Iris dataset, classifying and generating handwritten images, toxicity screening, and learning a probability distribution. Potential benefits of QML include faster training and identification of feature maps not found classically. Although these examples lack the scale for commercial exploitation, and it may be several years before QML algorithms replace classical solutions, QML is an exciting area. This article is written for those who already have a sound knowledge of quantum computing and now wish to gain a basic overview of the terminology and some applications of classical machine learning in preparation for studying quantum machine learning. The reader will already understand the relevant linear algebra, including Hilbert spaces (vector spaces with an inner product).


Medical Speech Symptoms Classification via Disentangled Representation

arXiv.org Artificial Intelligence

Intent is defined for understanding spoken language in existing works. Both textual features and acoustic features involved in medical speech contain intent, which is important for symptomatic diagnosis. In this paper, we propose a medical speech classification model named DRSC that automatically learns to disentangle intent and content representations from textual-acoustic data for classification. The intent representations of the text domain and the Mel-spectrogram domain are extracted via intent encoders, and then the reconstructed text feature and the Mel-spectrogram feature are obtained through two exchanges. After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification. Experimental results show that our model obtains an average accuracy rate of 95% in detecting 25 different medical symptoms.


Computational Job Market Analysis with Natural Language Processing

arXiv.org Artificial Intelligence

[Abridged Abstract] Recent technological advances underscore labor market dynamics, yielding significant consequences for employment prospects and increasing job vacancy data across platforms and languages. Aggregating such data holds potential for valuable insights into labor market demands, new skills emergence, and facilitating job matching for various stakeholders. However, despite prevalent insights in the private sector, transparent language technology systems and data for this domain are lacking. This thesis investigates Natural Language Processing (NLP) technology for extracting relevant information from job descriptions, identifying challenges including scarcity of training data, lack of standardized annotation guidelines, and shortage of effective extraction methods from job ads. We frame the problem, obtain annotated data, and introduce extraction methodologies. Our contributions include job description datasets, a de-identification dataset, and a novel active learning algorithm for efficient model training. We propose skill extraction using weak supervision, a taxonomy-aware pre-training methodology adapting multilingual language models to the job market domain, and a retrieval-augmented model leveraging multiple skill extraction datasets to enhance overall performance. Finally, we ground extracted information within a designated taxonomy.


VIEW: Visual Imitation Learning with Waypoints

arXiv.org Artificial Intelligence

Robots can use Visual Imitation Learning (VIL) to learn everyday tasks from video demonstrations. However, translating visual observations into actionable robot policies is challenging due to the high-dimensional nature of video data. This challenge is further exacerbated by the morphological differences between humans and robots, especially when the video demonstrations feature humans performing tasks. To address these problems we introduce Visual Imitation lEarning with Waypoints (VIEW), an algorithm that significantly enhances the sample efficiency of human-to-robot VIL. VIEW achieves this efficiency using a multi-pronged approach: extracting a condensed prior trajectory that captures the demonstrator's intent, employing an agent-agnostic reward function for feedback on the robot's actions, and utilizing an exploration algorithm that efficiently samples around waypoints in the extracted trajectory. VIEW also segments the human trajectory into grasp and task phases to further accelerate learning efficiency. Through comprehensive simulations and real-world experiments, VIEW demonstrates improved performance compared to current state-of-the-art VIL methods. VIEW enables robots to learn a diverse range of manipulation tasks involving multiple objects from arbitrarily long video demonstrations. Additionally, it can learn standard manipulation tasks such as pushing or moving objects from a single video demonstration in under 30 minutes, with fewer than 20 real-world rollouts. Code and videos here: https://collab.me.vt.edu/view/


Leveraging Prompts in LLMs to Overcome Imbalances in Complex Educational Text Data

arXiv.org Artificial Intelligence

In this paper, we explore the potential of Large Language Models (LLMs) with assertions to mitigate imbalances in educational datasets. Traditional models often fall short in such contexts, particularly due to the complexity and nuanced nature of the data. This issue is especially prominent in the education sector, where cognitive engagement levels among students show significant variation in their open responses. To test our hypothesis, we utilized an existing technology for assertion-based prompt engineering through an 'Iterative - ICL PE Design Process' comparing traditional Machine Learning (ML) models against LLMs augmented with assertions (N=135). Further, we conduct a sensitivity analysis on a subset (n=27), examining the variance in model performance concerning classification metrics and cognitive engagement levels in each iteration. Our findings reveal that LLMs with assertions significantly outperform traditional ML models, particularly in cognitive engagement levels with minority representation, registering up to a 32% increase in F1-score. Additionally, our sensitivity study indicates that incorporating targeted assertions into the LLM tested on the subset enhances its performance by 11.94%. This improvement primarily addresses errors stemming from the model's limitations in understanding context and resolving lexical ambiguities in student responses.
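
As a rough illustration of why per-class F1-score is the metric of interest on imbalanced data like that described above, the following minimal sketch computes F1 for a majority and a minority class. The labels, the predictions, and the helper name `per_class_f1` are hypothetical and are not taken from the paper; the code only shows the standard F1 definition (the harmonic mean of precision and recall).

```python
def per_class_f1(y_true, y_pred, label):
    """F1-score for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 8 majority-class ("low") and 2 minority-class
# ("high") responses; the classifier misses one of the two minority cases.
y_true = ["low"] * 8 + ["high"] * 2
y_pred = ["low"] * 8 + ["low", "high"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1_low = per_class_f1(y_true, y_pred, "low")
f1_high = per_class_f1(y_true, y_pred, "high")

print(f"accuracy={accuracy:.3f} f1_low={f1_low:.3f} f1_high={f1_high:.3f}")
```

In this toy case overall accuracy stays high while the minority-class F1 is much lower, which is why gains on minority classes are usually reported per class.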