Instructional Material
MBot: A Modular Ecosystem for Scalable Robotics Education
Gaskell, Peter, Pavlasek, Jana, Gao, Tom, Narula, Abhishek, Lewis, Stanley, Jenkins, Odest Chadwicke
The Michigan Robotics MBot is a low-cost mobile robot platform that has been used to train over 1,400 students in autonomous navigation since 2014 at the University of Michigan and our collaborating colleges. The MBot platform was designed to meet the needs of teaching robotics at scale to match the growth of robotics as a field and an academic discipline. Transformative advancements in robot navigation over the past decades have led to a significant demand for skilled roboticists across industry and academia. This demand has sparked a need for robotics courses in higher education, spanning all levels of undergraduate and graduate experiences. Incorporating real robot platforms into such courses and curricula is effective for conveying the unique challenges of programming embodied agents in real-world environments and sparking student interest. However, teaching with real robots remains challenging due to the cost of hardware and the development effort involved in adapting existing hardware for a new course. In this paper, we describe the design and evolution of the MBot platform, and the underlying principles of scalability and flexibility which are key to its success.
Label Delay in Continual Learning
Csaba, Botos, Zhang, Wenxuan, Müller, Matthias, Lim, Ser-Nam, Elhoseiny, Mohamed, Torr, Philip, Bibi, Adel
Online continual learning, the process of training models on streaming data, has gained increasing attention in recent years. However, a critical aspect often overlooked is the label delay, where new data may not be labeled due to slow and costly annotation processes. We introduce a new continual learning framework with explicit modeling of the label delay between data and label streams over time steps. In each step, the framework reveals both unlabeled data from the current time step $t$ and labels delayed with $d$ steps, from the time step $t-d$. In our extensive experiments amounting to 1060 GPU days, we show that merely augmenting the computational resources is insufficient to tackle this challenge. Our findings underline a notable performance decline when solely relying on labeled data when the label delay becomes significant. More surprisingly, when using state-of-the-art SSL and TTA techniques to utilize the newer, unlabeled data, they fail to surpass the performance of a naïve method that simply trains on the delayed supervised stream. To this end, we introduce a simple, efficient baseline that rehearses from the labeled memory samples that are most similar to the new unlabeled samples. This method bridges the accuracy gap caused by label delay without significantly increasing computational complexity. We show experimentally that our method is the least affected by the label delay factor and in some cases successfully recovers the accuracy of the non-delayed counterpart. We conduct various ablations and sensitivity experiments, demonstrating the effectiveness of our approach.
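The delayed-label setup described above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`delayed_stream`, `select_rehearsal`, `run`), the use of cosine similarity on raw feature vectors, and the one-sample-per-step stream are all assumptions made for illustration. It shows only the two ideas stated in the abstract: at step $t$ the learner sees unlabeled data from $t$ together with labels from $t-d$, and rehearsal samples are chosen from labeled memory by similarity to the newest unlabeled sample.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def delayed_stream(features, labels, d):
    """Yield (t, x_t, labeled), where labeled is (x_{t-d}, y_{t-d}) or None.

    Hypothetical helper: one sample arrives per time step, and its label
    only becomes available d steps later.
    """
    for t, x in enumerate(features):
        labeled = (features[t - d], labels[t - d]) if t >= d else None
        yield t, x, labeled


def select_rehearsal(memory, x_new, k):
    """Return the k labeled memory samples most similar to the unlabeled x_new."""
    ranked = sorted(memory, key=lambda s: cosine(s[0], x_new), reverse=True)
    return ranked[:k]


def run(features, labels, d, k):
    """Walk the stream and record which memory samples would be rehearsed."""
    memory = []  # labeled samples whose labels have already arrived
    picks = []   # rehearsal selection made at each time step
    for t, x_t, labeled in delayed_stream(features, labels, d):
        if labeled is not None:
            memory.append(labeled)
        picks.append(select_rehearsal(memory, x_t, k))
    return picks


if __name__ == "__main__":
    feats = [[1, 0], [0, 1], [1, 0.1], [0.1, 1]]
    labs = ["a", "b", "a", "b"]
    for t, chosen in enumerate(run(feats, labs, d=2, k=1)):
        print(t, chosen)
```

In this toy run, no labels are available during the first $d$ steps, so nothing can be rehearsed; afterwards each new unlabeled sample pulls the closest labeled sample out of memory. A real system would presumably compare learned embeddings rather than raw inputs, and would train on the selected samples rather than just listing them.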
Resource-constrained knowledge diffusion processes inspired by human peer learning
Beikihassan, Ehsan, Hoover, Amy K., Koutis, Ioannis, Parviz, Ali, Aghaieabiane, Niloofar
We consider a setting where a population of artificial learners is given, and the objective is to optimize aggregate measures of performance, under constraints on training resources. The problem is motivated by the study of peer learning in human educational systems. In this context, we study natural knowledge diffusion processes in networks of interacting artificial learners. By `natural', we mean processes that reflect human peer learning where the students' internal state and learning process is mostly opaque, and the main degree of freedom lies in the formation of peer learning groups by a coordinator who can potentially evaluate the learners before assigning them to peer groups. Among other results, we empirically show that such processes indeed make effective use of the training resources, and enable the design of modular neural models that have the capacity to generalize without being prone to overfitting noisy labels.
Improving Plasticity in Online Continual Learning via Collaborative Learning
Wang, Maorong, Michel, Nicolas, Xiao, Ling, Yamasaki, Toshihiko
Online Continual Learning (CL) solves the problem of learning the ever-emerging new classification tasks from a continuous data stream. Unlike its offline counterpart, in online CL, the training data can only be seen once. Most existing online CL research regards catastrophic forgetting (i.e., model stability) as almost the only challenge. In this paper, we argue that the model's capability to acquire new knowledge (i.e., model plasticity) is another challenge in online CL. While replay-based strategies have been shown to be effective in alleviating catastrophic forgetting, there is a notable gap in research attention toward improving model plasticity. To this end, we propose Collaborative Continual Learning (CCL), a collaborative learning based strategy to improve the model's capability in acquiring new concepts. Additionally, we introduce Distillation Chain (DC), a novel collaborative learning scheme to boost the training of the models. We adapted CCL-DC to existing representative online CL works. Extensive experiments demonstrate that even if the learners are well-trained with state-of-the-art online CL methods, our strategy can still improve model plasticity dramatically, and thereby improve the overall performance by a large margin.
Adversarial Attacks and Defenses on 3D Point Cloud Classification: A Survey
Naderi, Hanieh, Bajić, Ivan V.
Deep learning has successfully solved a wide range of tasks in 2D vision as a dominant AI technique. Recently, deep learning on 3D point clouds is becoming increasingly popular for addressing various tasks in this field. Despite remarkable achievements, deep learning algorithms are vulnerable to adversarial attacks. These attacks are imperceptible to the human eye but can easily fool deep neural networks in the testing and deployment stage. To encourage future research, this survey summarizes the current progress on adversarial attack and defense techniques on point cloud classification. This paper first introduces the principles and characteristics of adversarial attacks and summarizes and analyzes adversarial example generation methods in recent years. Additionally, it provides an overview of defense strategies, organized into data-focused and model-focused methods. Finally, it presents several current challenges and potential future research directions in this domain.
How to Build an AI Tutor that Can Adapt to Any Course and Provide Accurate Answers Using Large Language Model and Retrieval-Augmented Generation
The advent of artificial intelligence (AI) has instigated a transformational wave across various sectors, with education standing as a salient beneficiary. AI's unrivaled capacity to enable personalized and adaptive learning experiences has propelled intelligent tutoring systems to the forefront of modern educational paradigms (Kasneci et al., 2023). These systems, powered by AI, offer individualized feedback and interactive learning modules designed to cater to each student's distinct learning needs. Nonetheless, the challenge of developing AI tutors capable of delivering consistently accurate and dependable responses across diverse academic disciplines persists. A notable hindrance to the reliability of AI in educational applications is the occurrence of 'information hallucination', a phenomenon where AI-generated responses, while appearing valid, deviate from factual accuracy (Nye et al., 2023). Such inconsistencies can undermine confidence in AI-centric educational systems (Kasneci et al., 2023). Furthermore, the customization of these systems to align with specific course content necessitates access to current and pertinent educational materials, a task often complicated by the multifaceted nature of academic disciplines. To tackle these challenges, this paper introduces AI Tutor, a web application developed upon the sophisticated infrastructure of large language models (LLMs) and retrieval-augmented generation (RAG). AI Tutor is engineered to deliver accurate, contextually relevant responses by intelligently assimilating information from course-specific materials (Lewis et al., 2020).
AIhub monthly digest: November 2023 – deconstructing sentiment analysis, few-shot learning for medical images, and Angry Birds structure generation
Welcome to our November 2023 monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, find out about recent events, and more. This month, we deconstruct sentiment analysis, find out about few-shot learning in medical imaging, investigate rare events, and look forward to our science communication training session at NeurIPS. In their paper The Sentiment Problem: A Critical Survey towards Deconstructing Sentiment Analysis, Pranav Venkit, Mukund Srinath, Sanjana Gautam, Saranya Venkatraman, Vipul Gupta, Rebecca Passonneau and Shomir Wilson present a review of the sociotechnical aspects of sentiment analysis. In this interview, Pranav and Mukund tell us more about sentiment analysis, how they went about surveying the literature, and recommendations for researchers in the field. Deep learning models employed in medical imaging are limited by the lack of annotated images.
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
Robey, Alexander, Wong, Eric, Hassani, Hamed, Pappas, George J.
Over the last year, large language models (LLMs) have emerged as a groundbreaking technology that has the potential to fundamentally reshape how people interact with AI. Central to the fervor surrounding these models is the credibility and authenticity of the text they generate, which is largely attributable to the fact that LLMs are trained on vast text corpora sourced directly from the Internet. And while this practice exposes LLMs to a wealth of knowledge, such corpora tend to engender a double-edged sword, as they often contain objectionable content including hate speech, malware, and false information [1]. Indeed, the propensity of LLMs to reproduce this objectionable content has invigorated the field of AI alignment [2-4], wherein various mechanisms are used to "align" the output text generated by LLMs with ethical and legal standards [5-7]. At face value, efforts to align LLMs have reduced the propagation of toxic content: Publicly-available chatbots will now rarely output text that is clearly objectionable [8]. Yet, despite this encouraging progress, in recent months a burgeoning literature has identified numerous failure modes--commonly referred to as jailbreaks--that bypass the alignment mechanisms and safety guardrails implemented on modern LLMs [9, 10]. The pernicious nature of such jailbreaks, which are often difficult to detect or mitigate [11, 12], poses a significant barrier to the widespread deployment of LLMs, given that the text generated by these models may influence educational policy [13], medical diagnoses [14, 15], and business decisions [16]. Among the jailbreaks discovered so far, a notable category concerns adversarial prompting, wherein an attacker fools a targeted LLM into outputting objectionable content by modifying prompts passed as input to that LLM [17, 18].
Of particular concern is the recent work of [19], which shows that highly-performant LLMs, including GPT, Claude, and PaLM, can be jailbroken by appending adversarially-chosen characters onto various prompts.
Exo2EgoDVC: Dense Video Captioning of Egocentric Procedural Activities Using Web Instructional Videos
Ohkawa, Takehiko, Yagi, Takuma, Nishimura, Taichi, Furuta, Ryosuke, Hashimoto, Atsushi, Ushiku, Yoshitaka, Sato, Yoichi
We propose a novel benchmark for cross-view knowledge transfer of dense video captioning, adapting models from web instructional videos with exocentric views to an egocentric view. While dense video captioning (predicting time segments and their captions) is primarily studied with exocentric videos (e.g., YouCook2), benchmarks with egocentric videos are restricted due to data scarcity. To overcome the limited video availability, transferring knowledge from abundant exocentric web videos is demanded as a practical approach. However, learning the correspondence between exocentric and egocentric views is difficult due to their dynamic view changes. The web videos contain mixed views focusing on either human body actions or close-up hand-object interactions, while the egocentric view is constantly shifting as the camera wearer moves. This necessitates the in-depth study of cross-view transfer under complex view changes. In this work, we first create a real-life egocentric dataset (EgoYC2) whose captions are shared with YouCook2, enabling transfer learning between these datasets assuming their ground-truth is accessible. To bridge the view gaps, we propose a view-invariant learning method using adversarial training in both the pre-training and fine-tuning stages. While the pre-training is designed to learn invariant features against the mixed views in the web videos, the view-invariant fine-tuning further mitigates the view gaps between both datasets. We validate our proposed method by studying how effectively it overcomes the view change problem and efficiently transfers the knowledge to the egocentric domain. Our benchmark pushes the study of the cross-view transfer into a new task domain of dense video captioning and will envision methodologies to describe egocentric videos in natural language.
Text2Tree: Aligning Text Representation to the Label Tree Hierarchy for Imbalanced Medical Classification
Yan, Jiahuan, Gao, Haojun, Kai, Zhang, Liu, Weize, Chen, Danny, Wu, Jian, Chen, Jintai
Deep learning approaches exhibit promising performances on various text tasks. However, they are still struggling on medical text classification since samples are often extremely imbalanced and scarce. Different from existing mainstream approaches that focus on supplementary semantics with external medical information, this paper aims to rethink the data challenges in medical texts and present a novel framework-agnostic algorithm called Text2Tree that only utilizes internal label hierarchy in training deep learning models. We embed the ICD code tree structure of labels into cascade attention modules for learning hierarchy-aware label representations. Two new learning schemes, Similarity Surrogate Learning (SSL) and Dissimilarity Mixup Learning (DML), are devised to boost text classification by reusing and distinguishing samples of other labels following the label representation hierarchy, respectively. Experiments on authoritative public datasets and real-world medical records show that our approach stably achieves superior performances over classical and advanced imbalanced classification methods.