Instructional Material
Improving Online Continual Learning Performance and Stability with Temporal Ensembles
Soutif-Cormerais, Albin, Carta, Antonio, Van de Weijer, Joost
Neural networks are very effective when trained on large datasets for a large number of iterations. However, when they are trained online on non-stationary data streams, their performance suffers (1) from the online setup, which limits the amount of available data, and (2) from catastrophic forgetting caused by the non-stationary nature of the data. Furthermore, several recent works (Caccia et al., 2022; De Lange et al., 2023) showed that replay methods used in continual learning suffer from the stability gap, which becomes visible when the model is evaluated continually (rather than only at task boundaries). In this article, we study model ensembling as a way to improve performance and stability in online continual learning. We observe that naively ensembling models from different training tasks considerably increases performance in online continual learning. Building on this observation, and drawing inspiration from ensembling methods in semi-supervised learning, we use a lightweight temporal ensemble that uses an exponential moving average (EMA) of the weights at test time, and show that it can drastically increase performance and stability when combined with several methods from the literature.
Neural networks trained with backpropagation have been shown to generalize well even when they are overparameterized (Krizhevsky et al., 2017). However, these good learning properties mainly hold when the data are independent and identically distributed. When learning from a stream whose distribution varies over time, neural networks are known to suffer from catastrophic forgetting (McCloskey & Cohen, 1989; Goodfellow et al., 2014; Kirkpatrick et al., 2017): they tend to forget knowledge acquired in previous learning tasks. The field of continual learning aims to address this problem.
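To make the temporal ensemble concrete, here is a minimal sketch of an EMA over model weights, assuming weights are stored as a dict of flat float lists (a stand-in for a real model's parameter tensors). The `decay` value and the helpers `ema_update` and `sgd_step` are hypothetical choices for illustration, not taken from the paper. Training keeps updating the raw weights; only the EMA copy would be used for evaluation.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def ema_update(ema: Weights, current: Weights, decay: float = 0.99) -> Weights:
    """Return decay * ema + (1 - decay) * current, parameter by parameter."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    return {
        name: [decay * e + (1.0 - decay) * c for e, c in zip(ema[name], current[name])]
        for name in ema
    }


def sgd_step(weights: Weights, lr: float = 0.1) -> Weights:
    """Hypothetical stand-in for one training step: every parameter drifts by +lr."""
    return {name: [w + lr for w in values] for name, values in weights.items()}


# The online learner trains the raw weights; the EMA copy is kept for evaluation.
weights: Weights = {"w": [0.0, 0.0]}
ema: Weights = {name: list(values) for name, values in weights.items()}
for _ in range(100):
    weights = sgd_step(weights)
    ema = ema_update(ema, weights, decay=0.9)

print(weights["w"][0], ema["w"][0])  # the EMA trails behind the raw weights
```

Because the EMA averages over recent weights, it smooths out abrupt changes such as those right after a task switch; `decay` controls how far back the ensemble effectively reaches (roughly 1 / (1 - decay) steps).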
Generally, incremental learning separates learning into distinct tasks (identified by a task-ID) that the agent encounters sequentially. A variety of settings have been introduced in continual learning to evaluate different aspects of the continual learner. Task-incremental learning (De Lange et al., 2021; van de Ven & Tolias, 2018) and class-incremental learning (Masana et al., 2022; Belouadah et al., 2021) are among the most popular. In this paper, we focus on the more challenging class-incremental setting, where the learner does not have access to the task-ID at inference time.
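The difference between the two settings shows up at inference time. A minimal sketch, assuming a single classifier head whose output covers all classes seen so far (the logits below are invented for illustration): with a task-ID the prediction can be restricted to that task's classes; without it the model must choose among all classes, so old and new classes compete.

```python
from typing import Optional, Sequence


def predict(logits: Sequence[float], task_classes: Optional[Sequence[int]] = None) -> int:
    """Return the predicted class index.

    task_classes given   -> task-incremental inference: the task-ID tells us which
                            classes are eligible, so the argmax is restricted to them.
    task_classes is None -> class-incremental inference: no task-ID, so the argmax
                            runs over every class seen so far.
    """
    candidates = range(len(logits)) if task_classes is None else task_classes
    return max(candidates, key=lambda c: logits[c])


# Two tasks: task 0 has classes {0, 1}, task 1 has classes {2, 3}.
logits = [0.2, 2.5, 1.0, 0.7]  # model output for a sample whose true class is 2
print(predict(logits, task_classes=[2, 3]))  # task-incremental: picks 2
print(predict(logits))                       # class-incremental: picks 1
```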
Can We Trust AI-Generated Educational Content? Comparative Analysis of Human and AI-Generated Learning Resources
Denny, Paul, Khosravi, Hassan, Hellas, Arto, Leinonen, Juho, Sarsa, Sami
As an increasing number of students move to online learning platforms that deliver personalized learning experiences, there is a great need for the production of high-quality educational content. Large language models (LLMs) appear to offer a promising solution for creating learning materials rapidly and at scale, reducing the burden on instructors. In this study, we investigated the potential for LLMs to produce learning resources in an introductory programming context by comparing the quality of resources generated by an LLM with those created by students as part of a learnersourcing activity. In a blind evaluation, students rated the correctness and helpfulness of resources generated by AI and by their peers; both the LLM and the students had initially been given identical exemplars. Our results show that the quality of AI-generated resources, as perceived by students, is equivalent to that of resources generated by their peers. This suggests that AI-generated resources may serve as viable supplementary material in certain contexts. Resources generated by LLMs tend to closely mirror the given exemplars, whereas student-generated resources exhibit greater variety in content length and in the specific syntax features used. The study highlights the need for further research that explores different types of learning resources and a broader range of subject areas, and that examines the long-term impact of AI-generated resources on learning outcomes.
Transformers in Time-series Analysis: A Tutorial
Ahmed, Sabeen, Nielsen, Ian E., Tripathi, Aakash, Siddiqui, Shamoon, Rasool, Ghulam, Ramachandran, Ravi P.
Transformers belong to a class of machine learning models that use self-attention or the scaled dot-product operation as their primary learning mechanism. Transformers were initially proposed for neural machine translation, one of the most challenging natural language processing (NLP) tasks [1]. Recently, Transformers have been successfully employed to tackle various problems in machine learning and have achieved state-of-the-art performance [2]. Apart from classical NLP tasks, examples from other areas include image classification [3], object detection and segmentation [4], image and language generation [5], sequential decision-making in reinforcement learning [6], multi-modal (text, speech, and image) data processing [7], and analysis of tabular and time-series data [8]. This tutorial paper focuses on time-series analysis using Transformers. Time-series data consist of ordered samples, observations, or features recorded sequentially over time. Time-series datasets arise naturally in many real-world applications where data are recorded at a fixed sampling interval. Examples include stock prices, digitized speech signals, traffic measurements, sensor data for weather patterns, biomedical measurements, and various kinds of population data recorded over time.
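A minimal, dependency-free sketch of the scaled dot-product operation, softmax(Q K^T / sqrt(d_k)) V, applied as self-attention to a toy series in which each time step is a 2-dimensional vector. The embeddings are hypothetical, and the sketch omits the learned Q/K/V projections, multiple heads, positional encodings, and masking that a real Transformer would use.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one attention head, without masking."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Output row: attention-weighted sum of the value vectors.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


# Self-attention over a toy series of 3 time steps, each embedded as a 2-d vector.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(scaled_dot_product_attention(x, x, x))
```

Every time step attends to every other step in one operation, which is what lets a Transformer relate distant points of a series without stepping through it sequentially.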
Harvard announces it will teach students using an artificial intelligence instructor next semester
Ivy League students at one of America's most expensive colleges will be taught by AI next year. The teachers of Harvard University's popular intro-level coding course are 'experimenting' with a ChatGPT-powered teaching assistant. Professor David Malan, who runs the course, justified plans for introducing the 'CS50 bot' by noting that the course has often deployed new software in its syllabus. A ChatGPT AI teacher, he said in a statement, was simply an 'evolution of that tradition'. 'Our own hope is that, through AI, we can eventually approximate a 1:1 teacher:student ratio for every student in CS50... providing them with software-based tools that, 24/7, can support their learning at a pace and in a style that works best for them individually.'
Inter-case Predictive Process Monitoring: A candidate for Quantum Machine Learning?
Hill, Stefan, Fitzek, David, Delfmann, Patrick, Corea, Carl
Regardless of the domain, forecasting the future behaviour of a running process instance is of interest to decision makers, especially when multiple instances interact. Fostered by recent advances in machine learning research, several methods have been proposed to automatically predict the next activity, outcome, or remaining time of a process. Still, building a model with high predictive power requires both intrinsic knowledge of how to extract meaningful features from event log data and a model that captures complex patterns in the data. This work builds upon recent progress in inter-case Predictive Process Monitoring (PPM) and comprehensively benchmarks the impact of inter-case features on prediction accuracy. Moreover, it includes quantum machine learning models, which are expected to provide an advantage over classical models as the number of feature dimensions grows. The evaluation on real-world training data from the BPI challenge shows that the inter-case features provide a significant boost of more than four percent in accuracy and that quantum algorithms are indeed competitive in a handful of feature configurations. Yet, as quantum hardware is still in its early stages of development, this paper critically discusses these findings in light of runtime, noise, and the risk of overfitting on the training data. Finally, the implementation of an open-source plugin demonstrates the technical feasibility of connecting a state-of-the-art workflow engine such as Camunda to an IBM quantum computing cloud service.
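The abstract does not define its inter-case features. As a hedged illustration only, one simple kind of inter-case feature is the number of other cases active around an event's timestamp. The sketch assumes an event log of (case_id, activity, timestamp) tuples with integer timestamps and a hypothetical `window` parameter; the helper name is likewise hypothetical. It only uses events at or before each timestamp, so it does not leak future information into a prediction.

```python
from typing import List, Tuple

Event = Tuple[str, str, int]  # (case_id, activity, timestamp)


def active_cases_in_window(log: List[Event], window: int) -> List[int]:
    """For each event, count the *other* cases with at least one event in [ts - window, ts].

    Only events at or before ts are considered, so the feature could be computed
    online, at the moment the event is observed.
    """
    counts = []
    for case, _, ts in log:
        active = {other for other, _, t in log if other != case and ts - window <= t <= ts}
        counts.append(len(active))
    return counts


log = [
    ("A", "start", 0),
    ("B", "start", 1),
    ("A", "end", 2),
    ("C", "start", 10),
    ("B", "end", 11),
]
print(active_cases_in_window(log, window=5))  # [0, 1, 1, 0, 1]
```

Such a count would be appended to the usual intra-case features (for example, the activity prefix) before training a next-activity or outcome classifier. The quadratic loop is fine for a sketch; a large log would call for a sorted sweep instead.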
More for Less: Compact Convolutional Transformers Enable Robust Medical Image Classification with Limited Data
Transformers are very powerful tools for a variety of tasks across domains, from text generation to image captioning. However, transformers require substantial amounts of training data, which is often a challenge in biomedical settings, where high-quality labeled data can be difficult or expensive to obtain. This study investigates the efficacy of Compact Convolutional Transformers (CCT) for robust medical image classification with limited data, addressing a key limitation of conventional Vision Transformers: their requirement for large datasets. As a hybrid of transformers and convolutional layers, CCTs demonstrate high accuracy on modestly sized datasets. We employed a benchmark dataset of peripheral blood cell images of eight distinct cell types, each represented by approximately 2,000 low-resolution (28×28×3 pixel) samples. Although this dataset is smaller than those typically used with Vision Transformers, we achieved a classification accuracy of 92.49% and a micro-average ROC AUC of 0.9935. The CCT also learned quickly, exceeding 80% validation accuracy after five epochs. Analysis of per-class precision, recall, F1, and ROC showed that performance was strong across cell types. Our findings underscore the robustness of CCTs, indicating their potential as a solution to the data scarcity prevalent in biomedical imaging. We substantiate the applicability of CCTs in data-constrained areas and encourage further work on CCTs.
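Per-class precision, recall, and F1 all come from the same one-vs-rest counts (true positives, false positives, false negatives). A minimal, dependency-free sketch with toy labels (not the paper's data), assuming integer class labels; scikit-learn's `precision_recall_fscore_support` computes the same quantities.

```python
from typing import Dict, Sequence, Tuple


def per_class_prf(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, Tuple[float, float, float]]:
    """Return {class: (precision, recall, F1)}; a metric is 0.0 when its denominator is 0."""
    pairs = list(zip(y_true, y_pred))
    result = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        result[c] = (precision, recall, f1)
    return result


# Toy labels for three hypothetical cell types (0, 1, 2).
y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
for cls, (p, r, f) in per_class_prf(y_true, y_pred).items():
    print(f"class {cls}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```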
The Integer Linear Programming Inference Cookbook
Effective decision-making requires the use of knowledge. This has been a clear and long-standing principle in AI research, as reflected, for example, in the seminal early work on knowledge and AI (summarized by Brachman and Levesque, 1985) and in the thriving Knowledge Representation and Reasoning and Uncertainty in AI communities. However, the message has been somewhat diluted as data-driven statistical learning has become increasingly pervasive across AI. Nevertheless, the idea that reasoning and learning need to work together (Khardon and Roth, 1996; Roth, 1996) and that knowledge representation is a crucial bridge between them has not been lost. One area where the link between learning, representation, and reasoning has been shown to be essential, and has been studied extensively, is Natural Language Processing (NLP), in particular Structured Output Prediction within NLP. In structured problems, values must be assigned to multiple interrelated random variables. Examples include extracting multiple relations among entities in a document, where the two arguments of a relation such as born-in cannot both refer to people, or co-reference resolution, where gender agreement must be maintained when determining that a specific pronoun refers to a given entity. In these, and many other such problems, it is natural to represent knowledge as Boolean functions over propositional variables. These functions would express knowledge, for example, of the form "if the relation between two entities is born-in, then its arguments must be a person and a location" (formalized as functions such as x
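A minimal sketch of how such knowledge can enter inference, assuming the standard 0/1 encoding in which an implication a → b between binary variables becomes the linear constraint a ≤ b. The variable names and scores are hypothetical, and a brute-force search over all assignments stands in for an actual ILP solver; this is only practical for a handful of variables.

```python
from itertools import product
from typing import Dict, Optional, Tuple

VARIABLES = ("born_in", "arg1_person", "arg2_location")


def satisfies_knowledge(x: Dict[str, int]) -> bool:
    # born_in -> arg1_person    encoded as  born_in <= arg1_person
    # born_in -> arg2_location  encoded as  born_in <= arg2_location
    return x["born_in"] <= x["arg1_person"] and x["born_in"] <= x["arg2_location"]


def constrained_argmax(scores: Dict[str, float]) -> Tuple[Optional[Dict[str, int]], float]:
    """Maximize sum(score * x) over binary x subject to the linear constraints above."""
    best: Optional[Dict[str, int]] = None
    best_score = float("-inf")
    for values in product((0, 1), repeat=len(VARIABLES)):
        x = dict(zip(VARIABLES, values))
        if not satisfies_knowledge(x):
            continue
        score = sum(scores[name] * x[name] for name in VARIABLES)
        if score > best_score:
            best, best_score = x, score
    return best, best_score


# Hypothetical local classifier scores: confident in born_in, unsure about the location.
scores = {"born_in": 2.0, "arg1_person": 0.5, "arg2_location": -1.0}
print(constrained_argmax(scores))
```

Without the constraints, the highest-scoring assignment would set born_in and arg1_person to 1 and arg2_location to 0 (score 2.5), which contradicts the knowledge; with them, inference returns the consistent assignment with all three variables set to 1 (score 1.5).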
Vision Through the Veil: Differential Privacy in Federated Learning for Medical Image Classification
Nampalle, Kishore Babu, Singh, Pradeep, Narayan, Uppala Vivek, Raman, Balasubramanian
The proliferation of deep learning applications in healthcare calls for data aggregation across various institutions, a practice often associated with significant privacy concerns. This concern intensifies in medical image analysis, where privacy-preserving mechanisms are paramount because the data are inherently sensitive. Federated learning, which enables cooperative model training without direct data exchange, presents a promising solution. Nevertheless, the inherent vulnerabilities of federated learning necessitate further privacy safeguards. This study addresses this need by integrating differential privacy, a leading privacy-preserving technique, into a federated learning framework for medical image classification. We introduce a novel differentially private federated learning model and meticulously examine its impacts on privacy preservation and model performance. Our research confirms the existence of a trade-off between model accuracy and privacy settings. However, we demonstrate that strategic calibration of the privacy budget in differential privacy can uphold robust image classification performance while providing substantial privacy protection.
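The abstract does not specify its mechanism, so the following is only a hedged sketch of one common way to add differential privacy to federated averaging: clip each client's update, average, then add Gaussian noise, in the style of DP-FedAvg. It is not the paper's model. The parameters `max_norm` and `noise_multiplier` and the helper names are assumptions, and the sketch does no privacy accounting (it does not compute the resulting (ε, δ) budget).

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def clip(update: Vector, max_norm: float) -> Vector:
    """Scale an update down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(u * u for u in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [u * scale for u in update]


def dp_aggregate(
    updates: List[Vector],
    max_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Average clipped client updates and add Gaussian noise scaled to the clipping bound."""
    rng = rng or random.Random()
    clipped = [clip(u, max_norm) for u in updates]
    n = len(clipped)
    # Clipping bounds each client's contribution to the mean to norm max_norm / n,
    # so the noise standard deviation is expressed relative to that bound.
    sigma = noise_multiplier * max_norm / n
    return [sum(column) / n + rng.gauss(0.0, sigma) for column in zip(*clipped)]


client_updates = [[3.0, 4.0], [0.0, 1.0], [0.5, -0.5]]
print(dp_aggregate(client_updates, max_norm=1.0, noise_multiplier=0.0))  # no noise
print(dp_aggregate(client_updates, max_norm=1.0, noise_multiplier=1.0, rng=random.Random(0)))
```

Raising `noise_multiplier` strengthens the privacy guarantee at the cost of a noisier aggregate, which is the kind of accuracy/privacy trade-off the abstract describes.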
Harnessing LLMs in Curricular Design: Using GPT-4 to Support Authoring of Learning Objectives
Sridhar, Pragnya, Doyle, Aidan, Agarwal, Arav, Bogart, Christopher, Savelka, Jaromir, Sakr, Majd
We evaluated the capability of a generative pre-trained transformer (GPT-4) to automatically generate high-quality learning objectives (LOs) in the context of a practically oriented university course on Artificial Intelligence. Discussions of the opportunities (e.g., content generation, explanation) and risks (e.g., cheating) of this emerging technology in education have intensified, but to date there has been no study of the models' capabilities in supporting course design and the authoring of LOs. LOs articulate the knowledge and skills learners are intended to acquire by engaging with a course. To be effective, LOs must focus on what students are intended to achieve, target specific cognitive processes, and be measurable. Thus, authoring high-quality LOs is a challenging and time-consuming (i.e., expensive) effort. We evaluated 127 LOs that were automatically generated by submitting a carefully crafted prompt (detailed guidelines on authoring high-quality LOs) to GPT-4 for the conceptual modules and projects of an AI Practitioner course. We analyzed whether the generated LOs follow certain best practices, such as beginning with an action verb from Bloom's taxonomy that matches the intended level of sophistication. Our analysis showed that the generated LOs are sensible and properly expressed (e.g., starting with an action verb), and that they largely operate at the appropriate level of Bloom's taxonomy, respecting the different nature of the conceptual modules (lower levels) and projects (higher levels). Our results can be leveraged by instructors and curricular designers wishing to take advantage of state-of-the-art generative models to support their curricular and course design efforts.
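The action-verb check described above can be sketched as a simple rule: look at the first word of each LO and map it to a level of Bloom's taxonomy. The verb lists below are a small illustrative subset assumed for this sketch (the study's actual guidelines and verb lists are not given here), and `bloom_level` is a hypothetical helper; a real check would need a much larger vocabulary and would still miss unusually phrased objectives.

```python
from typing import Optional

# Illustrative subset of action verbs per level of the revised Bloom's taxonomy.
BLOOM_VERBS = {
    "remember": {"define", "list", "recall", "identify"},
    "understand": {"explain", "describe", "summarize", "classify"},
    "apply": {"apply", "implement", "use", "execute"},
    "analyze": {"analyze", "compare", "differentiate", "organize"},
    "evaluate": {"evaluate", "justify", "critique", "assess"},
    "create": {"design", "develop", "construct", "formulate"},
}


def bloom_level(objective: str) -> Optional[str]:
    """Return the Bloom level of the LO's first word, or None if it is not a known action verb."""
    words = objective.strip().split()
    if not words:
        return None
    first = words[0].lower().strip(".,;:")
    for level, verbs in BLOOM_VERBS.items():
        if first in verbs:
            return level
    return None


print(bloom_level("Design a data pipeline for a course project."))  # create
print(bloom_level("Explain the bias-variance trade-off."))          # understand
print(bloom_level("Students will understand overfitting."))         # None
```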
Screw and Lie Group Theory in Multibody Kinematics: Motion Representation and Recursive Kinematics of Tree-Topology Systems
After three decades of computational multibody system (MBS) dynamics, current research is centered on the development of compact, user-friendly, yet computationally efficient formulations for the analysis of complex MBS. The key to this is a holistic geometric approach to kinematic modeling, based on the observation that the general motion of rigid bodies, as well as the relative motion due to technical joints, are screw motions. Moreover, screw theory provides the geometric setting, and Lie group theory the analytic foundation, for intuitive and compact MBS modeling. The inherent frame invariance of this modeling approach gives rise to very efficient recursive $O(n)$ algorithms, of which the so-called 'spatial operator algebra' is one example, and allows the use of readily available geometric data. In this paper, three variants for describing the configuration of tree-topology MBS in terms of relative coordinates, i.e. joint variables, are presented: the standard formulation using body-fixed joint frames, a formulation without joint frames, and a formulation without either joint or body-fixed reference frames. This makes it possible to describe MBS kinematics without introducing joint reference frames, thereby rendering restrictive modeling conventions, such as Denavit-Hartenberg parameters, redundant. Four different definitions of twists are recalled and the corresponding recursive expressions are derived. The corresponding Jacobians and their factorization are derived as well. The aim of this paper is to motivate the use of Lie group modeling and to provide a review of the different formulations for the kinematics of tree-topology MBS in terms of relative (joint) coordinates from the unifying perspective of screw and Lie group theory.
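As a hedged illustration of a recursive O(n) formulation that needs no joint frames, the sketch below uses the spatial product-of-exponentials form, which we assume reflects the spirit of the "formulation without joint frames"; the paper's own notation and variants may differ. Each revolute joint i is described only by its unit axis and a point on it in the global frame at the reference configuration q = 0, and each body by its reference pose A_i. Walking the tree from the root, B_i = B_parent(i) · exp(Y_i q_i) and C_i = B_i · A_i, so each body costs one exponential and two matrix products. All function names are hypothetical, and only revolute joints are handled.

```python
import math
from typing import List, Sequence, Tuple

Mat = List[List[float]]
Vec3 = Tuple[float, float, float]


def matmul(a: Mat, b: Mat) -> Mat:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def skew(w: Vec3) -> Mat:
    """3x3 matrix [w] such that [w] v = w x v."""
    return [[0.0, -w[2], w[1]], [w[2], 0.0, -w[0]], [-w[1], w[0], 0.0]]


def exp_revolute(axis: Vec3, point: Vec3, theta: float) -> Mat:
    """4x4 SE(3) exponential of a revolute screw: unit `axis` through `point`, angle theta."""
    w = skew(axis)
    w2 = matmul(w, w)
    s, c = math.sin(theta), math.cos(theta)
    # Linear part of the screw coordinates: v = point x axis.
    v = [point[1] * axis[2] - point[2] * axis[1],
         point[2] * axis[0] - point[0] * axis[2],
         point[0] * axis[1] - point[1] * axis[0]]
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    # Rodrigues: R = I + sin(theta) [w] + (1 - cos(theta)) [w]^2
    rot = [[eye[i][j] + s * w[i][j] + (1 - c) * w2[i][j] for j in range(3)] for i in range(3)]
    # Translation: p = (theta I + (1 - cos(theta)) [w] + (theta - sin(theta)) [w]^2) v
    g = [[theta * eye[i][j] + (1 - c) * w[i][j] + (theta - s) * w2[i][j] for j in range(3)]
         for i in range(3)]
    p = [sum(g[i][j] * v[j] for j in range(3)) for i in range(3)]
    return [rot[0] + [p[0]], rot[1] + [p[1]], rot[2] + [p[2]], [0.0, 0.0, 0.0, 1.0]]


def translation(x: float, y: float, z: float) -> Mat:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def tree_forward_kinematics(parents: Sequence[int], axes: Sequence[Vec3], points: Sequence[Vec3],
                            ref_poses: Sequence[Mat], q: Sequence[float]) -> List[Mat]:
    """Absolute pose C_i of every body; parents[i] == -1 means body i is attached to the ground."""
    identity = translation(0.0, 0.0, 0.0)
    partial: List[Mat] = []  # B_i = B_parent(i) * exp(Y_i q_i)
    poses: List[Mat] = []
    for i, parent in enumerate(parents):
        if parent >= i:
            raise ValueError("bodies must be numbered so that parents[i] < i")
        base = identity if parent < 0 else partial[parent]
        b_i = matmul(base, exp_revolute(axes[i], points[i], q[i]))
        partial.append(b_i)
        poses.append(matmul(b_i, ref_poses[i]))
    return poses


# Planar two-link arm: joints about z at x = 0 and x = 1; body frames at x = 1 and x = 2.
z_axis = (0.0, 0.0, 1.0)
poses = tree_forward_kinematics(
    parents=[-1, 0],
    axes=[z_axis, z_axis],
    points=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    ref_poses=[translation(1.0, 0.0, 0.0), translation(2.0, 0.0, 0.0)],
    q=[0.0, math.pi / 2],
)
print([round(poses[1][r][3], 6) for r in range(3)])  # tip of link 2, approximately (1, 1, 0)
```

Because only global axis and point data at the reference configuration enter the computation, no joint frames or Denavit-Hartenberg parameters have to be defined; branching trees work the same way, since each body only needs its parent's B.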