Instructional Material
ReliCD: A Reliable Cognitive Diagnosis Framework with Confidence Awareness
Zhang, Yunfei, Qin, Chuan, Shen, Dazhong, Ma, Haiping, Zhang, Le, Zhang, Xingyi, Zhu, Hengshu
During the past few decades, cognitive diagnostics modeling, which quantifies the learning status and knowledge mastery levels of students, has attracted increasing attention in computational education communities. Recent advances in neural networks have greatly enhanced the performance of traditional cognitive diagnosis models by learning deep representations of students and exercises. Nevertheless, existing approaches often suffer from overconfidence in predicting students' mastery levels, which is primarily caused by the unavoidable noise and sparsity in realistic student-exercise interaction data and severely hinders the educational application of diagnostic feedback. To address this, we propose a novel Reliable Cognitive Diagnosis (ReliCD) framework, which can quantify the confidence of the diagnosis feedback and is flexible for different cognitive diagnostic functions. Specifically, we first propose a Bayesian method to explicitly estimate the state uncertainty of different knowledge concepts for students, which enables the confidence quantification of diagnostic feedback. In particular, to account for potential differences, we suggest modeling individual prior distributions for the latent variables of different ability concepts using a pre-trained model. Additionally, we introduce a logical hypothesis for ranking confidence levels. Along this line, we design a novel calibration loss to optimize the confidence parameters by modeling the process of student performance prediction. Finally, extensive experiments on four real-world datasets clearly demonstrate the effectiveness of our ReliCD framework.
ChatEd: A Chatbot Leveraging ChatGPT for an Enhanced Learning Experience in Higher Education
Wang, Kevin, Ramos, Jason, Lawrence, Ramon
With the rapid evolution of Natural Language Processing (NLP), Large Language Models (LLMs) like ChatGPT have emerged as powerful tools capable of transforming various sectors. Their vast knowledge base and dynamic interaction capabilities represent significant potential for improving education by operating as a personalized assistant. However, the possibility of generating incorrect, biased, or unhelpful answers is a key challenge to resolve when deploying LLMs in an education context. This work introduces an innovative architecture that combines the strengths of ChatGPT with a traditional information-retrieval-based chatbot framework to offer enhanced student support in higher education. Our empirical evaluations underscore the high promise of this approach.
Building Efficient Universal Classifiers with Natural Language Inference
Laurer, Moritz, van Atteveldt, Wouter, Casas, Andreu, Welbers, Kasper
Generative Large Language Models (LLMs) have become the mainstream choice for fewshot and zeroshot learning thanks to the universality of text generation. Many users, however, do not need the broad capabilities of generative LLMs when they only want to automate a classification task. Smaller BERT-like models can also learn universal tasks, which allow them to do any text classification task without requiring fine-tuning (zeroshot classification) or to learn new tasks with only a few examples (fewshot), while being significantly more efficient than generative LLMs. This paper (1) explains how Natural Language Inference (NLI) can be used as a universal classification task that follows similar principles as instruction fine-tuning of generative LLMs, (2) provides a step-by-step guide with reusable Jupyter notebooks for building a universal classifier, and (3) shares the resulting universal classifier that is trained on 33 datasets with 389 diverse classes. Parts of the code we share have been used to train our older zeroshot classifiers that have been downloaded more than 55 million times via the Hugging Face Hub as of December 2023. Our new classifier improves zeroshot performance by 9.4%.
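The idea of using NLI as a universal classification task can be illustrated with a minimal sketch: each candidate label is turned into a hypothesis sentence, and the label whose hypothesis is most "entailed" by the text wins. This is not the authors' code. The hypothesis template, the function names, and especially `toy_entail_score` (a word-overlap stand-in for a real NLI model) are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Tuple


def build_nli_pairs(text: str, labels: List[str],
                    template: str = "This text is about {}.") -> Dict[str, Tuple[str, str]]:
    """Turn one classification input into one (premise, hypothesis) pair per label."""
    return {label: (text, template.format(label)) for label in labels}


def classify(text: str, labels: List[str],
             entail_score: Callable[[str, str], float],
             template: str = "This text is about {}.") -> str:
    """Return the label whose hypothesis receives the highest entailment score."""
    pairs = build_nli_pairs(text, labels, template)
    scores = {label: entail_score(premise, hypothesis)
              for label, (premise, hypothesis) in pairs.items()}
    return max(scores, key=scores.get)


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Hypothetical stand-in for an NLI model: share of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = [w.strip(".") for w in hypothesis.lower().split()]
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


if __name__ == "__main__":
    print(classify("The team won the football match",
                   ["football", "politics"], toy_entail_score))
```

In practice `entail_score` would be replaced by the entailment probability of a trained NLI model; the surrounding logic stays the same, which is why new label sets need no fine-tuning.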
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math
Wang, Zengzhi, Xia, Rui, Liu, Pengfei
High-quality, large-scale corpora are the cornerstone of building foundation models. In this work, we introduce MathPile, a diverse and high-quality math-centric corpus comprising about 9.5 billion tokens. Throughout its creation, we adhered to the principle of "less is more", firmly believing in the supremacy of data quality over quantity, even in the pre-training phase. Our meticulous data collection and processing efforts included a complex suite of preprocessing, prefiltering, language identification, cleaning, filtering, and deduplication, ensuring the high quality of our corpus. Furthermore, we performed data contamination detection on downstream benchmark test sets to eliminate duplicates. We hope our MathPile can help to enhance the mathematical reasoning abilities of language models. We plan to open-source different versions of MathPile with the scripts used for processing, to facilitate future developments in this field.
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion
Lu, Guansong, Guo, Yuanfan, Han, Jianhua, Niu, Minzhe, Zeng, Yihan, Xu, Songcen, Huang, Zeyi, Zhong, Zhao, Zhang, Wei, Xu, Hang
Current large-scale diffusion models represent a giant leap forward in conditional image synthesis, capable of interpreting diverse cues like text, human poses, and edges. However, their reliance on substantial computational resources and extensive data collection remains a bottleneck. On the other hand, the integration of existing diffusion models, each specialized for different controls and operating in unique latent spaces, poses a challenge due to incompatible image resolutions and latent space embedding structures, hindering their joint use. Addressing these constraints, we present "PanGu-Draw", a novel latent diffusion model designed for resource-efficient text-to-image synthesis that adeptly accommodates multiple control signals. We first propose a resource-efficient Time-Decoupling Training Strategy, which splits the monolithic text-to-image model into structure and texture generators. Each generator is trained using a regimen that maximizes data utilization and computational efficiency, cutting data preparation by 48% and reducing training resources by 51%. Secondly, we introduce "Coop-Diffusion", an algorithm that enables the cooperative use of various pre-trained diffusion models with different latent spaces and predefined resolutions within a unified denoising process. This allows for multi-control image synthesis at arbitrary resolutions without the necessity for additional data or retraining. Empirical validations of PanGu-Draw show its exceptional prowess in text-to-image and multi-control image generation, suggesting a promising direction for future model training efficiencies and generation versatility. The largest 5B T2I PanGu-Draw model is released on the Ascend platform. Project page: https://pangu-draw.github.io
Matrix Decomposition and Applications
In 1954, Alston S. Householder published Principles of Numerical Analysis, one of the first modern treatments of matrix decomposition, which favored a (block) LU decomposition: the factorization of a matrix into the product of lower and upper triangular matrices. Matrix decomposition has since become a core technology in machine learning, largely due to the development of the backpropagation algorithm for fitting neural networks. The sole aim of this survey is to give a self-contained introduction to concepts and mathematical tools in numerical linear algebra and matrix analysis in order to seamlessly introduce matrix decomposition techniques and their applications in subsequent sections. However, we clearly realize our inability to cover all the useful and interesting results concerning matrix decomposition within the limited scope of this discussion; for example, we omit the separate analysis of Euclidean space, Hermitian space, Hilbert space, and the complex domain. We refer the reader to the linear algebra literature for a more detailed introduction to these related fields.
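To make the LU factorization mentioned above concrete, here is a minimal sketch of the Doolittle variant (unit diagonal in the lower factor) without row pivoting. It is not taken from the survey; the function names are illustrative, and the absence of pivoting means it fails on matrices with a zero pivot.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_decompose(a: Matrix) -> Tuple[Matrix, Matrix]:
    """Factor a square matrix as A = L U (Doolittle form, no pivoting)."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Fill row i of U using the rows of U computed so far.
        for j in range(i, n):
            upper[i][j] = a[i][j] - sum(lower[i][k] * upper[k][j] for k in range(i))
        if upper[i][i] == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no row pivoting")
        # Fill column i of L below the diagonal.
        for r in range(i + 1, n):
            partial = sum(lower[r][k] * upper[k][i] for k in range(i))
            lower[r][i] = (a[r][i] - partial) / upper[i][i]
    return lower, upper


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain triple-loop matrix product, used here to check L U against A."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


if __name__ == "__main__":
    a = [[4.0, 3.0], [6.0, 3.0]]
    lower, upper = lu_decompose(a)
    print(lower)
    print(upper)
    print(matmul(lower, upper))
```

Production code would normally use a pivoted factorization (P A = L U) for numerical stability; the unpivoted form is shown only because it matches the plain "lower times upper" description.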
Foundations of Reinforcement Learning and Interactive Decision Making
Foster, Dylan J., Rakhlin, Alexander
When we say interactive decision making, we are thinking of problems such as: (1) Medical treatment: based on a patient's medical history and vital signs, we need to decide what treatment will lead to the most positive outcome. (2) Controlling a robot: based on sensor signals, we need to decide what signals to send to a robot's actuators in order to navigate to a goal. For both problems, we (the learner/agent) are interacting with an unknown environment. In the robotics example, we do not necessarily know a priori how the signals we send to our robot's actuators change its configuration, or what the landscape it is trying to navigate looks like. However, because we are able to actively control the agent, we can learn to model the environment on the fly as we make decisions and collect data, which will reduce uncertainty and allow us to make better decisions in the future. The crux of the interactive decision making problem is to make decisions in a way that balances (i) exploring the environment to reduce our uncertainty and (ii) maximizing our overall performance (e.g., reaching a goal state as fast as possible). Figure 1 depicts an idealized interactive decision making setting, which we will return to throughout this course.
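The balance between exploring and maximizing performance can be sketched with an epsilon-greedy strategy on a Bernoulli multi-armed bandit. This is a standard textbook illustration chosen here for brevity, not a method attributed to the course notes; the function name, the parameter values, and the reward probabilities are all assumptions.

```python
import random
from typing import List, Tuple


def epsilon_greedy(reward_probs: List[float], steps: int = 2000,
                   epsilon: float = 0.1, seed: int = 0) -> Tuple[List[int], List[float], float]:
    """Run epsilon-greedy on a Bernoulli bandit; return pull counts, mean estimates, total reward."""
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random to reduce uncertainty.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best estimated mean reward so far.
            arm = max(range(n_arms), key=lambda i: means[i])
        # The environment is unknown to the learner; only the sampled reward is observed.
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental average
        total += reward
    return counts, means, total


if __name__ == "__main__":
    counts, means, total = epsilon_greedy([0.2, 0.8])
    print(counts, [round(m, 3) for m in means], total)
```

Setting `epsilon` to 0 gives a purely greedy learner that can lock onto a poor arm, while setting it to 1 explores forever and never uses what it has learned.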
Adapting Large Language Models for Education: Foundational Capabilities, Potentials, and Challenges
Li, Qingyao, Fu, Lingyue, Zhang, Weiming, Chen, Xianyu, Yu, Jingwei, Xia, Wei, Zhang, Weinan, Tang, Ruiming, Yu, Yong
Online education platforms, leveraging the internet to distribute education resources, seek to provide convenient education but often fall short in real-time communication with students. They often struggle to offer personalized education resources due to the challenge of addressing the diverse obstacles students encounter throughout their learning journey. Recently, the emergence of large language models (LLMs), such as ChatGPT, offers the possibility of resolving this issue by comprehending individual requests. Although LLMs have been successful in various fields, creating an LLM-based education system is still challenging because of the wide range of educational skills required. This paper reviews recent LLM research related to educational capabilities, including mathematics, writing, programming, reasoning, and knowledge-based question answering, with the aim of exploring their potential in constructing the next-generation intelligent education system. Based on the current development status, we further outline two approaches for an LLM-based education system: a unified approach and a mixture-of-expert (MoE) approach. Finally, we explore the challenges and future directions, providing new research opportunities and perspectives on adapting LLMs for education.
Disentangled Continual Learning: Separating Memory Edits from Model Updates
Dziadzio, Sebastian, Yıldız, Çağatay, van de Ven, Gido M., Trzciński, Tomasz, Tuytelaars, Tinne, Bethge, Matthias
[...] is hindered by catastrophic forgetting, the tendency of neural networks to overwrite existing knowledge when learning a new task. Existing continual learning methods alleviate this problem through regularisation, parameter isolation, or rehearsal, and are typically evaluated on benchmarks consisting of a handful of tasks. We propose a novel conceptual approach to continual classification that aims to disentangle class-specific information that needs to be memorised from the class-agnostic knowledge that encapsulates generalization. We store the former in a buffer that can be easily pruned or updated when new categories arrive, while the latter is represented with a neural network that generalizes across tasks. We show that the class-agnostic network does not suffer from catastrophic forgetting and by leveraging it to perform classification, we improve accuracy [...]
To mitigate this issue, continual learning methods employ strategies such as (i) regularization, which aims to preserve existing knowledge by limiting the plasticity of selected network weights [15, 17, 26, 36], (ii) parameter isolation or dynamic architectures, which effectively solve each task with a dedicated model [6, 33], or (iii) replay, which augments the training data with stored samples from past tasks [4, 12, 30, 32]. Most continual learning methods are evaluated on image classification benchmarks in which a discriminative model is transferred across tasks that typically involve disjoint sets of classes. We argue that this purely discriminative learning framework is not conducive to positive forward or backward transfer. Supervised classification networks tend to preserve only the features that are relevant for predicting the output labels in the training data [11, 35].
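The separation between an editable class-specific buffer and a shared class-agnostic component can be sketched roughly as follows. This is only an illustration of the general idea, not the authors' method: `normalize` is a hypothetical stand-in for the class-agnostic network, and nearest-exemplar lookup is an assumed classification rule.

```python
import math
from typing import Dict, List, Sequence


def normalize(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for the class-agnostic network: scale a vector to unit length."""
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


class ExemplarBuffer:
    """Class-specific memory: one stored exemplar per class, editable without retraining."""

    def __init__(self) -> None:
        self.exemplars: Dict[str, List[float]] = {}

    def add_class(self, label: str, exemplar: Sequence[float]) -> None:
        # Adding a category only edits the buffer; the shared mapping is untouched.
        self.exemplars[label] = normalize(exemplar)

    def remove_class(self, label: str) -> None:
        # Pruning a category is likewise a pure memory edit.
        self.exemplars.pop(label, None)

    def classify(self, x: Sequence[float]) -> str:
        """Return the label of the nearest stored exemplar in the normalized space."""
        if not self.exemplars:
            raise ValueError("buffer is empty")
        z = normalize(x)
        return min(self.exemplars, key=lambda label: math.dist(z, self.exemplars[label]))


if __name__ == "__main__":
    buffer = ExemplarBuffer()
    buffer.add_class("a", [1.0, 0.0])
    buffer.add_class("b", [0.0, 1.0])
    print(buffer.classify([5.0, 1.0]))
    buffer.add_class("c", [1.0, 1.0])  # a new category arrives
    print(buffer.classify([3.0, 3.0]))
```

Because the shared mapping never changes when classes are added or removed, earlier classes keep their stored representation, which is the intuition behind avoiding forgetting in this toy setting.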
A Comprehensive Overview of Large Language Models
Naveed, Humza, Khan, Asad Ullah, Qiu, Shi, Saqib, Muhammad, Anwar, Saeed, Usman, Muhammad, Akhtar, Naveed, Barnes, Nick, Mian, Ajmal
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in natural language processing tasks and beyond. This success of LLMs has led to a large influx of research contributions in this direction. These works encompass diverse topics such as architectural innovations, better training strategies, context length improvements, fine-tuning, multi-modal LLMs, robotics, datasets, benchmarking, efficiency, and more. With the rapid development of techniques and regular breakthroughs in LLM research, it has become considerably challenging to perceive the bigger picture of the advances in this direction. Considering the rapidly emerging plethora of literature on LLMs, it is imperative that the research community is able to benefit from a concise yet comprehensive overview of the recent developments in this field. This article provides an overview of the existing literature on a broad range of LLM-related concepts. Our self-contained comprehensive overview of LLMs discusses relevant background concepts and covers advanced topics at the frontier of LLM research. This review article is intended to provide not only a systematic survey but also a quick, comprehensive reference from which researchers and practitioners can draw insights, through extensive informative summaries of existing works, to advance LLM research.