Instructional Material
AGI: Artificial General Intelligence for Education
Latif, Ehsan, Mai, Gengchen, Nyaaba, Matthew, Wu, Xuansheng, Liu, Ninghao, Lu, Guoyu, Li, Sheng, Liu, Tianming, Zhai, Xiaoming
Artificial general intelligence (AGI) has gained global recognition as a future technology due to the emergence of breakthrough large language models and chatbots such as GPT-4 and ChatGPT, respectively. Compared to conventional AI models, typically designed for a limited range of tasks, demand significant amounts of domain-specific data for training and may not always consider intricate interpersonal dynamics in education. AGI, driven by the recent large pre-trained models, represents a significant leap in the capability of machines to perform tasks that require human-level intelligence, such as reasoning, problem-solving, decision-making, and even understanding human emotions and social interactions. This position paper reviews AGI's key concepts, capabilities, scope, and potential within future education, including achieving future educational goals, designing pedagogy and curriculum, and performing assessments. It highlights that AGI can significantly improve intelligent tutoring systems, educational assessment, and evaluation procedures. AGI systems can adapt to individual student needs, offering tailored learning experiences. They can also provide comprehensive feedback on student performance and dynamically adjust teaching methods based on student progress. The paper emphasizes that AGI's capabilities extend to understanding human emotions and social interactions, which are critical in educational settings. The paper discusses that ethical issues in education with AGI include data bias, fairness, and privacy and emphasizes the need for codes of conduct to ensure responsible AGI use in academic settings like homework, teaching, and recruitment. We also conclude that the development of AGI necessitates interdisciplinary collaborations between educators and AI engineers to advance research and application efforts.
Real Customization or Just Marketing: Are Customized Versions of Chat GPT Useful?
Garrido-Merchán, Eduardo C., Arroyo-Barrigüete, Jose L., Borrás-Pala, Francisco, Escobar-Torres, Leandro, de Ibarreta, Carlos Martínez, Ortiz-Lozano, Jose María, Rua-Vieites, Antonio
Large Language Models (LLMs), as the case of OpenAI ChatGPT-4 Turbo, are revolutionizing several industries, including higher education. In this context, LLMs can be personalized through a fine-tuning process to meet the student demands on every particular subject, like statistics. Recently, OpenAI has launched the possibility to fine-tune their model with a natural language web interface, enabling the possibility to create customized GPT version deliberately conditioned to meet the demands of a specific task. The objective of this research is to assess the potential of the customized GPTs that have recently been launched by OpenAI. After developing a Business Statistics Virtual Professor (BSVP), tailored for students at the Universidad Pontificia Comillas, its behavior was evaluated and compared with that of ChatGPT-4 Turbo. The results lead to several conclusions. Firstly, a substantial modification in the style of communication was observed. Following the instructions it was trained with, BSVP provided responses in a more relatable and friendly tone, even incorporating a few minor jokes. Secondly, and this is a matter of relevance, when explicitly asked for something like, "I would like to practice a programming exercise similar to those in R practice 4," BSVP was capable of providing a far superior response: having access to contextual documentation, it could fulfill the request, something beyond ChatGPT-4 Turbo's capabilities. On the downside, the response times were generally higher. Lastly, regarding overall performance, quality, depth, and alignment with the specific content of the course, no statistically significant differences were observed in the responses between BSVP and ChatGPT-4 Turbo. It appears that customized assistants trained with prompts present advantages as virtual aids for students, yet they do not constitute a substantial improvement over ChatGPT-4 Turbo.
Certifying LLM Safety against Adversarial Prompting
Kumar, Aounon, Agarwal, Chirag, Srinivas, Suraj, Li, Aaron Jiaxun, Feizi, Soheil, Lakkaraju, Himabindu
Large language models (LLMs) released for public use incorporate guardrails to ensure their output is safe, often referred to as "model alignment." An aligned language model should decline a user's request to produce harmful content. However, such safety measures are vulnerable to adversarial attacks, which add maliciously designed token sequences to a harmful prompt to bypass the model's safety guards. In this work, we introduce erase-and-check, the first framework to defend against adversarial prompts with verifiable safety guarantees. We defend against three attack modes: i) adversarial suffix, which appends an adversarial sequence at the end of the prompt; ii) adversarial insertion, where the adversarial sequence is inserted anywhere in the middle of the prompt; and iii) adversarial infusion, where adversarial tokens are inserted at arbitrary positions in the prompt, not necessarily as a contiguous block. Our experimental results demonstrate that this procedure can obtain strong certified safety guarantees on harmful prompts while maintaining good empirical performance on safe prompts. For example, against adversarial suffixes of length 20, it certifiably detects 92% of harmful prompts and labels 94% of safe prompts correctly using the open-source language model Llama 2 as the safety filter. We further improve the filter's performance, in terms of accuracy and speed, by replacing Llama 2 with a DistilBERT safety classifier fine-tuned on safe and harmful prompts. Additionally, we propose two efficient empirical defenses: i) RandEC, a randomized version of erase-and-check that evaluates the safety filter on a small subset of the erased subsequences, and ii) GradEC, a gradient-based version that optimizes the erased tokens to remove the adversarial sequence. The code for our experiments is available at https://github.com/aounon/certified-llm-safety.
chatGPT for generating questions and assessments based on accreditations
This research aims to take advantage of artificial intelligence techniques in producing students assessment that is compatible with the different academic accreditations of the same program. The possibility of using generative artificial intelligence technology was studied to produce an academic accreditation compliant test the National Center for Academic Accreditation of Kingdom of Saudi Arabia and Accreditation Board for Engineering and Technology. A novel method was introduced to map the verbs used to create the questions introduced in the tests. The method allows a possibility of using the generative artificial intelligence technology to produce and check the validity of questions that measure educational outcomes. A questionnaire was distributed to ensure that the use of generative artificial intelligence to create exam questions is acceptable by the faculty members, as well as to ask about the acceptance of assistance in validating questions submitted by faculty members and amending them in accordance with academic accreditations. The questionnaire was distributed to faculty members of different majors in the Kingdom of Saudi Arabias universities. one hundred twenty responses obtained with eight five percentile approval percentage for generate complete exam questions by generative artificial intelligence . Whereas ninety eight percentage was the approval percentage for editing and improving already existed questions.
Student Mastery or AI Deception? Analyzing ChatGPT's Assessment Proficiency and Evaluating Detection Strategies
Wang, Kevin, Akins, Seth, Mohammed, Abdallah, Lawrence, Ramon
Generative AI systems such as ChatGPT have a disruptive effect on learning and assessment. Computer science requires practice to develop skills in problem solving and programming that are traditionally developed using assignments. Generative AI has the capability of completing these assignments for students with high accuracy, which dramatically increases the potential for academic integrity issues and students not achieving desired learning outcomes. This work investigates the performance of ChatGPT by evaluating it across three courses (CS1,CS2,databases). ChatGPT completes almost all introductory assessments perfectly. Existing detection methods, such as MOSS and JPlag (based on similarity metrics) and GPTzero (AI detection), have mixed success in identifying AI solutions. Evaluating instructors and teaching assistants using heuristics to distinguish between student and AI code shows that their detection is not sufficiently accurate. These observations emphasize the need for adapting assessments and improved detection methods.
Designing Optimal Behavioral Experiments Using Machine Learning
Valentin, Simon, Kleinegesse, Steven, Bramley, Neil R., Seriès, Peggy, Gutmann, Michael U., Lucas, Christopher G.
Computational models are powerful tools for understanding human cognition and behavior. They let us express our theories clearly and precisely, and offer predictions that can be subtle and often counter-intuitive. However, this same richness and ability to surprise means our scientific intuitions and traditional tools are ill-suited to designing experiments to test and compare these models. To avoid these pitfalls and realize the full potential of computational modeling, we require tools to design experiments that provide clear answers about what models explain human behavior and the auxiliary assumptions those models must make. Bayesian optimal experimental design (BOED) formalizes the search for optimal experimental designs by identifying experiments that are expected to yield informative data. In this work, we provide a tutorial on leveraging recent advances in BOED and machine learning to find optimal experiments for any kind of model that we can simulate data from, and show how by-products of this procedure allow for quick and straightforward evaluation of models and their parameters against real experimental data. As a case study, we consider theories of how people balance exploration and exploitation in multi-armed bandit decision-making tasks. We validate the presented approach using simulations and a real-world experiment. As compared to experimental designs commonly used in the literature, we show that our optimal designs more efficiently determine which of a set of models best account for individual human behavior, and more efficiently characterize behavior given a preferred model. At the same time, formalizing a scientific question such that it can be adequately addressed with BOED can be challenging and we discuss several potential caveats and pitfalls that practitioners should be aware of. We provide code and tutorial notebooks to replicate all analyses.
Applying statistical learning theory to deep learning
Gerbelot, Cédric, Karagulyan, Avetik, Karp, Stefani, Ravichandran, Kavya, Stern, Menachem, Srebro, Nathan
Although statistical learning theory provides a robust framework to understand supervised learning, many theoretical aspects of deep learning remain unclear, in particular how different architectures may lead to inductive bias when trained using gradient based methods. The goal of these lectures is to provide an overview of some of the main questions that arise when attempting to understand deep learning from a learning theory perspective. After a brief reminder on statistical learning theory and stochastic optimization, we discuss implicit bias in the context of benign overfitting. We then move to a general description of the mirror descent algorithm, showing how we may go back and forth between a parameter space and the corresponding function space for a given learning problem, as well as how the geometry of the learning problem may be represented by a metric tensor. Building on this framework, we provide a detailed study of the implicit bias of gradient descent on linear diagonal networks for various regression tasks, showing how the loss function, scale of parameters at initialization and depth of the network may lead to various forms of implicit bias, in particular transitioning between kernel or feature learning.
A review of ensemble learning and data augmentation models for class imbalanced problems: combination, implementation and evaluation
Khan, Azal Ahmad, Chaudhari, Omkar, Chandra, Rohitash
Class imbalance (CI) in classification problems arises when the number of observations belonging to one class is lower than the other. Ensemble learning combines multiple models to obtain a robust model and has been prominently used with data augmentation methods to address class imbalance problems. In the last decade, a number of strategies have been added to enhance ensemble learning and data augmentation methods, along with new methods such as generative adversarial networks (GANs). A combination of these has been applied in many studies, and the evaluation of different combinations would enable a better understanding and guidance for different application domains. In this paper, we present a computational study to evaluate data augmentation and ensemble learning methods used to address prominent benchmark CI problems. We present a general framework that evaluates 9 data augmentation and 9 ensemble learning methods for CI problems. Our objective is to identify the most effective combination for improving classification performance on imbalanced datasets. The results indicate that combinations of data augmentation methods with ensemble learning can significantly improve classification performance on imbalanced datasets. We find that traditional data augmentation methods such as the synthetic minority oversampling technique (SMOTE) and random oversampling (ROS) are not only better in performance for selected CI problems, but also computationally less expensive than GANs. Our study is vital for the development of novel models for handling imbalanced datasets.
15 Best Black Friday TV Deals (2023): TCL, Samsung, LG, Sony
Consider this year's Black Friday and Cyber Monday TV sales to be what you've been looking for if you're in the market for a new screen. There are screaming TV deals from many of our favorite brands, including TCL, Samsung, LG, and Sony. We even found a couple of great deals on streaming devices to boot! Before you dig into the deal pile, be sure to check out our roundups of The Best TVs, Best Soundbars, and Best Projectors, plus learn How To Upgrade Your Home Theater. Don't like the operating system the TV you have an eye on rocks from the factory?
Automatic detection of problem-gambling signs from online texts using large language models
Smith, Elke, Reiter, Nils, Peters, Jan
Problem gambling is a major public health concern and is associated with profound psychological distress and economic problems. There are numerous gambling communities on the internet where users exchange information about games, gambling tactics, as well as gambling-related problems. Individuals exhibiting higher levels of problem gambling engage more in such communities. Online gambling communities may provide insights into problem-gambling behaviour. Using data scraped from a major German gambling discussion board, we fine-tuned a large language model, specifically a Bidirectional Encoder Representations from Transformers (BERT) model, to predict signs of problem-gambling from forum posts. Training data were generated by manual annotation and by taking into account diagnostic criteria and gambling-related cognitive distortions. Using k-fold cross-validation, our models achieved a precision of 0.95 and F1 score of 0.71, demonstrating that satisfactory classification performance can be achieved by generating high-quality training material through manual annotation based on diagnostic criteria. The current study confirms that a BERT-based model can be reliably used on small data sets and to detect signatures of problem gambling in online communication data. Such computational approaches may have potential for the detection of changes in problem-gambling prevalence among online users.