Instructional Material
A Framework for Interpretability in Machine Learning for Medical Imaging
Wang, Alan Q., Karaman, Batuhan K., Kim, Heejong, Rosenthal, Jacob, Saluja, Rachit, Young, Sean I., Sabuncu, Mert R.
Interpretability for machine learning models in medical imaging (MLMI) is an important direction of research. However, there is a general sense of murkiness in what interpretability means. Why does the need for interpretability in MLMI arise? What goals does one actually seek to address when interpretability is needed? To answer these questions, we identify a need to formalize the goals and elements of interpretability in MLMI. By reasoning about real-world tasks and goals common in both medical image analysis and its intersection with machine learning, we identify five core elements of interpretability: localization, visual recognizability, physical attribution, model transparency, and actionability. From this, we arrive at a framework for interpretability in MLMI, which serves as a step-by-step guide to approaching interpretability in this context. Overall, this paper formalizes interpretability needs in the context of medical imaging, and our applied perspective clarifies concrete MLMI-specific goals and considerations in order to guide method design and improve real-world usage. Our goal is to provide practical and didactic information for model designers and practitioners, inspire developers of models in the medical imaging field to reason more deeply about what interpretability is achieving, and suggest future directions of interpretability research.
Universal and Transferable Adversarial Attacks on Aligned Language Models
Zou, Andy, Wang, Zifan, Carlini, Nicholas, Nasr, Milad, Kolter, J. Zico, Fredrikson, Matt
Because "out-of-the-box" large language models are capable of generating a great deal of objectionable content, recent work has focused on aligning these models in an attempt to prevent undesirable generation. While there has been some success at circumventing these measures -- so-called "jailbreaks" against LLMs -- these attacks have required significant human ingenuity and are brittle in practice. In this paper, we propose a simple and effective attack method that causes aligned language models to generate objectionable behaviors. Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer). However, instead of relying on manual engineering, our approach automatically produces these adversarial suffixes by a combination of greedy and gradient-based search techniques, and also improves over past automatic prompt generation methods. Surprisingly, we find that the adversarial prompts generated by our approach are quite transferable, including to black-box, publicly released LLMs. Specifically, we train an adversarial attack suffix on multiple prompts (i.e., queries asking for many different types of objectionable content), as well as multiple models (in our case, Vicuna-7B and 13B). When doing so, the resulting attack suffix is able to induce objectionable content in the public interfaces to ChatGPT, Bard, and Claude, as well as open source LLMs such as LLaMA-2-Chat, Pythia, Falcon, and others. In total, this work significantly advances the state-of-the-art in adversarial attacks against aligned language models, raising important questions about how such systems can be prevented from producing objectionable information. Code is available at github.com/llm-attacks/llm-attacks.
Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization
Zhang, Dinghuai, Chen, Ricky T. Q., Liu, Cheng-Hao, Courville, Aaron, Bengio, Yoshua
We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment issues due to use of entire trajectories and a learning signal present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.
A note on regularised NTK dynamics with an application to PAC-Bayesian training
Clerico, Eugenio, Guedj, Benjamin
We establish explicit dynamics for neural networks whose training objective has a regularising term that constrains the parameters to remain close to their initial value. This keeps the network in a lazy training regime, where the dynamics can be linearised around the initialisation. The standard neural tangent kernel (NTK) governs the evolution during the training in the infinite-width limit, although the regularisation yields an additional term appears in the differential equation describing the dynamics. This setting provides an appropriate framework to study the evolution of wide networks trained to optimise generalisation objectives such as PAC-Bayes bounds, and hence potentially contribute to a deeper theoretical understanding of such networks.
Collaborative Learning with Artificial Intelligence Speakers (CLAIS): Pre-Service Elementary Science Teachers' Responses to the Prototype
Lee, Gyeong-Geon, Mun, Seonyeong, Shin, Myeong-Kyeong, Zhai, Xiaoming
This research aims to demonstrate that AI can function not only as a tool for learning, but also as an intelligent agent with which humans can engage in collaborative learning (CL) to change epistemic practices in science classrooms. We adopted a design and development research approach, following the Analysis, Design, Development, Implementation and Evaluation (ADDIE) model, to prototype a tangible instructional system called Collaborative Learning with AI Speakers (CLAIS). The CLAIS system is designed to have 3-4 human learners join an AI speaker to form a small group, where humans and AI are considered as peers participating in the Jigsaw learning process. The development was carried out using the NUGU AI speaker platform. The CLAIS system was successfully implemented in a Science Education course session with 15 pre-service elementary science teachers. The participants evaluated the CLAIS system through mixed methods surveys as teachers, learners, peers, and users. Quantitative data showed that the participants' Intelligent-Technological, Pedagogical, And Content Knowledge was significantly increased after the CLAIS session, the perception of the CLAIS learning experience was positive, the peer assessment on AI speakers and human peers was different, and the user experience was ambivalent. Qualitative data showed that the participants anticipated future changes in the epistemic process in science classrooms, while acknowledging technical issues such as speech recognition performance and response latency. This study highlights the potential of Human-AI Collaboration for knowledge co-construction in authentic classroom settings and exemplify how AI could shape the future landscape of epistemic practices in the classroom.
Human-Centred Learning Analytics and AI in Education: a Systematic Literature Review
Alfredo, Riordan, Echeverria, Vanessa, Jin, Yueqiao, Yan, Lixiang, Swiecki, Zachari, Gaลกeviฤ, Dragan, Martinez-Maldonado, Roberto
The rapid expansion of Learning Analytics (LA) and Artificial Intelligence in Education (AIED) offers new scalable, data-intensive systems but also raises concerns about data privacy and agency. Excluding stakeholders -- like students and teachers -- from the design process can potentially lead to mistrust and inadequately aligned tools. Despite a shift towards human-centred design in recent LA and AIED research, there remain gaps in our understanding of the importance of human control, safety, reliability, and trustworthiness in the design and implementation of these systems. We conducted a systematic literature review to explore these concerns and gaps. We analysed 108 papers to provide insights about i) the current state of human-centred LA/AIED research; ii) the extent to which educational stakeholders have contributed to the design process of human-centred LA/AIED systems; iii) the current balance between human control and computer automation of such systems; and iv) the extent to which safety, reliability and trustworthiness have been considered in the literature. Results indicate some consideration of human control in LA/AIED system design, but limited end-user involvement in actual design. Based on these findings, we recommend: 1) carefully balancing stakeholders' involvement in designing and deploying LA/AIED systems throughout all design phases, 2) actively involving target end-users, especially students, to delineate the balance between human control and automation, and 3) exploring safety, reliability, and trustworthiness as principles in future human-centred LA/AIED systems.
CodeLL: A Lifelong Learning Dataset to Support the Co-Evolution of Data and Language Models of Code
Weyssow, Martin, Di Sipio, Claudio, Di Ruscio, Davide, Sahraoui, Houari
Motivated by recent work on lifelong learning applications for language models (LMs) of code, we introduce CodeLL, a lifelong learning dataset focused on code changes. Our contribution addresses a notable research gap marked by the absence of a long-term temporal dimension in existing code change datasets, limiting their suitability in lifelong learning scenarios. In contrast, our dataset aims to comprehensively capture code changes across the entire release history of open-source software repositories. In this work, we introduce an initial version of CodeLL, comprising 71 machine-learning-based projects mined from Software Heritage. This dataset enables the extraction and in-depth analysis of code changes spanning 2,483 releases at both the method and API levels. CodeLL enables researchers studying the behaviour of LMs in lifelong fine-tuning settings for learning code changes. Additionally, the dataset can help studying data distribution shifts within software repositories and the evolution of API usages over time.
Beyond Traditional Teaching: The Potential of Large Language Models and Chatbots in Graduate Engineering Education
Abedi, Mahyar, Alshybani, Ibrahem, Shahadat, Muhammad Rubayat Bin, Murillo, Michael S.
In the rapidly evolving landscape of education, digital technologies have repeatedly disrupted traditional pedagogical methods. This paper explores the latest of these disruptions: the potential integration of large language models (LLMs) and chatbots into graduate engineering education. We begin by tracing historical and technological disruptions to provide context and then introduce key terms such as machine learning and deep learning and the underlying mechanisms of recent advancements, namely attention/transformer models and graphics processing units. The heart of our investigation lies in the application of an LLM-based chatbot in a graduate fluid mechanics course. We developed a question bank from the course material and assessed the chatbot's ability to provide accurate, insightful responses. The results are encouraging, demonstrating not only the bot's ability to effectively answer complex questions but also the potential advantages of chatbot usage in the classroom, such as the promotion of self-paced learning, the provision of instantaneous feedback, and the reduction of instructors' workload. The study also examines the transformative effect of intelligent prompting on enhancing the chatbot's performance. Furthermore, we demonstrate how powerful plugins like Wolfram Alpha for mathematical problem-solving and code interpretation can significantly extend the chatbot's capabilities, transforming it into a comprehensive educational tool. While acknowledging the challenges and ethical implications surrounding the use of such AI models in education, we advocate for a balanced approach. The use of LLMs and chatbots in graduate education can be greatly beneficial but requires ongoing evaluation and adaptation to ensure ethical and efficient use.
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks
Vega, Jason, Chaudhary, Isha, Xu, Changming, Singh, Gagandeep
Content warning: This paper contains examples of harmful language. With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA opensource LLMs under simple, optimization-free attacks we refer to as priming attacks, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to 3.3 compared to baselines. Autoregressive Large Language Models (LLMs) have emerged as powerful conversational agents widely used in user-facing applications. To ensure that LLMs cannot be used for nefarious purposes, they are extensively safety-trained for human alignment using techniques such as RLHF (Christiano et al., 2023). Despite such efforts, it is still possible to circumvent the alignment to obtain harmful outputs (Carlini et al., 2023). For instance, Zou et al. (2023) generated prompts to attack popular open-source aligned LLMs such as Llama-2 (Touvron et al., 2023a) and Vicuna (Chiang et al., 2023) to either output harmful target strings or comply with harmful behavior requests.
Skills or Degree? The Rise of Skill-Based Hiring for AI and Green Jobs
Ehlinger, Eugenia Gonzalez, Stephany, Fabian
For emerging professions, such as jobs in the field of Artificial Intelligence (AI) or sustainability (green), labour supply does not meet industry demand. In this scenario of labour shortages, our work aims to understand whether employers have started focusing on individual skills rather than on formal qualifications in their recruiting. By analysing a large time series dataset of around one million online job vacancies between 2019 and 2022 from the UK and drawing on diverse literature on technological change and labour market signalling, we provide evidence that employers have started so-called "skill-based hiring" for AI and green roles, as more flexible hiring practices allow them to increase the available talent pool. In our observation period the demand for AI roles grew twice as much as average labour demand. At the same time, the mention of university education for AI roles declined by 23%, while AI roles advertise five times as many skills as job postings on average. Our regression analysis also shows that university degrees no longer show an educational premium for AI roles, while for green positions the educational premium persists. In contrast, AI skills have a wage premium of 16%, similar to having a PhD (17%). Our work recommends making use of alternative skill building formats such as apprenticeships, on-the-job training, MOOCs, vocational education and training, micro-certificates, and online bootcamps to use human capital to its full potential and to tackle talent shortages.