Instructional Material
Accessing Vision Foundation Models at ImageNet-level Costs
Zhang, Yitian, Ma, Xu, Bai, Yue, Wang, Huan, Fu, Yun
Vision foundation models are renowned for their generalization ability, which stems from massive training data. Nevertheless, they demand tremendous training resources, and their training data is often inaccessible (e.g., for CLIP and DINOv2), posing great challenges to developing derivatives that could advance research in this field. In this work, we offer a very simple and general solution, named Proteus, to distill foundation models into smaller equivalents on ImageNet-1K without access to the original training data. Specifically, we remove the designs in conventional knowledge distillation settings that result in dataset bias and present three levels of training objectives, i.e., token, patch, and feature, to maximize the efficacy of knowledge transfer. In this manner, Proteus is trained at ImageNet-level costs with surprising ability, making the training of foundation models more accessible to the broader research community. Using DINOv2-g/14 as the teacher, Proteus-L/14 matches the performance of the Oracle method DINOv2-L/14 (142M training data) across 15 benchmarks and outperforms other vision foundation models, including CLIP-L/14 (400M), OpenCLIP-L/14 (400M/2B), and SynCLR-L/14 (600M). Code is available here.
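The three-level objective can be pictured with a minimal sketch. It assumes that the student's class token, patch tokens, and one intermediate feature map have already been projected to the teacher's dimension, and that each level is matched with a mean squared error. The function names, dictionary keys, and equal default weights are hypothetical illustrations, not Proteus's released code.

```python
from typing import Dict, Sequence

Vector = Sequence[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_mse(xs: Sequence[Vector], ys: Sequence[Vector]) -> float:
    """Average the per-vector MSE over paired lists (e.g. all patch tokens)."""
    if len(xs) != len(ys):
        raise ValueError("token lists must have equal length")
    return sum(mse(x, y) for x, y in zip(xs, ys)) / len(xs)


def multi_level_distill_loss(student: Dict, teacher: Dict,
                             w_token: float = 1.0, w_patch: float = 1.0,
                             w_feature: float = 1.0) -> float:
    """Hypothetical token/patch/feature distillation loss.

    Both dicts hold 'cls' (one vector), 'patches' (one vector per patch) and
    'features' (one vector per position of an intermediate layer). Student
    outputs are assumed to be projected to the teacher's width already.
    No ground-truth labels are used: the teacher is the only supervision.
    """
    token_loss = mse(student["cls"], teacher["cls"])
    patch_loss = mean_mse(student["patches"], teacher["patches"])
    feature_loss = mean_mse(student["features"], teacher["features"])
    return w_token * token_loss + w_patch * patch_loss + w_feature * feature_loss
```

Because only teacher outputs enter this loss, the ImageNet-1K images act purely as inputs; that is one plausible reading of how label-driven dataset bias is avoided, not a statement of the paper's exact design.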
Learning to Represent Surroundings, Anticipate Motion and Take Informed Actions in Unstructured Environments
Contemporary robots have become exceptionally skilled at achieving specific tasks in structured environments. However, they often fail when faced with the limitless permutations of real-world unstructured environments. This motivates robotics methods that learn from experience rather than follow a predefined set of rules. In this thesis, we present a range of learning-based methods aimed at enabling robots operating in dynamic and unstructured environments to better understand their surroundings, anticipate the actions of others, and take informed actions accordingly. In the first part of the thesis, we investigate methods that leverage learning to represent the structure and motion of a robot's operating environment in a continuous manner.
Ontology-driven Reinforcement Learning for Personalized Student Support
In the search for more effective education, there is a widespread effort to develop better approaches to personalize student education. Unassisted, educators often do not have time or resources to personally support every student in a given classroom. Motivated by this issue, and by recent advancements in artificial intelligence, this paper presents a general-purpose framework for personalized student support, applicable to any virtual educational system such as a serious game or an intelligent tutoring system. To fit any educational situation, we apply ontologies for their semantic organization, combining them with data collection considerations and multi-agent reinforcement learning. The result is a modular system that can be adapted to any virtual educational software to provide useful personalized assistance to students.
Integrating AI Tutors in a Programming Course
Ma, Iris, Martins, Alberto Krone, Lopes, Cristina Videira
RAGMan is an LLM-powered tutoring system that can support a variety of course-specific and homework-specific AI tutors. RAGMan leverages Retrieval Augmented Generation (RAG), as well as strict instructions, to ensure the alignment of the AI tutors' responses. By using RAGMan's AI tutors, students receive assistance with their specific homework assignments without directly obtaining solutions, while also being able to ask general programming-related questions. RAGMan was deployed as an optional resource in an introductory programming course with an enrollment of 455 students. It was configured as a set of five homework-specific AI tutors. This paper describes the interactions the students had with the AI tutors, the students' feedback, and a comparative grade analysis. Overall, about half of the students engaged with the AI tutors, and the vast majority of the interactions were legitimate homework questions. When students posed questions within the intended scope, the AI tutors delivered accurate responses 98% of the time. Among the students who used the AI tutors, 78% reported that the tutors helped their learning. Beyond the AI tutors' ability to provide valuable suggestions, students reported appreciating them for fostering a safe, judgment-free learning environment.
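RAGMan's retrieval step is not specified beyond the use of RAG, so the following is only a sketch of the general pattern under stated assumptions: course snippets are ranked by bag-of-words cosine similarity to the student's question, and the top matches are placed into a prompt that carries a strict no-solutions instruction. `retrieve`, `build_prompt`, and the prompt wording are hypothetical; a deployed system would typically use learned embeddings and an actual LLM call, neither of which is shown.

```python
import math
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    """Lower-cased whitespace-token counts (a deliberately crude text vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k course snippets most similar to the question."""
    q = bag_of_words(question)
    return sorted(docs, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:k]


def build_prompt(question: str, docs: List[str], k: int = 2) -> str:
    """Assemble a grounded prompt with a strict 'guide, don't solve' instruction."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs, k))
    return ("You are a tutor for this homework assignment. "
            "Explain concepts, but do not provide complete solutions.\n"
            f"Course material:\n{context}\n\n"
            f"Student question: {question}")
```

Grounding the prompt in retrieved course material and fixing the instruction text in code, rather than leaving both to the student, is what keeps a tutor of this kind aligned with a specific assignment.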
Model-free Distortion Canceling and Control of Quantum Devices
Fouad, Ahmed F., Youssry, Akram, El-Rafei, Ahmed, Hammad, Sherif
Quantum devices need precise control to achieve their full capability. In this work, we address the problem of controlling closed quantum systems, tackling two main issues. First, in practice the control signals are usually subject to unknown classical distortions that could arise from the device fabrication, material properties, and/or the instruments generating those signals. Second, in most cases modeling the system is very difficult or not even viable due to uncertainties in the relations between some variables and the inaccessibility of some measurements inside the system. In this paper, we introduce a general model-free control approach based on deep reinforcement learning (DRL) that can work for any closed quantum system. We train a deep neural network (NN) using the REINFORCE policy gradient algorithm to control the state probability distribution of a closed quantum system as it evolves and to drive it to different target distributions. We present a novel controller architecture that comprises multiple NNs. This makes it possible to accommodate as many different target state distributions as desired without increasing the complexity of the NN or its training process. The DRL algorithm used works whether the control problem can be modeled as a Markov decision process (MDP) or a partially observed MDP. Our method is valid whether the control signals are discrete- or continuous-valued. We verified our method through numerical simulations based on a photonic waveguide array chip. We trained a controller to generate sequences of different target output distributions of the chip with fidelity higher than 99%, and the controller showed superior performance in canceling the classical signal distortions.
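The REINFORCE policy-gradient update can be illustrated on a deliberately tiny problem. In this sketch, which is an assumption and not the paper's photonic chip model, a softmax policy chooses one of `K` discrete control amplitudes, a toy two-level "device" turns that amplitude into an output distribution, and the reward is the classical fidelity to a target distribution. The running-average baseline is a standard variance-reduction addition, not something the abstract specifies.

```python
import math
import random
from typing import List

K = 5  # number of discrete control amplitudes (hypothetical)


def device_output(action: int) -> List[float]:
    """Toy two-level 'device': amplitude index -> rotation angle -> output distribution."""
    angle = action * math.pi / (2 * (K - 1))
    return [math.cos(angle) ** 2, math.sin(angle) ** 2]


def fidelity(p: List[float], q: List[float]) -> float:
    """Classical fidelity between two probability distributions (1.0 means identical)."""
    return sum(math.sqrt(x * y) for x, y in zip(p, q)) ** 2


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce(target: List[float], episodes: int = 2000, lr: float = 0.5,
              seed: int = 0) -> List[float]:
    """Train a softmax policy with REINFORCE; return the final action probabilities."""
    rng = random.Random(seed)
    theta = [0.0] * K   # policy logits, one per action
    baseline = 0.0      # running-average reward (variance reduction)
    for _ in range(episodes):
        probs = softmax(theta)
        action = rng.choices(range(K), weights=probs)[0]
        reward = fidelity(device_output(action), target)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # d/dtheta[j] of log softmax(theta)[action] is 1[j == action] - probs[j]
        for j in range(K):
            grad = (1.0 if j == action else 0.0) - probs[j]
            theta[j] += lr * advantage * grad
    return softmax(theta)
```

With target `[0.5, 0.5]`, action 2 (a π/4 rotation) reaches fidelity 1 while actions 0 and 4 reach only 0.5, so calling `reinforce([0.5, 0.5])` should shift probability mass toward action 2 without the learner ever seeing a model of the device, only sampled rewards.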
Overcoming Catastrophic Forgetting in Tabular Data Classification: A Pseudorehearsal-based approach
García-Santaclara, Pablo, Fernández-Castro, Bruno, Díaz-Redondo, Rebeca P.
Continual learning (CL) poses the important challenge of adapting to evolving data distributions without forgetting previously acquired knowledge while consolidating new knowledge. In this paper, we introduce a new methodology, coined Tabular-data Rehearsal-based Incremental Lifelong Learning framework (TRIL3), designed to address catastrophic forgetting in tabular data classification problems. TRIL3 uses the prototype-based incremental generative model XuILVQ to generate synthetic data that preserves old knowledge, and the DNDF algorithm, modified to run incrementally, to learn classification tasks for tabular data without storing old samples. After tests to determine the appropriate percentage of synthetic data and to compare TRIL3 with other available CL proposals, we conclude that TRIL3 outperforms other options in the literature while using only 50% synthetic data.
Continual learning (CL) [De Lange et al.(2021)], [Wang et al.(2024)], also known as lifelong learning [Parisi et al.(2019)], is an artificial intelligence approach that focuses on the ability of models to adapt and improve over time as they learn incrementally from dynamic data streams. The underlying philosophy is to train the system on batches of data taken from a data stream (a batch may be as small as a single sample), with each batch used only once. This means that previously processed data cannot be revisited, which is a radical departure from the classical ML pipeline of training, validating, and testing. CL is therefore well suited to scenarios where the model needs to adapt quickly to new data or needs to be personalized. However, the nature of CL poses an important challenge: since models readily adapt to new knowledge, they tend to forget past knowledge.
This effect, known as catastrophic forgetting [French(1999)], [Kirkpatrick et al.(2017)], causes a model's performance to degrade as it acquires new knowledge, which limits its usefulness. It is especially severe in class-incremental learning, where the model is expected to differentiate among all the classes it has learned so far.
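The pseudorehearsal idea (train on new data mixed with generated stand-ins for old data, never storing old samples) can be sketched as follows. `synthesize` is a hypothetical stand-in for XuILVQ that simply perturbs stored class prototypes with Gaussian noise, and `synthetic_ratio` is our assumed reading of the "50% of synthetic data" setting as the synthetic share of each training mix; the DNDF classifier update itself is not shown.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[float], str]  # (feature vector, class label)


def synthesize(prototypes: Sequence[Sample], n: int, rng: random.Random,
               sigma: float = 0.1) -> List[Sample]:
    """Hypothetical generator: jitter randomly chosen (prototype, label) pairs."""
    out = []
    for _ in range(n):
        proto, label = rng.choice(prototypes)
        out.append(([x + rng.gauss(0.0, sigma) for x in proto], label))
    return out


def rehearsal_batch(new_batch: Sequence[Sample], prototypes: Sequence[Sample],
                    synthetic_ratio: float = 0.5,
                    rng: Optional[random.Random] = None) -> List[Sample]:
    """Mix a new batch with synthetic old-class samples; only prototypes are kept."""
    if not 0.0 <= synthetic_ratio < 1.0:
        raise ValueError("synthetic_ratio must be in [0, 1)")
    rng = rng or random.Random(0)
    # Choose n_syn so that n_syn / (len(new_batch) + n_syn) == synthetic_ratio.
    n_syn = round(len(new_batch) * synthetic_ratio / (1.0 - synthetic_ratio)) if prototypes else 0
    mixed = list(new_batch) + synthesize(prototypes, n_syn, rng)
    rng.shuffle(mixed)
    return mixed
```

An incremental classifier would then be updated once on the mixed batch, so old classes keep receiving (synthetic) gradient signal while each real batch is still used only once.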
New Desiderata for Direct Preference Optimization
Hu, Xiangkun, He, Tong, Wipf, David
Large language models in the past have typically relied on some form of reinforcement learning with human feedback (RLHF) to better align model responses with human preferences. However, because of oft-observed instabilities when implementing these RLHF pipelines, various reparameterization techniques have recently been introduced to sidestep the need for separately learning an RL reward model. Instead, directly fine-tuning for human preferences is achieved via the minimization of a single closed-form training objective, a process originally referred to as direct preference optimization (DPO) and followed by several notable descendants. Although these methods are effective in certain real-world settings, we introduce new evaluation criteria that highlight unresolved shortcomings in the ability of existing DPO methods to interpolate between a pre-trained reference model and empirical measures of human preferences, as well as unavoidable trade-offs in how low- and high-quality responses are regularized and how constraints are handled. Our insights then motivate an alternative DPO-like loss that provably mitigates these limitations. Empirical results corroborate notable aspects of our analyses.
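For reference, the original DPO objective that these descendants build on can be written as L = -log σ(β[(log π_θ(y_w|x) - log π_ref(y_w|x)) - (log π_θ(y_l|x) - log π_ref(y_l|x))]), where y_w is the preferred response and y_l the dispreferred one. The sketch below computes it for a single preference pair from summed response log-probabilities; it is the baseline DPO loss, not the alternative loss proposed in this paper, and `beta=0.1` is only an illustrative default.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Original DPO loss for one pair where y_w (chosen) is preferred over y_l (rejected).

    Each argument is the summed log-probability of a full response given the prompt,
    under either the trainable policy or the frozen reference model.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)        # implicit reward of y_w
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)  # implicit reward of y_l
    return -log_sigmoid(chosen_reward - rejected_reward)
```

When the policy equals the reference model the margin is zero and the loss is log 2; raising the policy's likelihood of y_w relative to y_l lowers it. The reference terms are what tie the optimum back to the pre-trained model, which is the interpolation the paper's new criteria examine.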
Jailbreaking as a Reward Misspecification Problem
Xie, Zhihui, Gao, Jiahui, Li, Lei, Li, Zhenguo, Liu, Qi, Kong, Lingpeng
The widespread adoption of large language models (LLMs) has raised concerns about their safety and reliability, particularly regarding their vulnerability to adversarial attacks. In this paper, we propose a novel perspective that attributes this vulnerability to reward misspecification during the alignment process. We introduce a metric, ReGap, to quantify the extent of reward misspecification and demonstrate its effectiveness and robustness in detecting harmful backdoor prompts. Building on these insights, we present ReMiss, a system for automated red teaming that generates adversarial prompts against various aligned target LLMs. ReMiss achieves state-of-the-art attack success rates on the AdvBench benchmark while preserving the human readability of the generated prompts. Detailed analysis highlights the unique advantages of the proposed reward misspecification objective compared to previous methods.
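The abstract does not give ReGap's formula. The sketch below shows one hypothetical way to quantify a reward gap, assuming a DPO-style implicit reward β·log(π(y|x)/π_ref(y|x)) and comparing a harmless response against a harmful one for the same prompt; the function names and the sign convention are our assumptions, not the paper's definition.

```python
def implicit_reward(policy_logp: float, ref_logp: float, beta: float = 1.0) -> float:
    """Assumed DPO-style implicit reward: beta * log(pi(y|x) / pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def reward_gap(policy_harmless_logp: float, ref_harmless_logp: float,
               policy_harmful_logp: float, ref_harmful_logp: float,
               beta: float = 1.0) -> float:
    """Hypothetical gap: implicit reward of a harmless response minus that of a harmful one.

    A negative value means the aligned model's implicit reward prefers the harmful
    response for this prompt, which this sketch treats as a sign of misspecification.
    """
    harmless = implicit_reward(policy_harmless_logp, ref_harmless_logp, beta)
    harmful = implicit_reward(policy_harmful_logp, ref_harmful_logp, beta)
    return harmless - harmful
```

Under these assumptions, a prompt whose gap turns negative is one on which alignment has shifted probability toward the harmful continuation, which is the kind of prompt an automated red-teaming search would try to find.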
Latent Conditional Diffusion-based Data Augmentation for Continuous-Time Dynamic Graph Model
Tian, Yuxing, Qi, Yiyan, Jiang, Aiwen, Huang, Qi, Guo, Jian
Continuous-Time Dynamic Graph (CTDG) precisely models evolving real-world relationships, drawing heightened interest in dynamic graph learning across academia and industry. However, existing CTDG models encounter challenges stemming from noise and limited historical data. Graph Data Augmentation (GDA) emerges as a critical solution, yet current approaches primarily focus on static graphs and struggle to effectively address the dynamics inherent in CTDGs. Moreover, these methods often demand substantial domain expertise for parameter tuning and lack theoretical guarantees for augmentation efficacy. To address these issues, we propose Conda, a novel latent diffusion-based GDA method tailored for CTDGs. Conda features a sandwich-like architecture, incorporating a Variational Auto-Encoder (VAE) and a conditional diffusion model, aimed at generating enhanced historical neighbor embeddings for target nodes. Unlike conventional diffusion models trained on entire graphs via pre-training, Conda requires historical neighbor sequence embeddings of target nodes for training, thus facilitating more targeted augmentation. We integrate Conda into the CTDG model and adopt an alternating training strategy to optimize performance. Extensive experimentation across six widely used real-world datasets showcases the consistent performance improvement of our approach, particularly in scenarios with limited historical data.
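Conda's internals beyond the VAE and the conditional diffusion model are not described here, so the following only sketches the standard DDPM forward (noising) process that a latent diffusion model of this kind learns to invert: q(z_t | z_0) = N(√ᾱ_t z_0, (1 - ᾱ_t) I). The linear β schedule, `T`, and the idea that `z0` would be the VAE latent of a target node's historical neighbor embeddings are assumptions; the conditioning signal and the denoising network are omitted.

```python
import math
import random
from typing import List, Tuple


def linear_alpha_bars(T: int = 100, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    if T < 2:
        raise ValueError("T must be at least 2")
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0: List[float], t: int, alpha_bars: List[float],
             rng: random.Random) -> Tuple[List[float], List[float]]:
    """Sample z_t ~ N(sqrt(abar_t) z0, (1 - abar_t) I) and return (z_t, noise).

    The returned noise is the usual regression target for the denoiser.
    """
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(ab) * z + math.sqrt(1.0 - ab) * e for z, e in zip(z0, eps)]
    return zt, eps
```

Running diffusion on a compact latent rather than on raw graph structure keeps the augmentation targeted at individual nodes' neighbor histories, which is consistent with the abstract's contrast against diffusion models trained on entire graphs.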
BISCUIT: Scaffolding LLM-Generated Code with Ephemeral UIs in Computational Notebooks
Cheng, Ruijia, Barik, Titus, Leung, Alan, Hohman, Fred, Nichols, Jeffrey
Programmers frequently engage with machine learning tutorials in computational notebooks and have been adopting code generation technologies based on large language models (LLMs). However, they encounter difficulties in understanding and working with code produced by LLMs. To mitigate these challenges, we introduce a novel workflow into computational notebooks that augments LLM-based code generation with an additional ephemeral UI step, offering users UI scaffolds as an intermediate stage between user prompts and code generation. We present this workflow in BISCUIT, an extension for JupyterLab that provides users with ephemeral UIs generated by LLMs based on the context of their code and intentions, scaffolding users to understand, guide, and explore with LLM-generated code. Through a user study where 10 novices used BISCUIT for machine learning tutorials, we found that BISCUIT offers users representations of code to aid their understanding, reduces the complexity of prompt engineering, and creates a playground for users to explore different variables and iterate on their ideas.