Instructional Material
Fighting Failures with FIRE: Failure Identification to Reduce Expert Burden in Intervention-Based Learning
Ablett, Trevor, Marić, Filip, Kelly, Jonathan
Supervised imitation learning, also known as behavioral cloning, suffers from distribution drift leading to failures during policy execution. One approach to mitigate this issue is to allow an expert to correct the agent's actions during task execution, based on the expert's determination that the agent has reached a `point of no return.' The agent's policy is then retrained using this new corrective data. This approach alone can enable high-performance agents to be learned, but at a substantial cost: the expert must vigilantly observe execution until the policy reaches a specified level of success, and even at that point, there is no guarantee that the policy will always succeed. To address these limitations, we present FIRE (Failure Identification to Reduce Expert Burden in intervention-based learning), a system that can predict when a running policy will fail, halt its execution, and request a correction from the expert. Unlike existing approaches that learn only from expert data, our approach learns from both expert and non-expert data, akin to adversarial learning. We demonstrate experimentally for a series of challenging manipulation tasks that our method is able to recognize state-action pairs that lead to failures. This permits seamless integration into an intervention-based learning system, where we show an order-of-magnitude gain in sample efficiency compared with a state-of-the-art inverse reinforcement learning method and dramatically improved performance over an equivalent amount of data learned with behavioral cloning.
What's coming up at #NeurIPS2023?
The thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023) is due to kick-off on Sunday 10 December and run until Saturday 16 December. There is a bumper programme of events, including invited talks, orals, posters, tutorials, workshops, and socials, not to mention AIhub's session on science communication. There are seven invited talks this year. For this year's conference, there will be a total of 14 tutorials. These will be held on Monday 11 December, in person only.
Testing LLM performance on the Physics GRE: some observations
With the recent developments in large language models (LLMs) and their widespread availability through open source models and/or low-cost APIs, several exciting products and applications are emerging, many of which are in the field of STEM educational technology for K-12 and university students. There is a need to evaluate these powerful language models on several benchmarks, in order to understand their risks and limitations. In this short paper, we summarize and analyze the performance of Bard, a popular LLM-based conversational service made available by Google, on the standardized Physics GRE examination.
Digital Life Project: Autonomous 3D Characters with Social Intelligence
Cai, Zhongang, Jiang, Jianping, Qing, Zhongfei, Guo, Xinying, Zhang, Mingyuan, Lin, Zhengyu, Mei, Haiyi, Wei, Chen, Wang, Ruisi, Yin, Wanqi, Fan, Xiangyu, Du, Han, Pan, Liang, Gao, Peng, Yang, Zhitao, Gao, Yang, Li, Jiaqi, Ren, Tianxiang, Wei, Yukun, Wang, Xiaogang, Loy, Chen Change, Yang, Lei, Liu, Ziwei
In this work, we present Digital Life Project, a framework utilizing language as the universal medium to build autonomous 3D characters, who are capable of engaging in social interactions and expressing with articulated body motions, thereby simulating life in a digital environment. Our framework comprises two primary components: 1) SocioMind: a meticulously crafted digital brain that models personalities with systematic few-shot exemplars, incorporates a reflection process based on psychology principles, and emulates autonomy by initiating dialogue topics; 2) MoMat-MoGen: a text-driven motion synthesis paradigm for controlling the character's digital body. It integrates motion matching, a proven industry technique to ensure motion quality, with cutting-edge advancements in motion generation for diversity. Extensive experiments demonstrate that each module achieves state-of-the-art performance in its respective domain. Collectively, they enable virtual characters to initiate and sustain dialogues autonomously, while evolving their socio-psychological states. Concurrently, these characters can perform contextually relevant bodily movements. Additionally, a motion captioning module further allows the virtual character to recognize and appropriately respond to human players' actions. Homepage: https://digital-life-project.com/
Unnatural Algorithms in Machine Learning
Natural gradient descent has a remarkable property that in the small learning rate limit, it displays an invariance with respect to network reparameterizations, leading to robust training behavior even for highly covariant network parameterizations. We show that optimization algorithms with this property can be viewed as discrete approximations of natural transformations from the functor determining an optimizer's state space from the diffeomorphism group if its configuration manifold, to the functor determining that state space's tangent bundle from this group. Algorithms with this property enjoy greater efficiency when used to train poorly parameterized networks, as the network evolution they generate is approximately invariant to network reparameterizations. More specifically, the flow generated by these algorithms in the limit as the learning rate vanishes is invariant under smooth reparameterizations, the respective flows of the parameters being determined by equivariant maps. By casting this property a natural transformation, we allow for generalizations beyond equivariance with respect to group actions; this framework can account for non-invertible maps such as projections, creating a framework for the direct comparison of training behavior across non-isomorphic network architectures, and the formal examination of limiting behavior as network size increases by considering inverse limits of these projections, should they exist. We introduce a simple method of introducing this naturality more generally and examine a number of popular machine learning training algorithms, finding that most are unnatural.
Wake-Sleep Consolidated Learning
Sorrenti, Amelia, Bellitto, Giovanni, Salanitri, Federica Proietto, Pennisi, Matteo, Palazzo, Simone, Spampinato, Concetto
We propose Wake-Sleep Consolidated Learning (WSCL), a learning strategy leveraging Complementary Learning System theory and the wake-sleep phases of the human brain to improve the performance of deep neural networks for visual classification tasks in continual learning settings. Our method learns continually via the synchronization between distinct wake and sleep phases. During the wake phase, the model is exposed to sensory input and adapts its representations, ensuring stability through a dynamic parameter freezing mechanism and storing episodic memories in a short-term temporary memory (similarly to what happens in the hippocampus). During the sleep phase, the training process is split into NREM and REM stages. In the NREM stage, the model's synaptic weights are consolidated using replayed samples from the short-term and long-term memory and the synaptic plasticity mechanism is activated, strengthening important connections and weakening unimportant ones. In the REM stage, the model is exposed to previously-unseen realistic visual sensory experience, and the dreaming process is activated, which enables the model to explore the potential feature space, thus preparing synapses to future knowledge. We evaluate the effectiveness of our approach on three benchmark datasets: CIFAR-10, Tiny-ImageNet and FG-ImageNet. In all cases, our method outperforms the baselines and prior work, yielding a significant performance gain on continual visual classification tasks. Furthermore, we demonstrate the usefulness of all processing stages and the importance of dreaming to enable positive forward transfer.
Coherent Soft Imitation Learning
Watson, Joe, Huang, Sandy H., Heess, Nicolas
Imitation learning methods seek to learn from an expert either through behavioral cloning (BC) of the policy or inverse reinforcement learning (IRL) of the reward. Such methods enable agents to learn complex tasks from humans that are difficult to capture with hand-designed reward functions. Choosing BC or IRL for imitation depends on the quality and state-action coverage of the demonstrations, as well as additional access to the Markov decision process. Hybrid strategies that combine BC and IRL are not common, as initial policy optimization against inaccurate rewards diminishes the benefit of pretraining the policy with BC. This work derives an imitation method that captures the strengths of both BC and IRL. In the entropy-regularized ('soft') reinforcement learning setting, we show that the behaviour-cloned policy can be used as both a shaped reward and a critic hypothesis space by inverting the regularized policy update. This coherency facilitates fine-tuning cloned policies using the reward estimate and additional interactions with the environment. This approach conveniently achieves imitation learning through initial behaviour cloning, followed by refinement via RL with online or offline data sources. The simplicity of the approach enables graceful scaling to high-dimensional and vision-based tasks, with stable learning and minimal hyperparameter tuning, in contrast to adversarial approaches. For the open-source implementation and simulation results, see https://joemwatson.github.io/csil/.
A Comparative Study of AI-Generated (GPT-4) and Human-crafted MCQs in Programming Education
Doughty, Jacob, Wan, Zipiao, Bompelli, Anishka, Qayum, Jubahed, Wang, Taozhi, Zhang, Juran, Zheng, Yujia, Doyle, Aidan, Sridhar, Pragnya, Agarwal, Arav, Bogart, Christopher, Keylor, Eric, Kultur, Can, Savelka, Jaromir, Sakr, Majd
There is a constant need for educators to develop and maintain effective up-to-date assessments. While there is a growing body of research in computing education on utilizing large language models (LLMs) in generation and engagement with coding exercises, the use of LLMs for generating programming MCQs has not been extensively explored. We analyzed the capability of GPT-4 to produce multiple-choice questions (MCQs) aligned with specific learning objectives (LOs) from Python programming classes in higher education. Specifically, we developed an LLM-powered (GPT-4) system for generation of MCQs from high-level course context and module-level LOs. We evaluated 651 LLM-generated and 449 human-crafted MCQs aligned to 246 LOs from 6 Python courses. We found that GPT-4 was capable of producing MCQs with clear language, a single correct choice, and high-quality distractors. We also observed that the generated MCQs appeared to be well-aligned with the LOs. Our findings can be leveraged by educators wishing to take advantage of the state-of-the-art generative models to support MCQ authoring efforts.
Toward Energy-Efficient Massive MIMO: Graph Neural Network Precoding for Mitigating Non-Linear PA Distortion
Feys, Thomas, Van der Perre, Liesbet, Rottenberg, François
Massive MIMO systems are typically designed assuming linear power amplifiers (PAs). However, PAs are most energy efficient close to saturation, where non-linear distortion arises. For conventional precoders, this distortion can coherently combine at user locations, limiting performance. We propose a graph neural network (GNN) to learn a mapping between channel and precoding matrices, which maximizes the sum rate affected by non-linear distortion, using a high-order polynomial PA model. In the distortion-limited regime, this GNN-based precoder outperforms zero forcing (ZF), ZF plus digital pre-distortion (DPD) and the distortion-aware beamforming (DAB) precoder from the state-of-the-art. At an input back-off of -3 dB the proposed precoder compared to ZF increases the sum rate by 8.60 and 8.84 bits/channel use for two and four users respectively. Radiation patterns show that these gains are achieved by transmitting the non-linear distortion in non-user directions. In the four user-case, for a fixed sum rate, the total consumed power (PA and processing) of the GNN precoder is 3.24 and 1.44 times lower compared to ZF and ZF plus DPD respectively. A complexity analysis shows six orders of magnitude reduction compared to DAB precoding. This opens perspectives to operate PAs closer to saturation, which drastically increases their energy efficiency.
Effective Backdoor Mitigation Depends on the Pre-training Objective
Verma, Sahil, Bhatt, Gantavya, Schwarzschild, Avi, Singhal, Soumye, Das, Arnav Mohanty, Shah, Chirag, Dickerson, John P, Bilmes, Jeff
Despite the advanced capabilities of contemporary machine learning (ML) models, they remain vulnerable to adversarial and backdoor attacks. This vulnerability is particularly concerning in real-world deployments, where compromised models may exhibit unpredictable behavior in critical scenarios. Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for pre-training multimodal models, as these datasets may harbor backdoors. Various techniques have been proposed to mitigate the effects of backdooring in these models such as CleanCLIP which is the current state-of-the-art approach. In this work, we demonstrate that the efficacy of CleanCLIP in mitigating backdoors is highly dependent on the particular objective used during model pre-training. We observe that stronger pre-training objectives correlate with harder to remove backdoors behaviors. We show this by training multimodal models on two large datasets consisting of 3 million (CC3M) and 6 million (CC6M) datapoints, under various pre-training objectives, followed by poison removal using CleanCLIP. We find that CleanCLIP is ineffective when stronger pre-training objectives are used, even with extensive hyperparameter tuning. Our findings underscore critical considerations for ML practitioners who pre-train models using large-scale web-curated data and are concerned about potential backdoor threats. Notably, our results suggest that simpler pre-training objectives are more amenable to effective backdoor removal. This insight is pivotal for practitioners seeking to balance the trade-offs between using stronger pre-training objectives and security against backdoor attacks.