Instructional Material
On discretisation drift and smoothness regularisation in neural network training
The deep learning recipe of casting real-world problems as mathematical optimisation and tackling the optimisation by training deep neural networks using gradient-based optimisation has undoubtedly proven to be a fruitful one. The understanding of why deep learning works, however, has lagged behind its practical significance. We aim to make steps towards an improved understanding of deep learning with a focus on optimisation and model regularisation. We start by investigating gradient descent (GD), a discrete-time algorithm at the basis of most popular deep learning optimisation algorithms. Understanding the dynamics of GD has been hindered by the presence of discretisation drift, the numerical integration error between GD and its often studied continuous-time counterpart, the negative gradient flow (NGF). To add to the toolkit available to study GD, we derive novel continuous-time flows that account for discretisation drift. Unlike the NGF, these new flows can be used to describe learning rate specific behaviours of GD, such as training instabilities observed in supervised learning and two-player games. We then translate insights from continuous time into mitigation strategies for unstable GD dynamics, by constructing novel learning rate schedules and regularisers that do not require additional hyperparameters. Like optimisation, smoothness regularisation is another pillar of deep learning's success with wide use in supervised learning and generative modelling. Despite their individual significance, the interactions between smoothness regularisation and optimisation have yet to be explored. We find that smoothness regularisation affects optimisation across multiple deep learning domains, and that incorporating smoothness regularisation in reinforcement learning leads to a performance boost that can be recovered using adaptations to optimisation methods.
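The gap between GD and the NGF can be made concrete on a toy problem. The sketch below is an illustration only, not the flows derived in the work above: it assumes a one-dimensional quadratic loss L(x) = 0.5 * curvature * x**2, for which the NGF has the closed-form solution x(t) = x0 * exp(-curvature * t), and it measures the drift as the gap between GD after k steps of size h and the flow at time k * h. All function names are hypothetical.

```python
import math


def gradient_descent(x0, curvature, learning_rate, steps):
    """Iterate x <- x - h * dL/dx for the assumed loss L(x) = 0.5 * curvature * x**2."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - learning_rate * curvature * x
        trajectory.append(x)
    return trajectory


def gradient_flow(x0, curvature, time):
    """Closed-form solution of the negative gradient flow dx/dt = -curvature * x."""
    return x0 * math.exp(-curvature * time)


def drift(x0, curvature, learning_rate, steps):
    """Gap between GD after `steps` updates and the flow at time steps * learning_rate."""
    gd_final = gradient_descent(x0, curvature, learning_rate, steps)[-1]
    flow_final = gradient_flow(x0, curvature, learning_rate * steps)
    return abs(gd_final - flow_final)


if __name__ == "__main__":
    # Small, moderate, and unstable learning rates for curvature 1.0.
    for h in (0.01, 0.5, 2.1):
        print(f"h={h}: drift after 20 steps = {drift(1.0, 1.0, h, 20):.6f}")
```

On this quadratic, each GD update multiplies x by (1 - h * curvature), so the iterates diverge once h exceeds 2 / curvature, whereas the flow decays for every h. This is one example of a learning rate specific behaviour that the NGF alone cannot describe.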
Boosting for Bounding the Worst-class Error
Saito, Yuya, Matsuo, Shinnosuke, Uchida, Seiichi, Suehiro, Daiki
This paper tackles the problem of the worst-class error rate, instead of the standard error rate averaged over all classes. For example, a three-class classification task with class-wise error rates of 10%, 10%, and 40% has a worst-class error rate of 40%, whereas the average is 20% under the class-balanced condition. The worst-class error is important in many applications. For example, in a medical image classification task, it would not be acceptable for the malignant tumor class to have a 40% error rate, while the benign and healthy classes have 10% error rates. We propose a boosting algorithm that guarantees an upper bound of the worst-class training error and derive its generalization bound. Experimental results show that the algorithm lowers worst-class test error rates while avoiding…
Figure 1: A toy example showing the average error minimization results in the case of a high worst-class error. Note that all five classes have the same number of instances, and thus, there is no class imbalance.
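The arithmetic in the three-class example can be checked with a short sketch. It only computes class-wise, average, and worst-class error rates from labels and predictions. It is not the boosting algorithm proposed in the paper, and the function names are hypothetical.

```python
from collections import defaultdict


def class_error_rates(y_true, y_pred):
    """Return a dict mapping each true class to its error rate."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, prediction in zip(y_true, y_pred):
        totals[truth] += 1
        if truth != prediction:
            errors[truth] += 1
    return {label: errors[label] / totals[label] for label in totals}


def average_and_worst(rates):
    """Return (mean of the class-wise error rates, largest class-wise error rate)."""
    values = list(rates.values())
    return sum(values) / len(values), max(values)


if __name__ == "__main__":
    # Three balanced classes with 10 instances each and 1, 1, and 4 mistakes.
    y_true = [0] * 10 + [1] * 10 + [2] * 10
    y_pred = [1] + [0] * 9 + [0] + [1] * 9 + [0] * 4 + [2] * 6
    rates = class_error_rates(y_true, y_pred)
    average, worst = average_and_worst(rates)
    print(rates, round(average, 3), round(worst, 3))
```

Because the classes are balanced, the mean of the class-wise rates equals the overall error rate. With imbalanced classes the two would differ, and the worst-class rate would still be the maximum over classes.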
Conversation Chronicles: Towards Diverse Temporal and Relational Dynamics in Multi-Session Conversations
Jang, Jihyoung, Boo, Minseong, Kim, Hyounghun
In the field of natural language processing, open-domain chatbots have emerged as an important research topic. However, a major limitation of existing open-domain chatbot research is its singular focus on short single-session dialogue, neglecting the potential need for understanding contextual information in multiple consecutive sessions that precede an ongoing dialogue. Among the elements that compose the context in multi-session conversation settings, the time intervals between sessions and the relationships between speakers would be particularly important. Despite their importance, current research efforts have not sufficiently addressed these dialogical components. In this paper, we introduce a new 1M multi-session dialogue dataset, called Conversation Chronicles, for implementing a long-term conversation setup in which time intervals and fine-grained speaker relationships are incorporated. Following recent works, we exploit a large language model to produce the data. The extensive human evaluation shows that dialogue episodes in Conversation Chronicles reflect those properties while maintaining coherent and consistent interactions across all the sessions. We also propose a dialogue model, called ReBot, which consists of chronological summarization and dialogue generation modules using only around 630M parameters. When trained on Conversation Chronicles, ReBot demonstrates long-term context understanding with a high human engagement score.
IIFL: Implicit Interactive Fleet Learning from Heterogeneous Human Supervisors
Datta, Gaurav, Hoque, Ryan, Gu, Anrui, Solowjow, Eugen, Goldberg, Ken
Imitation learning has been applied to a range of robotic tasks, but can struggle when robots encounter edge cases that are not represented in the training data (i.e., distribution shift). Interactive fleet learning (IFL) mitigates distribution shift by allowing robots to access remote human supervisors during task execution and learn from them over time, but different supervisors may demonstrate the task in different ways. Recent work proposes Implicit Behavior Cloning (IBC), which is able to represent multimodal demonstrations using energy-based models (EBMs). In this work, we propose Implicit Interactive Fleet Learning (IIFL), an algorithm that builds on IBC for interactive imitation learning from multiple heterogeneous human supervisors. A key insight in IIFL is a novel approach for uncertainty quantification in EBMs using Jeffreys divergence. While IIFL is more computationally expensive than explicit methods, results suggest that IIFL achieves a 2.8x higher success rate in simulation experiments and a 4.5x higher return on human effort in a physical block pushing task over (Explicit) IFL, IBC, and other baselines.
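For readers unfamiliar with Jeffreys divergence, it is commonly defined as the symmetrised Kullback-Leibler divergence, J(p, q) = KL(p || q) + KL(q || p). The sketch below evaluates it for two discrete distributions with strictly positive probabilities. That setting is an assumption made for illustration: it does not reproduce how IIFL estimates the quantity for energy-based models, and the function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jeffreys_divergence(p, q):
    """Symmetrised KL divergence: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


if __name__ == "__main__":
    agree = [0.5, 0.5]
    disagree = [0.9, 0.1]
    print(jeffreys_divergence(agree, agree))
    print(jeffreys_divergence(agree, disagree))
```

Unlike plain KL divergence, the result does not depend on the order of the arguments, and it is zero only when the two distributions coincide.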
PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training
Zhang, Yunyi, Jiang, Minhao, Meng, Yu, Zhang, Yu, Han, Jiawei
Weakly-supervised text classification trains a classifier using the label name of each target class as the only supervision, which largely reduces human annotation efforts. Most existing methods first use the label names as static keyword-based features to generate pseudo labels, which are then used for final classifier training. While reasonable, such a commonly adopted framework suffers from two limitations: (1) keywords can have different meanings in different contexts and some text may not have any keyword, so keyword matching can induce noisy and inadequate pseudo labels; (2) the errors made in the pseudo label generation stage will directly propagate to the classifier training stage without a chance of being corrected. In this paper, we propose a new method, PIEClass, consisting of two modules: (1) a pseudo label acquisition module that uses zero-shot prompting of pre-trained language models (PLM) to get pseudo labels based on contextualized text understanding beyond static keyword matching, and (2) a noise-robust iterative ensemble training module that iteratively trains classifiers and updates pseudo labels by utilizing two PLM fine-tuning methods that regularize each other. Extensive experiments show that PIEClass achieves overall better performance than existing strong baselines on seven benchmark datasets and even achieves similar performance to fully-supervised classifiers on sentiment classification tasks.
Get an extra $10 off AI training for a limited time
Artificial intelligence has taken the world by storm, with new applications seemingly every day. It's also making its way into our everyday lives, offering professionals new ways to automate tasks and streamline their workflows. If you're looking to save time, a basic AI education is a good idea. Fortunately, between 10/19 and 10/23, you can get The 2023 Ultimate Artificial Intelligence & Automation Developer Bundle for an extra $10 off at just $49.97. This 13-course bundle is beginner-friendly and will help you get familiar with a range of consumer-facing AI tools, like ChatGPT, Midjourney, and DALL-E, as well as gain technical skills in Solidity, robotics, Java, C, and more.
Voyager: An Open-Ended Embodied Agent with Large Language Models
Wang, Guanzhi, Xie, Yuqi, Jiang, Yunfan, Mandlekar, Ajay, Xiao, Chaowei, Zhu, Yuke, Fan, Linxi, Anandkumar, Anima
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
Reinforcement Learning and Bandits for Speech and Language Processing: Tutorial, Review and Outlook
As two cornerstones of modern-day technologies, speech processing and natural language processing (NLP) are innately sequence learning problems that extract information from linguistic or speech signals and provide insights into interactive systems that communicate in human-understandable languages. The sequential and interactive nature of these problems makes them well suited to the algorithmic framework of reinforcement learning (RL). In a reinforcement learning setting, an agent interacts with an environment through observations and actions, and based on the reward feedback provided by the underlying reward function of this environment, the agent learns how to perform the task of interest through trial and error. While the successful applications of reinforcement learning have been highlighted by a wide range of surveys in many real-world engineering domains such as robotics [1], vision [2], finance [3], healthcare [4], linguistics [5], and energy management [6], there has not been one for the rich community of both the speech and language domains. This is the first survey that emphasizes the synergy among the growing fields of speech processing, natural language processing, and reinforcement learning. We aim to fill this gap by adopting a complete, timely and classical view of the reinforcement learning problems and their connections to speech and language processing.
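The interaction loop described above can be sketched in its simplest form, a multi-armed bandit in which the environment has no state and each action yields a stochastic reward. The example below is a minimal epsilon-greedy agent. The Bernoulli reward model, the parameter values, and the function name are assumptions chosen for illustration and are not taken from the survey.

```python
import random


def run_epsilon_greedy(arm_means, steps, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit and return (value estimates, pull counts, total reward)."""
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the highest current estimate.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Environment feedback: reward 1 with probability arm_means[arm], else 0.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 0.8], steps=5000)
    print(estimates, counts, total_reward)
```

The agent never sees the true reward probabilities. It learns them through trial and error, which is the same feedback structure the full reinforcement learning setting extends with observations and state transitions.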
Let's use AI to rethink education, instead of panicking about cheating
On a Monday afternoon in May, a final-year student, fresh off the Texas A&M University-Commerce graduation stage, received a shocking email. "The final grade for the course is due today at 5 p.m.," it read. "I will be giving everyone in this course an… incomplete." According to a report in the Washington Post, agricultural sciences professor Jared Mumm had run his students' essays through the AI tool ChatGPT, which had detected its own use in the work – an offence that warranted a zero on the assignment.
Verification of the Socio-Technical Aspects of Voting: The Case of the Polish Postal Vote 2020
Jamroga, Wojciech, Ryan, Peter Y. A., Kim, Yan
Voting procedures are designed and implemented by people, for people, and with significant human involvement. Thus, one should take into account the human factors in order to comprehensively analyze properties of an election and detect threats. In particular, it is essential to assess how actions and strategies of the involved agents (voters, municipal office employees, mail clerks) can influence the outcome of other agents' actions as well as the overall outcome of the election. In this paper, we present our first attempt to capture those aspects in a formal multi-agent model of the 2020 Polish presidential election. The election marked the first time that postal voting was universally available in Poland. Unfortunately, the voting scheme was prepared under time pressure and political pressure, and without the involvement of experts. This might have opened up possibilities for various kinds of ballot fraud, in-house coercion, etc. We propose a preliminary scalable model of the procedure in the form of a Multi-Agent Graph, and formalize selected integrity and security properties by formulas of agent logics. Then, we transform the models and formulas so that they can be input to the state-of-the-art model checker Uppaal. The first series of experiments demonstrates that verification scales rather badly due to the state-space explosion. However, we show that a recently developed technique of user-friendly model reduction by variable abstraction allows us to verify more complex scenarios.