Forging Multiple Training Objectives for Pre-trained Language Models via Meta-Learning

arXiv.org Artificial Intelligence

Multiple pre-training objectives compensate for the limited understanding capability of single-objective language modeling, which serves the ultimate purpose of pre-trained language models (PrLMs): generalizing well across a wide range of scenarios. However, learning multiple training objectives in a single model is challenging due to their unknown relative significance as well as the potential conflict between them. Empirical studies have shown that current objective sampling in an ad-hoc manual setting makes the learned language representation barely converge to the desired optimum. Thus, we propose MOMETAS, a novel adaptive sampler based on meta-learning, which learns the latent sampling pattern on arbitrary pre-training objectives. Such a design is lightweight with negligible additional training overhead. To validate our approach, we adopt five objectives and conduct continual pre-training with BERT-base and BERT-large models, where MOMETAS demonstrates universal performance gain over other rule-based sampling strategies on 14 natural language processing tasks.
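
As a rough illustration of what an adaptive objective sampler can look like, the sketch below keeps one logit per training objective, samples an objective from the softmax of those logits, and nudges the logits with a REINFORCE-style update. This is not the MOMETAS algorithm, whose details are not given in the abstract; the function names, the reward signal, and the learning rate are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Convert logits to sampling probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_objective(logits, rng):
    """Draw the index of one training objective according to softmax(logits)."""
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


def update_logits(logits, chosen, reward, lr=0.1):
    """REINFORCE-style step (an assumption, not the paper's rule).

    The gradient of log p(chosen) with respect to logit i is
    (1 if i == chosen else 0) - p_i, scaled here by the reward.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if i == chosen else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0] * 5  # five objectives, initially sampled uniformly
    chosen = sample_objective(logits, rng)
    # Hypothetical reward, e.g. an improvement on held-out data.
    logits = update_logits(logits, chosen, reward=1.0)
    print(chosen, softmax(logits))
```

A positive reward raises the sampling probability of the chosen objective and lowers the others, so the sampling pattern drifts toward objectives that have recently helped.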


Generative Adversarial Learning for Trusted and Secure Clustering in Industrial Wireless Sensor Networks

arXiv.org Artificial Intelligence

Traditional machine learning techniques have been widely used to establish trust management systems. However, the scale of the training dataset can significantly affect the security performance of these systems, and detecting malicious nodes is a great challenge due to the absence of labeled data regarding novel attacks. To address this issue, this paper presents a generative adversarial network (GAN) based trust management mechanism for Industrial Wireless Sensor Networks (IWSNs). First, type-2 fuzzy logic is adopted to evaluate the reputation of sensor nodes while alleviating the uncertainty problem. Then, trust vectors are collected to train a GAN-based codec structure, which is used for further malicious node detection. Moreover, to avoid normal nodes being isolated from the network permanently due to erroneous detections, a GAN-based trust redemption model is constructed to enhance the resilience of trust management. Based on the latest detection results, a trust model update method is developed to adapt to the dynamic industrial environment. The proposed trust management mechanism is finally applied to secure clustering for reliable and real-time data transmission, and simulation results show that it achieves a high detection rate of up to 96%, as well as a low false positive rate below 8%.


Top 25 Women in AI: Canada Edition

#artificialintelligence

At RE•WORK, we are strong advocates for supporting women working towards advancing technology, so ahead of the upcoming Toronto AI Summit, on November 9-10, we set out to highlight inspirational women who are working at the forefront of AI developments, and who deserve recognition for their achievements. While we set out to create a list of just 20 – we couldn't narrow it down, as there are so many inspiring and prominent females in this space! Hear from many of them at our Toronto AI Summit, and more at our Women in AI Reception, both being held in Toronto next month. Help us to continue highlighting leading women in AI by nominating your influential woman for our next edition. RE•WORK holds Women in AI events, podcasts, and blogs. Get in touch if you'd like to collaborate or support our initiatives! Doina Precup is a researcher living in Montreal, Canada.


Graph Neural Networks for Low-Energy Event Classification & Reconstruction in IceCube

arXiv.org Artificial Intelligence

IceCube, a cubic-kilometer array of optical sensors built to detect atmospheric and astrophysical neutrinos between 1 GeV and 1 PeV, is deployed 1.45 km to 2.45 km below the surface of the ice sheet at the South Pole. The classification and reconstruction of events from the in-ice detectors play a central role in the analysis of data from IceCube. Reconstructing and classifying events is a challenge due to the irregular detector geometry, inhomogeneous scattering and absorption of light in the ice and, below 100 GeV, the relatively low number of signal photons produced per event. To address this challenge, it is possible to represent IceCube events as point cloud graphs and use a Graph Neural Network (GNN) as the classification and reconstruction method. The GNN is capable of distinguishing neutrino events from cosmic-ray backgrounds, classifying different neutrino event types, and reconstructing the deposited energy, direction and interaction vertex. Based on simulation, we provide a comparison in the 1-100 GeV energy range to the state-of-the-art maximum likelihood techniques used in current IceCube analyses, including the effects of known systematic uncertainties. For neutrino event classification, the GNN increases the signal efficiency by 18% at a fixed false positive rate (FPR), compared to current IceCube methods. Alternatively, the GNN offers a reduction of the FPR by over a factor of 8 (to below half a percent) at a fixed signal efficiency. For the reconstruction of energy, direction, and interaction vertex, the resolution improves by an average of 13%-20% compared to current maximum likelihood techniques in the energy range of 1-30 GeV. The GNN, when run on a GPU, is capable of processing IceCube events at a rate nearly double the median IceCube trigger rate of 2.7 kHz, which opens the possibility of using low-energy neutrinos in online searches for transient events.
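
One common way to turn a point cloud into a graph is to connect each point to its k nearest neighbors. The sketch below does this for a handful of sensor-hit positions. The abstract does not state how the graph is built, so the k-nearest-neighbor rule, the function name `knn_edges`, and the sample coordinates are assumptions made purely for illustration.

```python
import math


def knn_edges(points, k):
    """Return directed edges (i, j) linking each point i to its k nearest neighbors.

    `points` is a list of (x, y, z) tuples, e.g. positions of sensors that
    recorded light in one event (hypothetical input format).
    """
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, nearest first.
        neighbors = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        edges.extend((i, j) for _, j in neighbors[:k])
    return edges


if __name__ == "__main__":
    hits = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
    print(knn_edges(hits, k=1))
```

The resulting edge list, together with per-node features such as hit time and charge, is the kind of input a GNN layer typically consumes.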


MariusGNN: Resource-Efficient Out-of-Core Training of Graph Neural Networks

arXiv.org Artificial Intelligence

We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy -- including disk -- for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8x faster than multi-GPU training in these systems, thus yielding an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at www.marius-project.org.


Self-organizing nest migration dynamics synthesis for ant colony systems

arXiv.org Artificial Intelligence

In this study, we synthesize a novel dynamical approach for ant colonies enabling them to migrate to new nest sites in a self-organizing fashion. In other words, we realize ant colony migration as a self-organizing phenotype-level collective behavior. For this purpose, we first segment the edges of the graph of ants' pathways. Then, each segment, attributed to its own pheromone profile, may host an ant. So, multiple ants may occupy an edge at the same time. Thanks to this segment-wise edge formulation, ants have more selection options in the course of their pathway determination, thereby increasing the diversity of their colony's emergent behaviors. In light of the continuous pheromone dynamics of segments, each edge owns a spatio-temporal piece-wise continuous pheromone profile in which both deposit and evaporation processes are unified. The passive dynamics of the proposed migration mechanism is sufficiently rich so that an ant colony can migrate to the vicinity of a new nest site in a self-organizing manner without any external supervision. In particular, we perform extensive simulations to test our migration dynamics applied to a colony including 500 ants traversing a pathway graph comprising 200 nodes and 4000 edges which are segmented based on various resolutions. The obtained results exhibit the effectiveness of our strategy.
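
To make the segment-wise edge idea concrete, the sketch below stores one pheromone value per segment of an edge and applies evaporation and deposit steps. It uses the classic discrete ant colony update (multiplicative evaporation, additive deposit) as a stand-in; the continuous pheromone dynamics described in the abstract are not specified here, and all names and parameter values are hypothetical.

```python
import random


def make_edge(num_segments, initial=1.0):
    """An edge is a list of per-segment pheromone levels."""
    return [initial] * num_segments


def evaporate(segments, rate):
    """Reduce every segment's pheromone by the fraction `rate` (0 <= rate <= 1)."""
    return [(1.0 - rate) * level for level in segments]


def deposit(segments, index, amount):
    """Add `amount` of pheromone to the segment an ant currently occupies."""
    updated = list(segments)
    updated[index] += amount
    return updated


def choose_segment(segments, rng):
    """Pick a segment with probability proportional to its pheromone level."""
    return rng.choices(range(len(segments)), weights=segments, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    edge = make_edge(num_segments=4)
    edge = evaporate(edge, rate=0.5)
    edge = deposit(edge, index=2, amount=1.0)
    print(edge, choose_segment(edge, rng))
```

Because each segment carries its own level, several ants can occupy and reinforce different parts of the same edge, which is the extra selection freedom the abstract refers to.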


Hybrid Simulator for Space Docking and Robotic Proximity Operations

arXiv.org Artificial Intelligence

In this work, we present a hybrid simulator methodology for space docking and robotic proximity operations. This methodology also allows for the emulation of a target robot operating in a complex environment by using an actual robot. The emulation scheme aims to replicate the dynamic behavior of the target robot interacting with the environment, without dealing with a complex calculation of the contact dynamics. This method forms a basis for the task verification of a flexible space robot. The actual emulating robot is structurally rigid, while the target robot can represent any class of robots, e.g., flexible, redundant, or space robots. Although the emulating robot is not dynamically equivalent to the target robot, dynamical similarity can be achieved by using a control law developed herein. The effect of disturbances and actuator dynamics on the fidelity and the contact stability of the robot emulation is thoroughly analyzed.


Robust Bayesian optimization with reinforcement learned acquisition functions

arXiv.org Artificial Intelligence

In Bayesian optimization (BO) for expensive black-box optimization tasks, the acquisition function (AF) guides sequential sampling and plays a pivotal role in efficient convergence to better optima. Prevailing AFs usually rely on human experience in the form of preferences for exploration or exploitation, which risks wasted computation or entrapment in local optima and subsequent re-optimization. To address this issue, the idea of data-driven AF selection is proposed, and the sequential AF selection task is further formalized as a Markov decision process (MDP) and solved with reinforcement learning (RL) techniques. An appropriate selection policy for AFs is learned from superior BO trajectories to balance exploration and exploitation in real time; the resulting method is called reinforcement-learning-assisted Bayesian optimization (RLABO). Competitive and robust BO evaluations on five benchmark problems demonstrate RL's recognition of the implicit AF selection pattern and suggest the proposal's potential practicality for intelligent AF selection as well as efficient optimization in expensive black-box problems.


Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals

arXiv.org Artificial Intelligence

Language models are increasingly attracting interest from writers. However, such models lack long-range semantic coherence, limiting their usefulness for longform creative writing. We address this limitation by applying language models hierarchically, in a system we call Dramatron. By building structural context via prompt chaining, Dramatron can generate coherent scripts and screenplays complete with title, characters, story beats, location descriptions, and dialogue. We illustrate Dramatron's usefulness as an interactive co-creative system with a user study of 15 theatre and film industry professionals. Participants co-wrote theatre scripts and screenplays with Dramatron and engaged in open-ended interviews. We report critical reflections both from our interviewees and from independent reviewers who watched stagings of the works to illustrate how both Dramatron and hierarchical text generation could be useful for human-machine co-creativity. Finally, we discuss the suitability of Dramatron for co-creativity, ethical considerations -- including plagiarism and bias -- and participatory models for the design and deployment of such tools.


Toward Discovering Options that Achieve Faster Planning

arXiv.org Artificial Intelligence

We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning. In a sequential machine, the speed of planning is proportional to the number of elementary operations used to achieve a good policy. For episodic tasks, the number of elementary operations depends on the number of options composed by the policy in an episode and the number of options being considered at each decision point. To reduce the amount of computation in planning, for a given set of episodic tasks and a given number of options, our objective prefers options with which it is possible to achieve a high return by composing few options, and also prefers a smaller set of options to choose from at each decision point. We develop an algorithm that optimizes the proposed objective. In a variant of the classic four-room domain, we show that 1) a higher objective value is typically associated with fewer elementary planning operations used by the option-value iteration algorithm to obtain a near-optimal value function, 2) our algorithm achieves an objective value that matches that achieved by two human-designed options, 3) the amount of computation used by option-value iteration with options discovered by our algorithm matches that with the human-designed options, and 4) the options produced by our algorithm also make intuitive sense: they seem to move to and terminate at the entrances of rooms.