 tgl






Gradient Localization Improves Lifelong Pretraining of Language Models

Fernandez, Jared, Bisk, Yonatan, Strubell, Emma

arXiv.org Artificial Intelligence

Large Language Models (LLMs) trained on web-scale text corpora have been shown to capture world knowledge in their parameters. However, the mechanism by which language models store different types of knowledge is poorly understood. In this work, we examine two types of knowledge relating to temporally sensitive entities and demonstrate that each type is localized to a different set of parameters within the LLM. We hypothesize that the lack of consideration of the locality of knowledge in existing continual learning methods contributes to both the failed uptake of new information and the catastrophic forgetting of previously learned information. We observe that sequences containing references to updated and newly mentioned entities exhibit larger gradient norms in a subset of layers. We demonstrate that targeting parameter updates to these relevant layers can improve the performance of continual pretraining on language containing temporal drift.
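The idea of restricting updates to layers with large gradient norms can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not give the selection rule, so the top-k criterion, the plain SGD step, and all names here (`layer_grad_norms`, `select_layers`, `localized_sgd_step`) are assumptions chosen for illustration, with layers represented as plain lists of floats.

```python
import math

def layer_grad_norms(grads):
    """L2 norm of each layer's gradient (grads: layer name -> list of floats)."""
    return {name: math.sqrt(sum(g * g for g in vals)) for name, vals in grads.items()}

def select_layers(norms, k):
    """Names of the k layers with the largest gradient norm (assumed selection rule)."""
    return set(sorted(norms, key=norms.get, reverse=True)[:k])

def localized_sgd_step(params, grads, k, lr=0.1):
    """Apply a plain SGD update only to the selected layers; copy the rest unchanged."""
    chosen = select_layers(layer_grad_norms(grads), k)
    return {
        name: [w - lr * g for w, g in zip(vals, grads[name])] if name in chosen else list(vals)
        for name, vals in params.items()
    }

# Hypothetical three-layer model; layer1 has by far the largest gradient norm.
params = {"layer0": [1.0, 1.0], "layer1": [1.0, 1.0], "layer2": [1.0, 1.0]}
grads = {"layer0": [0.1, 0.0], "layer1": [3.0, 4.0], "layer2": [0.0, 0.2]}
updated = localized_sgd_step(params, grads, k=1)
```

With `k=1`, only `layer1` is updated and the other layers keep their original values, which is the sense in which the update is "localized".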


LasTGL: An Industrial Framework for Large-Scale Temporal Graph Learning

Li, Jintang, Dan, Jiawang, Wu, Ruofan, Zhou, Jing, Tian, Sheng, Liu, Yunfei, Wang, Baokun, Meng, Changhua, Wang, Weiqiang, Zhu, Yuchang, Chen, Liang, Zheng, Zibin

arXiv.org Artificial Intelligence

Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structured data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-evolving graphs and have gradually become a trending research topic in both academia and industry. Advancing research and application in such an emerging field necessitates the development of new tools to compose TGNN models and unify their different schemes for dealing with temporal graphs. In this work, we introduce LasTGL, an industrial framework that integrates unified and extensible implementations of common temporal graph learning algorithms for various advanced tasks. The purpose of LasTGL is to provide the essential building blocks for solving temporal graph learning tasks, focusing on the guiding principles of user-friendliness and quick prototyping on which PyTorch is based. In particular, LasTGL provides comprehensive temporal graph datasets, TGNN models and utilities along with well-documented tutorials, making it suitable for absolute beginners and expert deep learning practitioners alike.


Autonomous Control for Orographic Soaring of Fixed-Wing UAVs

Suys, Tom, Hwang, Sunyou, de Croon, Guido C. H. E., Remes, Bart D. W.

arXiv.org Artificial Intelligence

We present a novel controller for fixed-wing UAVs that enables autonomous soaring in an orographic wind field, extending flight endurance. Our method identifies soaring regions and addresses position control challenges by introducing a target gradient line (TGL) on which the UAV achieves an equilibrium soaring position, where sink rate and updraft are balanced. We also demonstrate a single degree of control freedom in a soaring position through manipulation of the TGL.

I. INTRODUCTION

UAVs have benefited from advancements in battery technology and miniaturization of avionics, which resulted in an increase in their endurance and range. However, the full potential of UAV applications remains limited by reduced flight time.


Unsupervised Text Generation by Learning from Search

Li, Jingjing, Li, Zichao, Mou, Lili, Jiang, Xin, Lyu, Michael R., King, Irwin

arXiv.org Artificial Intelligence

In this work, we present TGLS, a novel framework for unsupervised Text Generation by Learning from Search. We start by applying a strong search algorithm (in particular, simulated annealing) towards a heuristically defined objective that (roughly) estimates the quality of sentences. Then, a conditional generative model learns from the search results and, in doing so, smooths out the noise of the search. The alternation between search and learning can be repeated for performance bootstrapping. We demonstrate the effectiveness of TGLS on two real-world natural language generation tasks, paraphrase generation and text formalization. Our model significantly outperforms unsupervised baseline methods in both tasks. In particular, it achieves performance comparable to state-of-the-art supervised methods in paraphrase generation.
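The search half of this loop can be sketched with a generic simulated annealing routine. This is a toy illustration only: the scoring function (`adjacent_in_order`), the proposal move (`swap_two`), and the linear cooling schedule are assumptions standing in for the heuristic sentence-quality objective and edit operations, which the abstract does not specify. The learning step, where a conditional generative model is trained on the search outputs, is omitted.

```python
import math
import random

def simulated_annealing(initial, score, propose, steps=500, t0=1.0, seed=0):
    """Maximize score(x) by local proposals, accepting some worse moves early on."""
    rng = random.Random(seed)
    current, current_score = initial, score(initial)
    best, best_score = current, current_score
    for step in range(steps):
        # Linear cooling (an assumed schedule); the epsilon avoids division by zero.
        temperature = t0 * (1.0 - step / steps) + 1e-9
        candidate = propose(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

def adjacent_in_order(tokens):
    """Toy objective: number of adjacent token pairs in ascending order."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a <= b)

def swap_two(tokens, rng):
    """Toy proposal: swap two randomly chosen positions."""
    i, j = rng.sample(range(len(tokens)), 2)
    out = list(tokens)
    out[i], out[j] = out[j], out[i]
    return tuple(out)

start = ("d", "b", "e", "a", "c")
best, best_score = simulated_annealing(start, adjacent_in_order, swap_two)
```

Because the routine tracks the best state seen so far, the returned score is never lower than that of the starting sequence, even though individual steps may move to worse candidates.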


Multi-Layer Feature Reduction for Tree Structured Group Lasso via Hierarchical Projection

Wang, Jie, Ye, Jieping

Neural Information Processing Systems

Tree structured group Lasso (TGL) is a powerful technique for uncovering tree structured sparsity over the features, where each node encodes a group of features. It has been applied successfully in many real-world applications. However, with extremely large feature dimensions, solving TGL remains a significant challenge due to its highly complicated regularizer. In this paper, we propose a novel Multi-Layer Feature reduction method (MLFre) to quickly identify the inactive nodes (the groups of features with zero coefficients in the solution) hierarchically in a top-down fashion; these nodes are guaranteed to be irrelevant to the response. Thus, we can remove the detected nodes from the optimization without sacrificing accuracy. The major challenge in developing such testing rules is the overlap between parent nodes and their children. By a novel hierarchical projection algorithm, MLFre is able to test the nodes independently from any of their ancestor nodes. Moreover, MLFre has a low computational cost and can be integrated with any existing solver. Experiments on both synthetic and real data sets demonstrate that the speedup gained by MLFre can be orders of magnitude.
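To make the terms concrete, the sketch below evaluates the usual form of the tree structured group Lasso penalty, a weighted sum of the L2 norms of the coefficients in each node's group, and lists the inactive nodes of a given coefficient vector. It does not implement MLFre or its screening rules; the function names, the example tree, and the unit weights are assumptions for illustration.

```python
import math

def tree_group_lasso_penalty(beta, groups, weights):
    """Sum over tree nodes of weight * L2 norm of the coefficients in that node's group."""
    return sum(
        w * math.sqrt(sum(beta[i] ** 2 for i in idx))
        for idx, w in zip(groups, weights)
    )

def inactive_nodes(beta, groups):
    """Indices of nodes whose group of coefficients is entirely zero."""
    return [k for k, idx in enumerate(groups) if all(beta[i] == 0.0 for i in idx)]

# Hypothetical two-level tree: a root covering all features and two child groups.
groups = [[0, 1, 2, 3], [0, 1], [2, 3]]
weights = [1.0, 1.0, 1.0]
beta = [3.0, 4.0, 0.0, 0.0]

penalty = tree_group_lasso_penalty(beta, groups, weights)
zero_nodes = inactive_nodes(beta, groups)
```

Here the child group `[2, 3]` is inactive while the root is not, which shows the parent/child overlap that makes independent testing of nodes nontrivial.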