Goto

Collaborating Authors

 Antarctica


MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

arXiv.org Artificial Intelligence

We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe a LM's ability to keep track of multiple fact updates, a MemoryPrompt-augmented LM outperforms much larger LMs that have access to the full input history. We also test MemoryPrompt on a long-distance dialogue dataset, where its performance is comparable to that of a model conditioned on the entire conversation history. In both experiments we also observe that, unlike full-finetuning approaches, MemoryPrompt does not suffer from catastrophic forgetting when adapted to new tasks, thus not disrupting the generalist capabilities of the underlying LM.


T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching

arXiv.org Artificial Intelligence

Sampling from diffusion probabilistic models (DPMs) is often expensive for high-quality image generation and typically requires many steps with a large model. In this paper, we introduce sampling Trajectory Stitching T-Stitch, a simple yet efficient technique to improve the sampling efficiency with little or no generation degradation. Instead of solely using a large DPM for the entire sampling trajectory, T-Stitch first leverages a smaller DPM in the initial steps as a cheap drop-in replacement of the larger DPM and switches to the larger DPM at a later stage. Our key insight is that different diffusion models learn similar encodings under the same training data distribution and smaller models are capable of generating good global structures in the early steps. Extensive experiments demonstrate that T-Stitch is training-free, generally applicable for different architectures, and complements most existing fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL, for example, 40% of the early timesteps can be safely replaced with a 10x faster DiT-S without performance drop on class-conditional ImageNet generation. We further show that our method can also be used as a drop-in technique to not only accelerate the popular pretrained stable diffusion (SD) models but also improve the prompt alignment of stylized SD models from the public model zoo. Code is released at https://github.com/NVlabs/T-Stitch


A Survey on Fairness for Machine Learning on Graphs

arXiv.org Artificial Intelligence

Nowadays, the analysis of complex phenomena modeled by graphs plays a crucial role in many real-world application domains where decisions can have a strong societal impact. However, numerous studies and papers have recently revealed that machine learning models could lead to potential disparate treatment between individuals and unfair outcomes. In that context, algorithmic contributions for graph mining are not spared by the problem of fairness and present some specific challenges related to the intrinsic nature of graphs: (1) graph data is non-IID, and this assumption may invalidate many existing studies in fair machine learning, (2) suited metric definitions to assess the different types of fairness with relational data and (3) algorithmic challenge on the difficulty of finding a good trade-off between model accuracy and fairness. This survey is the first one dedicated to fairness for relational data. It aims to present a comprehensive review of state-of-the-art techniques in fairness on graph mining and identify the open challenges and future trends. In particular, we start by presenting several sensible application domains and the associated graph mining tasks with a focus on edge prediction and node classification in the sequel. We also recall the different metrics proposed to evaluate potential bias at different levels of the graph mining process; then we provide a comprehensive overview of recent contributions in the domain of fair machine learning for graphs, that we classify into pre-processing, in-processing and post-processing models. We also propose to describe existing graph data, synthetic and real-world benchmarks. Finally, we present in detail five potential promising directions to advance research in studying algorithmic fairness on graphs.


Quantitative causality, causality-guided scientific discovery, and causal machine learning

arXiv.org Artificial Intelligence

It has been said, arguably, that causality analysis should pave a promising way to interpretable deep learning and generalization. Incorporation of causality into artificial intelligence (AI) algorithms, however, is challenged with its vagueness, non-quantitiveness, computational inefficiency, etc. During the past 18 years, these challenges have been essentially resolved, with the establishment of a rigorous formalism of causality analysis initially motivated from atmospheric predictability. This not only opens a new field in the atmosphere-ocean science, namely, information flow, but also has led to scientific discoveries in other disciplines, such as quantum mechanics, neuroscience, financial economics, etc., through various applications. This note provides a brief review of the decade-long effort, including a list of major theoretical results, a sketch of the causal deep learning framework, and some representative real-world applications in geoscience pertaining to this journal, such as those on the anthropogenic cause of global warming, the decadal prediction of El Niño Modoki, the forecasting of an extreme drought in China, among others. Keywords: Causality, Liang-Kleeman information flow, Causal artificial intelligence, Fuzzy cognitive map, Interpretability, Frobenius-Perron operator, Weather/Climate forecasting 1. Introduction Causality analysis is a fundamental problem in scientific research, as commented by Einstein in 1953 in response to a question on the status quo of science in China at that time (cf. the historical record in Hu, 2005).The recent rush in artificial intelligence (AI) has stimulated enormous interest in causal inference, partly due to the realization that it may take the field to the next level to approach human intelligence (see Pearl, 2018; Bengio, 2019; Schölkopf, 2022). In the fields pertaining to this journal, assessment of the cause-effect relations between dynamic events makes a natural objective for the corresponding researches.


Creating a Fine Grained Entity Type Taxonomy Using LLMs

arXiv.org Artificial Intelligence

In this study, we investigate the potential of GPT-4 and its advanced iteration, GPT-4 Turbo, in autonomously developing a detailed entity type taxonomy. Our objective is to construct a comprehensive taxonomy, starting from a broad classification of entity types - including objects, time, locations, organizations, events, actions, and subjects - similar to existing manually curated taxonomies. This classification is then progressively refined through iterative prompting techniques, leveraging GPT-4's internal knowledge base. The result is an extensive taxonomy comprising over 5000 nuanced entity types, which demonstrates remarkable quality upon subjective evaluation. We employed a straightforward yet effective prompting strategy, enabling the taxonomy to be dynamically expanded. The practical applications of this detailed taxonomy are diverse and significant. It facilitates the creation of new, more intricate branches through pattern-based combinations and notably enhances information extraction tasks, such as relation extraction and event argument extraction. Our methodology not only introduces an innovative approach to taxonomy creation but also opens new avenues for applying such taxonomies in various computational linguistics and AI-related fields.


Graph-based Virtual Sensing from Sparse and Partial Multivariate Observations

arXiv.org Artificial Intelligence

Virtual sensing techniques allow for inferring signals at new unmonitored locations by exploiting spatio-temporal measurements coming from physical sensors at different locations. However, as the sensor coverage becomes sparse due to costs or other constraints, physical proximity cannot be used to support interpolation. In this paper, we overcome this challenge by leveraging dependencies between the target variable and a set of correlated variables (covariates) that can frequently be associated with each location of interest. From this viewpoint, covariates provide partial observability, and the problem consists of inferring values for unobserved channels by exploiting observations at other locations to learn how such variables can correlate. We introduce a novel graph-based methodology to exploit such relationships and design a graph deep learning architecture, named GgNet, implementing the framework. The proposed approach relies on propagating information over a nested graph structure that is used to learn dependencies between variables as well as locations. GgNet is extensively evaluated under different virtual sensing scenarios, demonstrating higher reconstruction accuracy compared to the state-of-the-art.


Knowledge Editing on Black-box Large Language Models

arXiv.org Artificial Intelligence

Knowledge editing (KE) aims to efficiently and precisely modify the behavior of large language models (LLMs) to update specific knowledge without negatively influencing other knowledge. Current research primarily focuses on white-box LLMs editing, overlooking an important scenario: black-box LLMs editing, where LLMs are accessed through interfaces and only textual output is available. In this paper, we first officially introduce KE on black-box LLMs and then propose a comprehensive evaluation framework to overcome the limitations of existing evaluations that are not applicable to black-box LLMs editing and lack comprehensiveness. To tackle privacy leaks of editing data and style over-editing in current methods, we introduce a novel postEdit framework, resolving privacy concerns through downstream post-processing and maintaining textual style consistency via fine-grained editing to original responses. Experiments and analysis on two benchmarks demonstrate that postEdit outperforms all baselines and achieves strong generalization, especially with huge improvements on style retention (average $+20.82\%\uparrow$).


Energy-aware Multi-UAV Coverage Mission Planning with Optimal Speed of Flight

arXiv.org Artificial Intelligence

This paper tackles the problem of planning minimum-energy coverage paths for multiple UAVs. The addressed Multi-UAV Coverage Path Planning (mCPP) is a crucial problem for many UAV applications such as inspection and aerial survey. However, the typical path-length objective of existing approaches does not directly minimize the energy consumption, nor allows for constraining energy of individual paths by the battery capacity. To this end, we propose a novel mCPP method that uses the optimal flight speed for minimizing energy consumption per traveled distance and a simple yet precise energy consumption estimation algorithm that is utilized during the mCPP planning phase. The method decomposes a given area with boustrophedon decomposition and represents the mCPP as an instance of Multiple Set Traveling Salesman Problem with a minimum energy objective and energy consumption constraint. The proposed method is shown to outperform state-of-the-art methods in terms of computational time and energy efficiency of produced paths. The experimental results show that the accuracy of the energy consumption estimation is on average 97% compared to real flight consumption. The feasibility of the proposed method was verified in a real-world coverage experiment with two UAVs.


Physics-informed machine learning as a kernel method

arXiv.org Artificial Intelligence

Physics-informed machine learning combines the expressiveness of data-based approaches with the interpretability of physical models. In this context, we consider a general regression problem where the empirical risk is regularized by a partial differential equation that quantifies the physical inconsistency. We prove that for linear differential priors, the problem can be formulated as a kernel regression task. Taking advantage of kernel theory, we derive convergence rates for the minimizer of the regularized risk and show that it converges at least at the Sobolev minimax rate. However, faster rates can be achieved, depending on the physical error. This principle is illustrated with a one-dimensional example, supporting the claim that regularizing the empirical risk with physical information can be beneficial to the statistical performance of estimators.


Graph Neural Networks as Fast and High-fidelity Emulators for Finite-Element Ice Sheet Modeling

arXiv.org Artificial Intelligence

Although the finite element approach of the Ice-sheet and Sea-level System Model (ISSM) solves ice dynamics problems governed by Stokes equations quickly and accurately, such numerical modeling requires intensive computation on central processing units (CPU). In this study, we develop graph neural networks (GNN) as fast surrogate models to preserve the finite element structure of ISSM. Using the 20-year transient simulations in the Pine Island Glacier (PIG), we train and test three GNNs: graph convolutional network (GCN), graph attention network (GAT), and equivariant graph convolutional network (EGCN). These GNNs reproduce ice thickness and velocity with better accuracy than the classic convolutional neural network (CNN) and multi-layer perception (MLP). In particular, GNNs successfully capture the ice mass loss and acceleration induced by higher basal melting rates in the PIG. When our GNN emulators are implemented on graphic processing units (GPUs), they show up to 50 times faster computational time than the CPU-based ISSM simulation.