Goto

Collaborating Authors

 usr


LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples

arXiv.org Artificial Intelligence

Large language models (LLMs) possess vast knowledge acquired from extensive training corpora, but they often cannot remove specific pieces of information when needed, which makes it hard to handle privacy, bias mitigation, and knowledge correction. Traditional model unlearning approaches require computationally expensive fine-tuning or direct weight editing, making them impractical for real-world deployment. In this work, we introduce LoRA-based Unlearning with Negative Examples (LUNE), a lightweight framework that performs negative-only unlearning by updating only low-rank adapters while freezing the backbone, thereby localizing edits and avoiding disruptive global changes. Leveraging Low-Rank Adaptation (LoRA), LUNE targets intermediate representations to suppress (or replace) requested knowledge with an order-of-magnitude lower compute and memory than full fine-tuning or direct weight editing. Extensive experiments on multiple factual unlearning tasks show that LUNE: (I) achieves effectiveness comparable to full fine-tuning and memory-editing methods, and (II) reduces computational cost by about an order of magnitude.


Optimal Sequential Recommendations: Exploiting User and Item Structure

arXiv.org Machine Learning

Given the importance of these recommendation algorithms, it makes sense to try to design optimal ones. A basic criterion for optimality, that captures the first-order experience of users in a recommendation system, is to maximize the proportion of recommendations that are liked, 1 similar to [11, 23] The goal of this paper is to gain insight into the design of recommendation algorithms by finding a statistically optimal algorithm within the context of a natural model for recommendation systems. One of our findings is that the best way to obtain information about users and items in order to make good recommendations depends on the time horizon and its relation to various system parameters including the number of users, the diversity of users, and richness of the items; there are a number of operating regimes depending on these parameters. It goes without saying that the nature of any insight obtained is intertwined with the choice of model. We use the same model as [11], closely related to those studied in [10, 12]. The model is different from those in other papers on the topic; we now motivate its key features.


STARS: Sensor-agnostic Transformer Architecture for Remote Sensing

arXiv.org Artificial Intelligence

We present a sensor-agnostic spectral transformer as the basis for spectral foundation models. To that end, we introduce a Universal Spectral Representation (USR) that leverages sensor meta-data, such as sensing kernel specifications and sensing wavelengths, to encode spectra obtained from any spectral instrument into a common representation, such that a single model can ingest data from any sensor. Furthermore, we develop a methodology for pre-training such models in a self-supervised manner using a novel random sensor-augmentation and reconstruction pipeline to learn spectral features independent of the sensing paradigm. We demonstrate that our architecture can learn sensor independent spectral features that generalize effectively to sensors not seen during training. This work sets the stage for training foundation models that can both leverage and be effective for the growing diversity of spectral data.


SurroFlow: A Flow-Based Surrogate Model for Parameter Space Exploration and Uncertainty Quantification

arXiv.org Artificial Intelligence

Existing deep learning-based surrogate models facilitate efficient data generation, but fall short in uncertainty quantification, efficient parameter space exploration, and reverse prediction. In our work, we introduce SurroFlow, a novel normalizing flow-based surrogate model, to learn the invertible transformation between simulation parameters and simulation outputs. The model not only allows accurate predictions of simulation outcomes for a given simulation parameter but also supports uncertainty quantification in the data generation process. Additionally, it enables efficient simulation parameter recommendation and exploration. We integrate SurroFlow and a genetic algorithm as the backend of a visual interface to support effective user-guided ensemble simulation exploration and visualization. Our framework significantly reduces the computational costs while enhancing the reliability and exploration capabilities of scientific surrogate models.


Domain Private Transformers for Multi-Domain Dialog Systems

arXiv.org Artificial Intelligence

Large, general purpose language models have demonstrated impressive performance across many different conversational domains. While multi-domain language models achieve low overall perplexity, their outputs are not guaranteed to stay within the domain of a given input prompt. This paper proposes domain privacy as a novel way to quantify how likely a conditional language model will leak across domains. We also develop policy functions based on token-level domain classification, and propose an efficient fine-tuning method to improve the trained model's domain privacy. Experiments on membership inference attacks show that our proposed method has comparable resiliency to methods adapted from recent literature on differentially private language models.


'What are you referring to?' Evaluating the Ability of Multi-Modal Dialogue Models to Process Clarificational Exchanges

arXiv.org Artificial Intelligence

Referential ambiguities arise in dialogue when a referring expression does not uniquely identify the intended referent for the addressee. Addressees usually detect such ambiguities immediately and work with the speaker to repair it using meta-communicative, Clarificational Exchanges (CE): a Clarification Request (CR) and a response. Here, we argue that the ability to generate and respond to CRs imposes specific constraints on the architecture and objective functions of multi-modal, visually grounded dialogue models. We use the SIMMC 2.0 dataset to evaluate the ability of different state-of-the-art model architectures to process CEs, with a metric that probes the contextual updates that arise from them in the model. We find that language-based models are able to encode simple multi-modal semantic information and process some CEs, excelling with those related to the dialogue history, whilst multi-modal models can use additional learning objectives to obtain disentangled object representations, which become crucial to handle complex referential ambiguities across modalities overall.


Revisiting the DARPA Communicator Data using Conversation Analysis

arXiv.org Artificial Intelligence

The state of the art in human computer conversation leaves something to be desired and, indeed, talking to a computer can be down-right annoying. This paper describes an approach to identifying ``opportunities for improvement'' in these systems by looking for abuse in the form of swear words. The premise is that humans swear at computers as a sanction and, as such, swear words represent a point of failure where the system did not behave as it should. Having identified where things went wrong, we can work backward through the transcripts and, using conversation analysis (CA) work out how things went wrong. Conversation analysis is a qualitative methodology and can appear quite alien - indeed unscientific - to those of us from a quantitative background. The paper starts with a description of Conversation analysis in its modern form, and then goes on to apply the methodology to transcripts of frustrated and annoyed users in the DARPA Communicator project. The conclusion is that there is at least one species of failure caused by the inability of the Communicator systems to handle mixed initiative at the discourse structure level. Along the way, I hope to demonstrate that there is an alternative future for computational linguistics that does not rely on larger and larger text corpora.


Refine and Imitate: Reducing Repetition and Inconsistency in Persuasion Dialogues via Reinforcement Learning and Human Demonstration

arXiv.org Artificial Intelligence

Despite the recent success of large-scale language models on various downstream NLP tasks, the repetition and inconsistency problems still persist in dialogue response generation. Previous approaches have attempted to avoid repetition by penalizing the language model's undesirable behaviors in the loss function. However, these methods focus on token-level information and can lead to incoherent responses and uninterpretable behaviors. To alleviate these issues, we propose to apply reinforcement learning to refine an MLE-based language model without user simulators, and distill sentence-level information about repetition, inconsistency and task relevance through rewards. In addition, to better accomplish the dialogue task, the model learns from human demonstration to imitate intellectual activities such as persuasion, and selects the most persuasive responses. Experiments show that our model outperforms previous state-of-the-art dialogue models on both automatic metrics and human evaluation results on a donation persuasion task, and generates more diverse, consistent and persuasive conversations according to the user feedback.


Resource Constrained Dialog Policy Learning via Differentiable Inductive Logic Programming

arXiv.org Artificial Intelligence

Motivated by the needs of resource constrained dialog policy learning, we introduce dialog policy via differentiable inductive logic (DILOG). We explore the tasks of one-shot learning and zero-shot domain transfer with DILOG on SimDial and MultiWoZ. Using a single representative dialog from the restaurant domain, we train DILOG on the SimDial dataset and obtain 99 % in-domain test accuracy. We also show that the trained DILOG zero-shot transfers to all other domains with 99 % accuracy, proving the suitability of DILOG to slot-filling dialogs. We further extend our study to the MultiWoZ dataset achieving 90 % inform and success metrics. We also observe that these metrics are not capturing some of the shortcomings of DILOG in terms of false positives, prompting us to measure an auxiliary Action F1 score. We show that DILOG is 100x more data efficient than state-of-the-art neural approaches on MultiWoZ while achieving similar performance metrics. We conclude with a discussion on the strengths and weaknesses of DILOG.


Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

arXiv.org Artificial Intelligence

Designing task-oriented dialogue systems is a challenging research topic, since it needs not only to generate utterances fulfilling user requests but also to guarantee the comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL), however, the bias in annotated system utterances remains as a bottleneck. Reinforcement learning (RL) deals with the problem through using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO; (2) train HDNO with hierarchical reinforcement learning (HRL), as well as suggest alternating updates between dialogue policy and NLG during HRL inspired by fictitious play, to preserve the comprehensibility of generated system utterances while improving fulfilling user requests; and (3) propose using a discriminator modelled with language models as an additional reward to further improve the comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, the datasets on multi-domain dialogues, in comparison with word-level E2E model trained with RL, LaRL and HDSA, showing a significant improvement on the total performance evaluated with automatic metrics.