Goto

Collaborating Authors

 pmc





Model Collapse Is Not a Bug but a Feature in Machine Unlearning for LLMs

arXiv.org Artificial Intelligence

Current unlearning methods for LLMs optimize on the private information they seek to remove by incorporating it into their fine-tuning data. We argue this not only risks reinforcing exposure to sensitive data, it also fundamentally contradicts the principle of minimizing its use. As a remedy, we propose a novel unlearning method-Partial Model Collapse (PMC), which does not require unlearning targets in the unlearning objective. Our approach is inspired by recent observations that training generative models on their own generations leads to distribution collapse, effectively removing information from model outputs. Our central insight is that model collapse can be leveraged for machine unlearning by deliberately triggering it for data we aim to remove. We theoretically analyze that our approach converges to the desired outcome, i.e. the model unlearns the data targeted for removal. We empirically demonstrate that PMC overcomes three key limitations of existing unlearning methods that explicitly optimize on unlearning targets, and more effectively removes private information from model outputs while preserving general model utility. Overall, our contributions represent an important step toward more comprehensive unlearning that aligns with real-world privacy constraints. Code available at https://www.cs.cit.tum.de/daml/partial-model-collapse/.


Highly Fast Text Segmentation With Pairwise Markov Chains

arXiv.org Artificial Intelligence

Natural Language Processing (NLP) models' current trend consists of using increasingly more extra-data to build the best models as possible. It implies more expensive computational costs and training time, difficulties for deployment, and worries about these models' carbon footprint reveal a critical problem in the future. Against this trend, our goal is to develop NLP models requiring no extra-data and minimizing training time. To do so, in this paper, we explore Markov chain models, Hidden Markov Chain (HMC) and Pairwise Markov Chain (PMC), for NLP segmentation tasks. We apply these models for three classic applications: POS Tagging, Named-Entity-Recognition, and Chunking. We develop an original method to adapt these models for text segmentation's specific challenges to obtain relevant performances with very short training and execution times. PMC achieves equivalent results to those obtained by Conditional Random Fields (CRF), one of the most applied models for these tasks when no extra-data are used. Moreover, PMC has training times 30 times shorter than the CRF ones, which validates this model given our objectives.


Analyzing Value Functions of States in Parametric Markov Chains

arXiv.org Artificial Intelligence

Parametric Markov chains (pMC) are used to model probabilistic systems with unknown or partially known probabilities. Although (universal) pMC verification for reachability properties is known to be coETR-complete, there have been efforts to approach it using potentially easier-to-check properties such as asking whether the pMC is monotonic in certain parameters. In this paper, we first reduce monotonicity to asking whether the reachability probability from a given state is never less than that of another given state. Recent results for the latter property imply an efficient algorithm to collapse same-value equivalence classes, which in turn preserves verification results and monotonicity. We implement our algorithm to collapse "trivial" equivalence classes in the pMC and show empirical evidence for the following: First, the collapse gives reductions in size for some existing benchmarks and significant reductions on some custom benchmarks; Second, the collapse speeds up existing algorithms to check monotonicity and parameter lifting, and hence can be used as a fast pre-processing step in practice.


Pairwise Markov Chains for Volatility Forecasting

arXiv.org Machine Learning

The Pairwise Markov Chain (PMC) is a probabilistic graphical model extending the well-known Hidden Markov Model. This model, although highly effective for many tasks, has been scarcely utilized for continuous value prediction. This is mainly due to the issue of modeling observations inherent in generative probabilistic models. In this paper, we introduce a new algorithm for prediction with the PMC. On the one hand, this algorithm allows circumventing the feature problem, thus fully exploiting the capabilities of the PMC. On the other hand, it enables the PMC to extend any predictive model by introducing hidden states, updated at each time step, and allowing the introduction of non-stationarity for any model. We apply the PMC with its new algorithm for volatility forecasting, which we compare to the highly popular GARCH(1,1) and feedforward neural models across numerous pairs. This is particularly relevant given the regime changes that we can observe in volatility. For each scenario, our algorithm enhances the performance of the extended model, demonstrating the value of our approach.


Adversarial Search and Tracking with Multiagent Reinforcement Learning in Sparsely Observable Environment

arXiv.org Artificial Intelligence

We study a search and tracking (S&T) problem where a team of dynamic search agents must collaborate to track an adversarial, evasive agent. The heterogeneous search team may only have access to a limited number of past adversary trajectories within a large search space. This problem is challenging for both model-based searching and reinforcement learning (RL) methods since the adversary exhibits reactionary and deceptive evasive behaviors in a large space leading to sparse detections for the search agents. To address this challenge, we propose a novel Multi-Agent RL (MARL) framework that leverages the estimated adversary location from our learnable filtering model. We show that our MARL architecture can outperform all baselines and achieves a 46% increase in detection rate.


Fair admission risk prediction with proportional multicalibration

arXiv.org Artificial Intelligence

Fair calibration is a widely desirable fairness criteria in risk prediction contexts. One way to measure and achieve fair calibration is with multicalibration. Multicalibration constrains calibration error among flexibly-defined subpopulations while maintaining overall calibration. However, multicalibrated models can exhibit a higher percent calibration error among groups with lower base rates than groups with higher base rates. As a result, it is possible for a decision-maker to learn to trust or distrust model predictions for specific groups. To alleviate this, we propose \emph{proportional multicalibration}, a criteria that constrains the percent calibration error among groups and within prediction bins. We prove that satisfying proportional multicalibration bounds a model's multicalibration as well its \emph{differential calibration}, a fairness criteria that directly measures how closely a model approximates sufficiency. Therefore, proportionally calibrated models limit the ability of decision makers to distinguish between model performance on different patient groups, which may make the models more trustworthy in practice. We provide an efficient algorithm for post-processing risk prediction models for proportional multicalibration and evaluate it empirically. We conduct simulation studies and investigate a real-world application of PMC-postprocessing to prediction of emergency department patient admissions. We observe that proportional multicalibration is a promising criteria for controlling simultaneous measures of calibration fairness of a model over intersectional groups with virtually no cost in terms of classification performance.


Solving Projected Model Counting by Utilizing Treewidth and its Limits

arXiv.org Artificial Intelligence

In this paper, we introduce a novel algorithm to solve projected model counting (PMC). PMC asks to count solutions of a Boolean formula with respect to a given set of projection variables, where multiple solutions that are identical when restricted to the projection variables count as only one solution. Inspired by the observation that the so-called "treewidth" is one of the most prominent structural parameters, our algorithm utilizes small treewidth of the primal graph of the input instance. More precisely, it runs in time O(2^2k+4n2) where k is the treewidth and n is the input size of the instance. In other words, we obtain that the problem PMC is fixed-parameter tractable when parameterized by treewidth. Further, we take the exponential time hypothesis (ETH) into consideration and establish lower bounds of bounded treewidth algorithms for PMC, yielding asymptotically tight runtime bounds of our algorithm. While the algorithm above serves as a first theoretical upper bound and although it might be quite appealing for small values of k, unsurprisingly a naive implementation adhering to this runtime bound suffers already from instances of relatively small width. Therefore, we turn our attention to several measures in order to resolve this issue towards exploiting treewidth in practice: We present a technique called nested dynamic programming, where different levels of abstractions of the primal graph are used to (recursively) compute and refine tree decompositions of a given instance. Finally, we provide a nested dynamic programming algorithm and an implementation that relies on database technology for PMC and a prominent special case of PMC, namely model counting (#Sat). Experiments indicate that the advancements are promising, allowing us to solve instances of treewidth upper bounds beyond 200.