Bazinet, Mathieu
Sample Compression for Continual Learning
Comeau, Jacob, Bazinet, Mathieu, Germain, Pascal, Subakan, Cem
Continual learning algorithms aim to learn from a sequence of tasks, making the training distribution non-stationary. The majority of existing continual learning approaches in the literature rely on heuristics and do not provide learning guarantees for the continual learning setup. In this paper, we present a new method called 'Continual Pick-to-Learn' (CoP2L), which efficiently retains the most representative samples for each task. The algorithm is adapted from the Pick-to-Learn algorithm, rooted in sample compression theory. This allows us to provide high-confidence upper bounds on the generalization loss of the learned predictors, numerically computable after every update of the learned model. We also show empirically, on several standard continual learning benchmarks, that our algorithm outperforms standard experience replay, significantly mitigating catastrophic forgetting.
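To make the idea concrete, here is a minimal sketch of the general recipe suggested by the abstract: select a compact, representative subset of each task and replay it alongside later tasks. The greedy highest-loss selection rule, the tiny logistic-regression learner, and the threshold below are illustrative assumptions standing in for CoP2L's actual P2L-based procedure, not the algorithm from the paper.

```python
# Hedged sketch: per-task "compression sets" reused as a replay buffer.
# The greedy highest-loss selection is a stand-in for CoP2L's actual
# P2L-based procedure; the learner, threshold and loop are assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def fit_logreg(X, y, epochs=300, lr=0.5):
    """Train a tiny binary logistic regression with full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        w -= lr * X.T @ (sigmoid(X @ w) - y) / len(y)
    return w

def log_loss(w, X, y):
    p = np.clip(sigmoid(X @ w), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def select_compression_set(X, y, threshold=0.3, seed_size=2):
    """Greedily add the worst-predicted example and retrain, until every example
    is fitted below `threshold` (a P2L-flavoured stand-in for CoP2L's selection)."""
    chosen = list(range(seed_size))
    while True:
        w = fit_logreg(X[chosen], y[chosen])
        losses = log_loss(w, X, y)
        losses[chosen] = -np.inf  # already retained
        worst = int(np.argmax(losses))
        if losses[worst] <= threshold:
            return np.array(chosen)
        chosen.append(worst)

# Toy task sequence: two linearly separable binary tasks with shifted inputs.
tasks = []
for shift in (0.0, 4.0):
    X0 = rng.normal([shift, shift], 1.0, size=(100, 2))
    X1 = rng.normal([shift + 4.0, shift], 1.0, size=(100, 2))
    X = np.hstack([np.vstack([X0, X1]), np.ones((200, 1))])  # bias column
    y = np.concatenate([np.zeros(100), np.ones(100)])
    tasks.append((X, y))

replay_X, replay_y = [], []
for t, (X, y) in enumerate(tasks):
    # Train on the current task plus the samples retained from previous tasks.
    X_train = np.vstack([X] + replay_X)
    y_train = np.concatenate([y] + replay_y)
    w = fit_logreg(X_train, y_train)
    # Retain a compact, representative subset of the current task for replay.
    idx = select_compression_set(X, y)
    replay_X.append(X[idx])
    replay_y.append(y[idx])
    print(f"task {t}: retained {len(idx)} / {len(y)} samples for replay")
```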
Sample Compression Unleashed: New Generalization Bounds for Real Valued Losses
Bazinet, Mathieu, Zantedeschi, Valentina, Germain, Pascal
The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive, notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds.
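For context, the type of guarantee referenced above, a bound for the zero-one loss expressed in terms of the compression set size and message length, can be computed numerically. The snippet below recalls a classical consistent-case form of such a bound from the broader sample compression literature (Floyd and Warmuth style); it is the kind of guarantee this paper generalizes to real-valued losses, not the paper's new bound itself.

```python
# Hedged sketch: a classical consistent-case sample compression bound for the
# ZERO-ONE loss. The exact form below is recalled from the broader sample
# compression literature, not taken from the paper discussed above.
from math import lgamma, log

def log_binomial(m: int, k: int) -> float:
    """log of the binomial coefficient C(m, k), computed via log-gamma."""
    return lgamma(m + 1) - lgamma(k + 1) - lgamma(m - k + 1)

def zero_one_compression_bound(m: int, k: int, n_bits: int, delta: float) -> float:
    """With probability >= 1 - delta over an i.i.d. sample of size m, a predictor
    reconstructed from a compression set of size k and a message of n_bits bits,
    and consistent with the m - k remaining examples, has true zero-one risk at
    most (ln C(m, k) + n_bits * ln 2 + ln(1/delta)) / (m - k)."""
    assert 0 < k < m
    return (log_binomial(m, k) + n_bits * log(2.0) + log(1.0 / delta)) / (m - k)

# Example: 10,000 training points, 50 retained examples, a 100-bit message.
print(zero_one_compression_bound(m=10_000, k=50, n_bits=100, delta=0.05))
# Smaller compression sets and shorter messages give tighter guarantees:
print(zero_one_compression_bound(m=10_000, k=20, n_bits=0, delta=0.05))
```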
Sample Compression Hypernetworks: From Generalization Bounds to Meta-Learning
Leblanc, Benjamin, Bazinet, Mathieu, D'Amours, Nathaniel, Drouin, Alexandre, Germain, Pascal
Reconstruction functions are pivotal in sample compression theory, a framework for deriving tight generalization bounds. From a small sample of the training set (the compression set) and an optional stream of information (the message), they recover a predictor previously learned from the whole training set. While these reconstruction functions are usually fixed, we propose to learn them. To facilitate the optimization and increase the expressiveness of the message, we derive a new sample compression generalization bound for real-valued messages. From this theoretical analysis, we then present a new hypernetwork architecture that outputs predictors with tight generalization guarantees when trained using an original meta-learning framework. The results of promising preliminary experiments are then reported.
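As an illustration of the hypernetwork idea, the sketch below maps a small compression set and a real-valued message to the parameters of a linear predictor. The DeepSets-style set encoder, the linear predictor head, and all dimensions are assumptions made for the example; the paper's actual architecture and meta-learning procedure may differ.

```python
# Hedged sketch of a "sample compression hypernetwork": it consumes a small
# compression set and a real-valued message and emits the weights of a linear
# predictor. The permutation-invariant encoder, the linear head and all
# dimensions are illustrative assumptions, not the architecture from the paper.
import torch
import torch.nn as nn

class SampleCompressionHypernetwork(nn.Module):
    def __init__(self, input_dim: int, message_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Encodes each (x, y) pair of the compression set before mean pooling.
        self.point_encoder = nn.Sequential(
            nn.Linear(input_dim + 1, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Maps (pooled compression-set embedding, real-valued message) to the
        # parameters of a linear predictor: a weight vector and a bias.
        self.weight_generator = nn.Sequential(
            nn.Linear(hidden_dim + message_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, input_dim + 1),
        )

    def forward(self, compression_x, compression_y, message, query_x):
        pairs = torch.cat([compression_x, compression_y.unsqueeze(-1)], dim=-1)
        pooled = self.point_encoder(pairs).mean(dim=0)           # set embedding
        params = self.weight_generator(torch.cat([pooled, message]))
        w, b = params[:-1], params[-1]
        return query_x @ w + b                                    # predictor logits

# Toy usage: 5 retained examples, a 4-dimensional real-valued message.
hypernet = SampleCompressionHypernetwork(input_dim=3, message_dim=4)
comp_x, comp_y = torch.randn(5, 3), torch.randint(0, 2, (5,)).float()
message = torch.randn(4)
query_x = torch.randn(10, 3)
logits = hypernet(comp_x, comp_y, message, query_x)
print(logits.shape)  # torch.Size([10])
```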