Moroshko, Edward
CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models
Xue, Yuyang, Moroshko, Edward, Chen, Feng, McDonagh, Steven, Tsaftaris, Sotirios A.
Text-to-Image diffusion models can produce undesirable content that necessitates concept erasure techniques. However, existing methods struggle with under-erasure, leaving residual traces of targeted concepts, or over-erasure, mistakenly eliminating unrelated but visually similar concepts. To address these limitations, we introduce CRCE, a novel concept erasure framework that leverages Large Language Models to identify both semantically related concepts that should be erased alongside the target and distinct concepts that should be preserved. By explicitly modeling coreferential and retained concepts semantically, CRCE enables more precise concept removal, without unintended erasure. Experiments demonstrate that CRCE outperforms existing methods on diverse erasure tasks.
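For illustration only, the kind of objective such a framework suggests can be sketched as follows. This is a toy sketch under assumed names, not the paper's implementation: encode stands in for a frozen text encoder, the concept lists are hard-coded where CRCE would query an LLM, and a small learnable map plays the role of the fine-tuned conditioning pathway, pushed to send the target and its coreferential concepts toward a neutral anchor while keeping retained concepts unchanged.

import zlib
import torch

def encode(prompt: str) -> torch.Tensor:
    # Stand-in for a frozen text encoder: a fixed pseudo-embedding per prompt.
    g = torch.Generator().manual_seed(zlib.crc32(prompt.encode()))
    return torch.randn(64, generator=g)

target = "Snoopy"
coreferential = ["Charlie Brown's dog", "Peanuts beagle"]  # CRCE would obtain these from an LLM
retained = ["beagle", "cartoon dog"]                       # ...along with concepts to keep intact
anchor = encode("a dog")                                   # neutral concept the target is mapped to

# A learnable linear map stands in for the fine-tuned conditioning pathway.
W = torch.nn.Linear(64, 64)
with torch.no_grad():
    W.weight.copy_(torch.eye(64))                          # start from the "pretrained" identity
    W.bias.zero_()
opt = torch.optim.Adam(W.parameters(), lr=1e-2)
for _ in range(200):
    # Erase the target and its coreferential concepts toward the anchor...
    erase = torch.stack([(W(encode(c)) - anchor).pow(2).mean()
                         for c in [target] + coreferential]).mean()
    # ...while anchoring retained concepts to their original embeddings.
    retain = torch.stack([(W(encode(c)) - encode(c)).pow(2).mean()
                          for c in retained]).mean()
    loss = erase + retain
    opt.zero_grad(); loss.backward(); opt.step()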
Continual Learning in Linear Classification on Separable Data
Evron, Itay, Moroshko, Edward, Buzaglo, Gon, Khriesh, Maroun, Marjieh, Badea, Srebro, Nathan, Soudry, Daniel
We theoretically study the continual learning of a linear classification model on a sequence of separable linear classification tasks with binary labels. We show theoretically that learning with weak regularization reduces to solving a sequential max-margin problem, corresponding to a special case of the Projection Onto Convex Sets (POCS) framework. Even though this is a fundamental setup to consider, there are still very few analytic results on it, since most of the continual learning theory thus far has focused on regression settings (e.g., Bennani et al. (2020); Doan et al. (2021); Asanuma et al. (2021); Lee et al. (2021); Evron et al. (2022); Goldfarb & Hand (2023); Li et al. (2023)).
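The sequential max-margin scheme can be illustrated numerically: at each task, the previous iterate is projected onto the convex polyhedron of weight vectors that classify that task's data with margin at least one. A minimal SciPy sketch of this projection step (my own formulation based on the description above, not the authors' code):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
d, n_per_task = 5, 20
teacher = rng.normal(size=d)
tasks = []
for _ in range(3):                          # three jointly separable binary tasks
    X = rng.normal(size=(n_per_task, d))
    y = np.sign(X @ teacher)
    tasks.append((X, y))

w = np.zeros(d)                             # start at the origin
for X, y in tasks:
    # Project the previous iterate onto {v : y_i <v, x_i> >= 1 for all i in the task}.
    res = minimize(lambda v, w0=w: 0.5 * np.sum((v - w0) ** 2), w,
                   jac=lambda v, w0=w: v - w0,
                   constraints=[{"type": "ineq",
                                 "fun": lambda v, X=X, y=y: y * (X @ v) - 1.0}],
                   method="SLSQP")
    w = res.x
print("final iterate:", np.round(w, 3))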
Implicit Bias in Deep Linear Classification: Initialization Scale vs Training Accuracy
Moroshko, Edward, Gunasekar, Suriya, Woodworth, Blake, Lee, Jason D., Srebro, Nathan, Soudry, Daniel
We provide a detailed asymptotic study of gradient flow trajectories and their implicit optimization bias when minimizing the exponential loss over "diagonal linear networks". This is the simplest model displaying a transition between "kernel" and non-kernel ("rich" or "active") regimes. We show how the transition is controlled by the relationship between the initialization scale and how accurately we minimize the training loss. Our results indicate that some limit behaviors of gradient descent only kick in at ridiculous training accuracies (well beyond $10^{-100}$). Moreover, the implicit bias at reasonable initialization scales and training accuracies is more complex and not captured by these limits.
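For concreteness, the "diagonal linear network" referenced above is commonly written as follows (notation assumed here, not necessarily the paper's): the predictor is $f(x; w_+, w_-) = \langle w_+ \odot w_+ - w_- \odot w_-,\, x \rangle$ with initialization $w_+(0) = w_-(0) = \alpha \mathbf{1}$, and gradient flow minimizes the exponential loss $\mathcal{L}(w_+, w_-) = \sum_n \exp\big(-y_n f(x_n; w_+, w_-)\big)$; the question studied is how the limiting predictor depends jointly on the scale $\alpha$ and on how small $\mathcal{L}$ is driven (i.e., the training accuracy).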
Variance Estimation For Online Regression via Spectrum Thresholding
Kozdoba, Mark, Moroshko, Edward, Mannor, Shie, Crammer, Koby
We consider the online linear regression problem, where the predictor vector may vary with time. This problem can be modelled as a linear dynamical system, where the parameters that need to be learned are the variances of both the process noise and the observation noise. The classical approach to learning the variances is via the maximum likelihood estimator -- a non-convex optimization problem prone to local minima and with no finite sample complexity bounds. In this paper we study the global system operator: the operator that maps the noise vectors to the output. In particular, we obtain estimates on its spectrum, and as a result derive the first known variance estimators with sample complexity guarantees for online regression problems. We demonstrate the approach on a number of synthetic and real-world benchmarks.
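One standard way to write the linear dynamical system alluded to above (notation assumed; the paper's exact formulation may differ): the parameters follow a random walk $\theta_t = \theta_{t-1} + \eta_t$ with $\eta_t \sim \mathcal{N}(0, \sigma_\eta^2 I)$, and observations are $y_t = x_t^\top \theta_t + \varepsilon_t$ with $\varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)$. The quantities to estimate are $\sigma_\eta^2$ and $\sigma_\varepsilon^2$, and the "global system operator" is the linear map from the stacked noise vectors $(\eta_{1:T}, \varepsilon_{1:T})$ to the output sequence $y_{1:T}$.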
Efficient Loss-Based Decoding on Graphs for Extreme Classification
Evron, Itay, Moroshko, Edward, Crammer, Koby
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space (LTLS), and on a general approach for error correcting output coding (ECOC) with loss-based decoding, and introduce a flexible and efficient approach accompanied by theoretical bounds. Our framework employs output codes induced by graphs, for which we show how to perform efficient loss-based decoding to potentially improve accuracy. In addition, our framework offers a tradeoff between accuracy, model size and prediction time. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows that our method is competitive with state-of-the-art algorithms.
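For reference, generic loss-based decoding over an ECOC code matrix can be sketched as below; this is the brute-force $O(KL)$ version, whereas the point of the framework above is to carry it out efficiently when the codewords are induced by a graph (a toy sketch, not the authors' code).

import numpy as np

def loss_based_decode(code_matrix, scores, loss=lambda z: np.log1p(np.exp(-z))):
    # code_matrix: (K, L) codewords in {-1, +1}; scores: (L,) real-valued binary predictions.
    # Predict the label whose codeword incurs the smallest total loss on the scores.
    margins = code_matrix * scores          # broadcast: z[k, j] = M[k, j] * f_j(x)
    return int(np.argmin(loss(margins).sum(axis=1)))

# Toy usage: 4 labels encoded by 3 binary predictors.
M = np.array([[+1, +1, -1],
              [+1, -1, +1],
              [-1, +1, +1],
              [-1, -1, -1]])
print(loss_based_decode(M, np.array([0.9, -0.4, 1.3])))

In the graph-induced (LTLS-style) setting the codewords correspond to paths in a trellis graph, so the minimization over labels can be performed with a shortest-path computation rather than by enumerating all K rows of the code matrix.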
Multi Instance Learning For Unbalanced Data
Kozdoba, Mark, Moroshko, Edward, Shani, Lior, Takagi, Takuya, Katoh, Takashi, Mannor, Shie, Crammer, Koby
In the context of Multi Instance Learning, we analyze the Single Instance (SI) learning objective. We show that when the data is unbalanced and the family of classifiers is sufficiently rich, the SI method is a useful learning algorithm. In particular, we show that larger data imbalance, a quality that is typically perceived as negative, in fact implies a better resilience of the algorithm to the statistical dependencies of the objects in bags. In addition, our results shed new light on some known issues with the SI method in the setting of linear classifiers, and we show that these issues are significantly less likely to occur in the setting of neural networks. We demonstrate our results on a synthetic dataset, and on the COCO dataset for the problem of patch classification with weak image level labels derived from captions.
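The Single Instance (SI) objective analyzed here is simple to state: each instance inherits its bag's label, an ordinary supervised classifier is trained on the resulting instance-level dataset, and a bag is scored by pooling its instance scores. A minimal sketch on synthetic unbalanced bags (assumed setup for illustration, not the authors' code):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
bags, bag_labels = [], []
for _ in range(200):                        # unbalanced data: ~10% positive bags
    label = int(rng.random() < 0.1)
    X = rng.normal(size=(8, 5))
    if label:
        X[:2] += 2.0                        # positive bags contain a couple of shifted instances
    bags.append(X)
    bag_labels.append(label)

# SI objective: broadcast each bag's label to all of its instances.
X_inst = np.vstack(bags)
y_inst = np.repeat(bag_labels, [len(b) for b in bags])

clf = LogisticRegression(max_iter=1000).fit(X_inst, y_inst)
bag_scores = np.array([clf.predict_proba(b)[:, 1].max() for b in bags])  # max-pool instance scores
print("train bag accuracy:", np.mean((bag_scores > 0.5) == np.array(bag_labels)))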
Efficient Loss-Based Decoding On Graphs For Extreme Classification
Evron, Itay, Moroshko, Edward, Crammer, Koby
In extreme classification problems, learning algorithms are required to map instances to labels from an extremely large label set. We build on a recent extreme classification framework with logarithmic time and space, and on a general approach for error correcting output coding (ECOC), and introduce a flexible and efficient approach accompanied by bounds. Our framework employs output codes induced by graphs, and offers a tradeoff between accuracy and model size. We show how to find the sweet spot of this tradeoff using only the training data. Our experimental study demonstrates the validity of our assumptions and claims, and shows the superiority of our method compared with state-of-the-art algorithms.