Supplementary Material for ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection
We adopt the memory budget used in the RMM paper [12]. In detail, for each benchmark dataset, the memory budget is determined according to the phase number K. For instance [12], on CIFAR-100, the budget is 7k samples for K = 5 (7k samples = 10 classes per phase × 500 samples per class + 2k exemplars). The numbers reported in Table A are duplicated from [12], where the compared methods are implemented in the same setting. ACIL gives identical results in both the growing-exemplar and fixed-memory settings, because it does not belong to the branch of replay-based CIL.
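As a quick sanity check on the quoted arithmetic, the budget is simply the per-phase training data plus the exemplar allowance (a minimal sketch; the counts are those quoted above from [12]):

```python
# Growing-exemplar budget on CIFAR-100 with K = 5 phases,
# using the figures quoted from the RMM paper [12].
classes_per_phase = 10
samples_per_class = 500
exemplar_allowance = 2_000

budget = classes_per_phase * samples_per_class + exemplar_allowance
print(budget)  # 7000, i.e., the 7k figure
```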
ACIL: Analytic Class-Incremental Learning with Absolute Memorization and Privacy Protection
Class-incremental learning (CIL) trains a classification model as data from different classes arrive progressively. Existing CIL methods either suffer serious accuracy loss due to catastrophic forgetting or invade data privacy by revisiting used exemplars. Inspired by linear learning formulations, we propose analytic class-incremental learning (ACIL) with absolute memorization of past knowledge while avoiding breaches of data privacy (i.e., without storing historical data). The absolute memorization is demonstrated in the sense that class-incremental learning using ACIL given present data gives results identical to those of its joint-learning counterpart, which consumes both present and historical samples. This equality is theoretically validated. Data privacy is ensured since no historical data are involved during the learning process. Empirical validations demonstrate ACIL's competitive accuracy performance, with near-identical results across various incremental task settings (e.g., 5-50 phases). This also allows ACIL to outperform state-of-the-art methods in large-phase scenarios (e.g., 25 and 50 phases).
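The absolute-memorization claim can be illustrated with a recursive least-squares head: updating a ridge-regression classifier phase by phase via the Woodbury identity reproduces the joint ridge solution exactly, without storing any past samples. The sketch below is a minimal illustration of this analytic-update idea under assumed shapes and names, not the authors' implementation (ACIL additionally freezes the backbone and expands features with a random buffer):

```python
import numpy as np

class AnalyticClassifier:
    """Minimal recursive ridge-regression head: a sketch of the
    analytic-update idea behind ACIL, not the authors' code.
    Only the running inverse R and the weights W are stored,
    so no historical samples are kept."""

    def __init__(self, feat_dim, num_classes, gamma=1.0):
        self.R = np.eye(feat_dim) / gamma            # (X'X + gamma*I)^{-1} so far
        self.W = np.zeros((feat_dim, num_classes))   # current classifier

    def fit_phase(self, X, Y):
        """X: (n, d) frozen-backbone features; Y: (n, C) one-hot labels."""
        K = np.linalg.inv(np.eye(len(X)) + X @ self.R @ X.T)  # Woodbury identity
        self.R -= self.R @ X.T @ K @ X @ self.R
        self.W += self.R @ X.T @ (Y - X @ self.W)

# Incremental phases reproduce joint ridge regression exactly:
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(50, 8)), rng.normal(size=(60, 8))
Y1, Y2 = np.eye(4)[rng.integers(0, 4, 50)], np.eye(4)[rng.integers(0, 4, 60)]
clf = AnalyticClassifier(8, 4)
clf.fit_phase(X1, Y1)
clf.fit_phase(X2, Y2)
X, Y = np.vstack([X1, X2]), np.vstack([Y1, Y2])
W_joint = np.linalg.solve(X.T @ X + np.eye(8), X.T @ Y)
assert np.allclose(clf.W, W_joint)  # incremental == joint, to machine precision
```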
Transformer Doctor: Diagnosing and Treating Vision Transformers, Hao Chen, Yang Gao
Due to their powerful representational capabilities, Transformers have gradually become the mainstream model in machine vision. However, their vast and complex parameters impede researchers from gaining a deep understanding of their internal mechanisms, especially error mechanisms. Existing methods for interpreting Transformers mainly focus on the importance of input tokens or internal modules, and on the formation and meaning of features. In contrast, inspired by research on information-integration mechanisms and conjunctive errors in the biological visual system, this paper conducts an in-depth exploration of the internal error mechanisms of Transformers. We first propose an information-integration hypothesis for Transformers in the machine vision domain and provide substantial experimental evidence to support it. This covers the dynamic integration of information among tokens and the static integration of information within tokens, as well as the presence of conjunctive errors therein. To address these errors, we further propose heuristic dynamic integration constraints and rule-based static integration constraints that rectify the errors and ultimately improve model performance. The entire framework is termed Transformer Doctor, designed for diagnosing and treating internal errors within Transformers. Extensive quantitative and qualitative experiments demonstrate that Transformer Doctor can effectively address internal errors in Transformers, thereby enhancing model performance.
Design of Experiments for Stochastic Contextual Linear Bandits, Andrea Zanette, Department of Computer Science, Stanford University
In the stochastic linear contextual bandit setting, there exist several minimax procedures for exploration with policies that are reactive to the data being acquired. In practice, deploying these algorithms can involve significant engineering overhead, especially when the dataset is collected in a distributed fashion or when a human in the loop is needed to implement a different policy. In such cases, exploring with a single non-reactive policy is beneficial. Assuming some batch contexts are available, we design a single stochastic policy to collect a good dataset from which a near-optimal policy can be extracted. We present a theoretical analysis as well as numerical experiments on both synthetic and real-world datasets.
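One classical way to realize a single non-reactive exploration policy over batch contexts is an experimental-design objective: choose per-context action distributions that maximize the log-determinant of the expected design matrix, e.g., via Frank-Wolfe. The sketch below is such a D-optimal-design heuristic under assumed inputs; it is an illustrative instance of the idea, not the paper's exact procedure:

```python
import numpy as np

def logdet_design(contexts_actions, n_iters=200, reg=1e-3):
    """Frank-Wolfe maximization of log det of the expected feature
    covariance under a non-reactive stochastic policy (a classical
    D-optimal design heuristic; illustrative only).

    contexts_actions: list of (A_i, d) arrays, one per batch context,
    each row the feature vector phi(x_i, a) of one action."""
    pi = [np.full(len(Phi), 1.0 / len(Phi)) for Phi in contexts_actions]
    d = contexts_actions[0].shape[1]
    for t in range(n_iters):
        # Expected design matrix under the current stochastic policy.
        V = reg * np.eye(d)
        for Phi, p in zip(contexts_actions, pi):
            V += Phi.T @ (p[:, None] * Phi) / len(contexts_actions)
        Vinv = np.linalg.inv(V)
        gamma = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        for i, Phi in enumerate(contexts_actions):
            # Gradient of log det V w.r.t. pi[i] is the leverage score
            # phi' V^{-1} phi of each action; move mass toward the best one.
            scores = np.einsum('ad,dc,ac->a', Phi, Vinv, Phi)
            best = np.zeros_like(pi[i])
            best[np.argmax(scores)] = 1.0
            pi[i] = (1 - gamma) * pi[i] + gamma * best
    return pi
```

Sampling actions i.i.d. from the returned per-context distributions yields a dataset whose design matrix is well conditioned for extracting a least-squares policy afterwards.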
A Gunsilius's Algorithm
Gunsilius (2020) provides a theoretical framework giving minimal conditions under which a continuous IV model implies non-trivial bounds (that is, bounds tighter than what can be obtained by merely assuming that the density p(x, y | z) exists). That work also introduces two variations of an algorithm for fitting bounds. The final distribution is a reweighted combination of the ℓ pre-sampled response functions, with the weights µ playing the role of the decision variables to be optimized. Hence, by construction, every distribution in the search space over response functions is absolutely continuous with respect to the pre-defined Gaussian process. Large deviation bounds are then used to show the (intuitive) result that this approximation is a probably approximately correct formulation of the original optimization problem. One issue with this algorithm is that ℓ may need to be large, as it is a non-adaptive Monte Carlo approximation in a high-dimensional space. A variant is described in which, every time a solution for µ is found, the response-function samples with low corresponding values of µ are replaced (again, drawn from the given, non-adaptive Gaussian process).
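To make the reweighting step concrete, the sketch below poses a toy version of the optimization: given ℓ pre-sampled response functions, each implying a causal target and a set of observable moments, lower and upper bounds follow from a linear program over the simplex weights µ. The variable names and the moment-matching constraints are illustrative assumptions, not Gunsilius's exact formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy sketch of the reweighting step: theta[k] is the causal target
# implied by pre-sampled response function k, and moments[:, k] are the
# observable moments it induces (all quantities are synthetic here).
rng = np.random.default_rng(0)
ell, m = 200, 5                        # pre-sampled functions, moment constraints
theta = rng.normal(size=ell)           # causal effect per response function
moments = rng.normal(size=(m, ell))    # observable moments per function
b = moments @ np.full(ell, 1.0 / ell)  # "observed" moments (uniform mix => feasible)

# Bounds on the effect over all simplex weights mu that match b.
A_eq = np.vstack([moments, np.ones(ell)])
b_eq = np.concatenate([b, [1.0]])
lower = linprog(theta, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
upper = linprog(-theta, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(lower.fun, -upper.fun)  # non-trivial interval implied by reweighting
```

In this linear special case the bounds are exact; the non-adaptivity issue noted above corresponds to ℓ having to be large before the pre-sampled functions cover the relevant region of response-function space.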
A Class of Algorithms for General Instrumental Variable Models
Causal treatment effect estimation is a key problem that arises in a variety of real-world settings, from personalized medicine to governmental policy making. There has been a flurry of recent work in machine learning on estimating causal effects when one has access to an instrument. However, to achieve identifiability, these methods generally require one-size-fits-all assumptions such as an additive error model for the outcome. An alternative is partial identification, which provides bounds on the causal effect. Few bounding methods can deal with the most general case, in which the treatment itself can be continuous. Moreover, bounding methods generally do not allow for a continuum of assumptions on the shape of the causal effect that can smoothly trade off stronger background knowledge for more informative bounds. In this work, we provide a method for causal effect bounding in continuous distributions, leveraging recent advances in gradient-based methods for the optimization of computationally intractable objective functions. We demonstrate on a set of synthetic and real-world data that our bounds capture the causal effect when additive methods fail, providing a useful range of answers compatible with the observations rather than relying on unwarranted structural assumptions.