initialize
The Implicit Bias of Structured State Space Models Can Be Poisoned With Clean Labels
Neural networks are powered by an implicit bias: a tendency of gradient descent to fit training data in a way that generalizes to unseen data. A recent class of neural network models gaining increasing popularity is structured state space models (SSMs). Prior work argued that the implicit bias of SSMs leads to generalization in a setting where data is generated by a low dimensional teacher. In this paper, we revisit the latter setting, and formally establish a phenomenon entirely undetected by prior work on the implicit bias of SSMs. Namely, we prove that while implicit bias leads to generalization under many choices of training data, there exist special examples whose inclusion in training completely distorts the implicit bias, to a point where generalization fails. This failure occurs despite the special training examples being labeled by the teacher, i.e., having clean labels! We empirically demonstrate the phenomenon, with SSMs trained independently and as part of non-linear neural networks. In the area of adversarial machine learning, disrupting generalization with cleanly labeled training examples is known as clean-label poisoning. Given the proliferation of SSMs, we believe that delineating their susceptibility to clean-label poisoning, and developing methods for overcoming this susceptibility, are critical research directions to pursue.
Appendix to: Conformal Frequency Estimation with Sketched Data
Output: deterministic upper-bound for the frequency of z in the data set: หfCMSup (z). The CMS-CU algorithm Algorithm A2 CMS-CU Input: Data set Z1,...,Zm. Output: deterministic upper-bound for the frequency of z in the data set: หfCMS CUup (z). Input: A (trainable) rule for computing nested intervals [หLm,ฮฑ(; t), หUm,ฮฑ(; t)], t T. Input: Number of data points mtrain0
example where multi step outperforms one step
As explained in the main text, this section presents an example that is only a slight modification of the one in Figure 4, but where a multi-step approach is clearly preferred over just one step. The data-generating and learning processes are exactly the same (100 trajectories of length 100, discount 0.9, ฮฑ = 0.1for reverse KL regularization). The only difference is that rather than using a behavior that is a mixture of optimal and uniform, we use a behavior that is a mixture of maximally suboptimal and uniform. If we call the suboptimal policy ฯ (which always goes down and left in our gridworld), then the behavior for the modified example is ฮฒ = 0.2 ฯ +0.8 u, where uis uniform. Results are shown in Figure 7. Figure 7: A gridworld example with modified behavior where multi-step is much better than one-step.
A Architectures, Hyper-parameters and Algorithms
Our approach, named ORDER, uses a three-step training process. In the next parts of this section, we'll explain the methods, structures, and settings we use in each of After that, we'll talk about how we set up and carried out our experiments. In this section, we'll break down the design of the state encoder, how we decided on the best We used a grid search strategy to find the optimal hyper-parameters for our experiments. This allowed each observation dimension to match up with a state factor. We summarize the training process in Algorithm 1.
Checklist
Do the main claims made in the abstract and introduction accurately reflect the paper's Did you describe the limitations of your work? Did you specify all the training details (e.g., data splits, hyperparameters, how they Did you report error bars (e.g., with respect to the random seed after running experi-20 Did you include the total amount of compute and the type of resources used (e.g., type If your work uses existing assets, did you cite the creators? Did you mention the license of the assets? Did you include any new assets either in the supplemental material or as a URL? [Y es] Did you discuss whether and how consent was obtained from people whose data you're We thereby state that we bear all responsibility in case of violation of rights, etc., and confirmation of F or what purpose was the dataset created? - For the novel task of data analysis as explained Who created the dataset and on behalf of which entity? - This dataset is created during a Who funded the creation of the dataset? What do the instances that comprise the dataset represent?