

Offline Reinforcement Learning with Differential Privacy

Neural Information Processing Systems

Since offline RL does not require access to the environment, it can be applied to problems where interaction with the environment is infeasible, e.g., when collecting new data is costly (trade or finance [Zhang et al., 2020]), risky (autonomous driving [Sallab et al., 2017]), or illegal/unethical (healthcare [Raghu et al., 2017]).



Appendices

Neural Information Processing Systems

The Hessian of f(Z) can be viewed as a KN × KN matrix by vectorizing the matrix Z. For deeper linear networks, it can be shown that flat saddle points exist at the origin, but there are no spurious local minima [34, 37]. While most of these results based on the bottom-up approach explain optimization and generalization of certain types of deep neural networks, they provide limited insight into the practice of deep learning. In fact, our proof techniques are inspired by recent results on low-rank matrix recovery [77, 80]. Some of the metrics are similar to those presented in [1]. Figure 7 depicts the learning curves in terms of both the training and test accuracy for all three optimization algorithms (i.e., SGD, Adam, and LBFGS).
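The vectorization view can be checked numerically. The sketch below uses a hypothetical objective f(Z) = ½‖ZZᵀ − Y‖²_F (chosen for illustration, in the spirit of the low-rank matrix recovery results cited above) and verifies that the finite-difference Hessian with respect to vec(Z) is a KN × KN matrix:

```python
import numpy as np

# Toy objective f(Z) = 0.5 * ||Z Z^T - Y||_F^2 over Z in R^{K x N}
# (hypothetical choice; the excerpt does not specify f).
K, N = 3, 2
rng = np.random.default_rng(0)
Z0 = rng.standard_normal((K, N))
Y = rng.standard_normal((K, K))
Y = (Y + Y.T) / 2  # symmetric target

def f(z_vec):
    Z = z_vec.reshape(K, N)  # un-vectorize
    return 0.5 * np.linalg.norm(Z @ Z.T - Y, "fro") ** 2

# Finite-difference Hessian with respect to vec(Z): a KN x KN matrix.
d = K * N
z0 = Z0.reshape(-1)
h = 1e-4
H = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        ei = np.zeros(d); ei[i] = h
        ej = np.zeros(d); ej[j] = h
        H[i, j] = (f(z0 + ei + ej) - f(z0 + ei) - f(z0 + ej) + f(z0)) / h**2

assert H.shape == (K * N, K * N)  # 6 x 6 for K = 3, N = 2
```

The same flattening trick is what lets one speak of eigenvalues (and hence saddle points) of the Hessian of a matrix-valued argument.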


f3d9de86462c28781cbe5c47ef22c3e5-Supplemental.pdf

Neural Information Processing Systems

The work [62] considers Algorithm 2 for the stochastic generalized linear bandit problem. Assume that θ is the true parameter of the reward model. Then we consider the lower bounds. For f_j(A) = ⟨(1/2)(e_{j1} e_{j2}^T + e_{j2} e_{j1}^T), A⟩ with j1 ≠ j2, f_j(A_i) is 1 only when i = j and 0 otherwise. With Claim D.12 and Claim D.11 we get that g C q. To get 1), we write V_l = [v_1, ..., v_l] ∈ R^{d×l} and V_l^⊥ = [v_{l+1}, ..., v_k].
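The claim that f_j(A_i) equals 1 exactly when i = j can be sanity-checked numerically. The sketch below assumes the Frobenius inner product and takes A_i = e_{i1} e_{i2}^T + e_{i2} e_{i1}^T over the same index pairs without the 1/2 factor; both the concrete pairs and the form of A_i are assumptions, since the excerpt truncates the definitions:

```python
import numpy as np

d = 4
# Hypothetical index pairs (j1, j2) with j1 != j2.
pairs = [(0, 1), (0, 2), (1, 3)]

def E(a, b):
    """Standard basis matrix e_a e_b^T."""
    M = np.zeros((d, d))
    M[a, b] = 1.0
    return M

def f(j, A):
    """f_j(A) = <(1/2)(e_{j1} e_{j2}^T + e_{j2} e_{j1}^T), A> (Frobenius)."""
    j1, j2 = pairs[j]
    B = 0.5 * (E(j1, j2) + E(j2, j1))
    return np.sum(B * A)

# A_i built from the same pairs, without the 1/2 factor (assumption),
# so that f_j(A_j) = (1/2)(1 + 1) = 1.
A = [E(i1, i2) + E(i2, i1) for (i1, i2) in pairs]

G = np.array([[f(j, A[i]) for i in range(len(pairs))] for j in range(len(pairs))])
assert np.allclose(G, np.eye(len(pairs)))  # f_j(A_i) = 1 iff i = j
```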


Deep Networks Provably Classify Data on Curves (Supplemental)

Neural Information Processing Systems

We will also write ζ_θ(x) = f_θ(x) − f_⋆(x) to denote the fitting error. We use Gaussian initialization: if ℓ ∈ {1, 2, ..., L}, the weights are initialized as
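The Gaussian initialization can be sketched as follows. The excerpt cuts off before specifying the distribution's scale, so the variance c / fan_in below is an assumption chosen only to make the sketch concrete:

```python
import numpy as np

def gaussian_init(layer_dims, c=2.0, seed=0):
    """Initialize W_l with i.i.d. N(0, c / n_{l-1}) entries for l = 1..L.

    layer_dims = [n_0, n_1, ..., n_L]. The variance scale c is an
    assumption; the source text truncates before giving it.
    """
    rng = np.random.default_rng(seed)
    weights = []
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        std = np.sqrt(c / fan_in)
        weights.append(rng.standard_normal((layer_dims[l], fan_in)) * std)
    return weights

# Example: a 3-layer network with widths 10 -> 50 -> 50 -> 1.
Ws = gaussian_init([10, 50, 50, 1])
assert [W.shape for W in Ws] == [(50, 10), (50, 50), (1, 50)]
```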


Learning and Transferring Sparse Contextual Bigrams with Linear Transformers

Neural Information Processing Systems

We show that when trained from scratch, the training process can be split into an initial sample-intensive stage, where the correlation is boosted from zero to a nontrivial value, followed by a more sample-efficient stage of further improvement. Additionally, we prove that, provided a nontrivial correlation between the downstream and pretraining tasks, finetuning from a pretrained model allows us to bypass the initial sample-intensive stage.


Data-Oblivious and Data-Aware Poisoning Attacks

Neural Information Processing Systems

In this section, we show a separation between the power of data-oblivious and data-aware poisoning attacks on classification. A different goal could be to make θ fail on a particular test set of the adversary's interest, making it a targeted poisoning attack [3, 56], or to increase the probability of a general "bad predicate" of θ [44]. We now state and prove our separation between the power of data-oblivious and data-aware poisoning attacks on classification. In particular, we show that the empirical risk minimization (ERM) algorithm can be much more susceptible to data-aware poisoning adversaries than to data-oblivious ones. On the other hand, any adversary has a much smaller advantage in the data-oblivious game.
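A toy illustration of the separation, with an entirely hypothetical setup (1-D threshold classifiers, hand-picked data and poison budget, not the construction from the text): a data-aware adversary that sees the training set can place its budget just across the clean decision boundary and drag the ERM threshold, while an oblivious adversary that must commit in advance can waste the same budget:

```python
import numpy as np

def erm_threshold(xs, ys):
    """ERM over 1-D threshold classifiers h_t(x) = 1[x > t]."""
    s = np.sort(xs)
    cand = np.concatenate(([s[0] - 1], (s[:-1] + s[1:]) / 2, [s[-1] + 1]))
    errs = [(np.mean((xs > t).astype(int) != ys), t) for t in cand]
    return min(errs)[1]  # lowest training error, ties to smaller t

# Clean data: label 1 iff x > 5.5.
xs = np.arange(1, 11, dtype=float)
ys = (xs > 5.5).astype(int)
t_clean = erm_threshold(xs, ys)

# Data-aware adversary: sees the data, plants 3 mislabeled points just
# above the clean boundary, dragging the ERM threshold to the right.
xs_aw = np.concatenate([xs, [6.1, 6.2, 6.3]])
ys_aw = np.concatenate([ys, [0, 0, 0]])
t_aware = erm_threshold(xs_aw, ys_aw)

# Data-oblivious adversary: commits without seeing the data; here its
# fixed guess lands far from the boundary and changes nothing.
xs_ob = np.concatenate([xs, [0.5, 0.6, 0.7]])
ys_ob = np.concatenate([ys, [0, 0, 0]])
t_obl = erm_threshold(xs_ob, ys_ob)
```

On this toy instance the data-aware attack moves the learned threshold by more than 1, while the oblivious attack leaves it at the clean optimum; this mirrors, but does not prove, the ERM susceptibility gap stated above.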


40bb79c081828bebdc39d65a82367246-Supplemental-Conference.pdf

Neural Information Processing Systems

Table 1: Linear network

Layer # | Name | Layer                | In shape  | Out shape
1       | -    | Flatten()            | (3,32,32) | 3072
2       | fc1  | nn.Linear(3072, 200) | 3072      | 200
3       | fc2  | nn.Linear(200, 1)    | 200       | 1

Fully-connected Network. We conduct further experiments on several different fully-connected networks with 4 hidden layers and various activation functions. Our subset is smaller because of the computational limitation when calculating the Gram matrix. Experiments show that the properties along the GD trajectory (e.g. We consider simple linear networks, fully-connected networks, and convolutional networks in this appendix. Figure 4 illustrates the positive correlation between the sharpness and the A-norm, and the relationship between the loss ‖D(t)‖² and ‖R(t)‖² along the trajectory.
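The network in Table 1 can be mirrored in a few lines. The sketch below substitutes plain numpy matrix multiplies for nn.Linear, with placeholder weights and no biases, and only checks that the shapes match the table:

```python
import numpy as np

# numpy mirror of Table 1; weight values are placeholders.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3072, 200)) * 0.01  # fc1: 3072 -> 200
W2 = rng.standard_normal((200, 1)) * 0.01     # fc2: 200 -> 1

def forward(x):
    h = x.reshape(x.shape[0], -1)  # Flatten: (3,32,32) -> 3072
    h = h @ W1                     # fc1
    return h @ W2                  # fc2

batch = rng.standard_normal((5, 3, 32, 32))   # e.g. 5 CIFAR-sized inputs
out = forward(batch)
assert out.shape == (5, 1)
```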