conclusion
Interpreting Representation Quality of DNNs for 3D Point Cloud Processing: Supplementary Materials
Wen Shenᵇ, Qihan Renᵃ, Dongrui Liuᵃ, Quanshi Zhangᵃ
ᵃ Shanghai Jiao Tong University, ᵇ Tongji University
This section provides more details about Shapley values in Section 3 of the paper.

Linearity: If two independent games $v$ and $w$ can be merged into one game $u(S) = v(S) + w(S)$, then the Shapley values of player $i$ in game $v$ and game $w$ can also be merged, i.e. $\phi_u(i) = \phi_v(i) + \phi_w(i)$.

Nullity: A dummy player $i$ satisfies $\forall S \subseteq N \setminus \{i\}$, $v(S \cup \{i\}) = v(S) + v(\{i\})$, which indicates that player $i$ has no interaction with other players, i.e. $\phi(i) = v(\{i\})$.

Efficiency: The overall reward can be allocated to all players in the game.

This section provides more details about multi-order interactions [8] in Section 3.3 of the paper.
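Returning to the Shapley axioms listed above, they can be checked numerically on a small game. The following is a minimal sketch, not the paper's implementation: the helper `shapley_values` and the toy games `v`, `w`, and `u` are hypothetical, and the efficiency check assumes the standard statement that the Shapley values sum to $v(N) - v(\emptyset)$.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, game):
    """Exact Shapley values; `game` maps a frozenset of players to a reward."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (game(s | {i}) - game(s))
        phi[i] = total
    return phi


players = [0, 1, 2]


def v(s):
    # Hypothetical toy game: players 0 and 1 interact, player 2 is a dummy.
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 4.0 * (0 in s and 1 in s) + 3.0 * (2 in s)


def w(s):
    # A second hypothetical game that depends only on the coalition size.
    return float(len(s)) ** 2


def u(s):
    # Merged game used to check linearity.
    return v(s) + w(s)


phi_v = shapley_values(players, v)
phi_w = shapley_values(players, w)
phi_u = shapley_values(players, u)

# Linearity: phi_u(i) = phi_v(i) + phi_w(i)
linearity_ok = all(abs(phi_u[i] - (phi_v[i] + phi_w[i])) < 1e-9 for i in players)
# Nullity: the dummy player 2 receives exactly v({2})
nullity_ok = abs(phi_v[2] - v(frozenset({2}))) < 1e-9
# Efficiency (standard form, assumed): sum_i phi_v(i) = v(N) - v(empty set)
efficiency_ok = abs(sum(phi_v.values()) - (v(frozenset(players)) - v(frozenset()))) < 1e-9
```

The enumeration is exponential in the number of players, so this form is only practical for very small games.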
Appendix
We first introduce some handy concepts and results to make the proof succinct, while providing more information for understanding our model and theory. We begin with some extended discussions on CSG.

Note that the output dimensions in $\mathcal{S}$ of a reparameterization are not necessarily constant in $v$: the condition that $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s,v))$ for any $v \in \mathcal{V}$ does not indicate that $\Phi^{\mathcal{S}}(s,v)$ is constant in $v$, since $p'(y|s')$ may ignore the change of $s' = \Phi^{\mathcal{S}}(s,v)$ caused by the change of $v$. The following lemma shows the meaning of a reparameterization: it allows a CSG to vary while inducing the same distribution on the observed data variables $(x,y)$ (i.e., having the same effect in describing data).

We can now define and verify an equivalence relation on CSGs, so that the resulting equivalence class contains CSGs that induce the same $(x,y)$ data distribution and hold the same semantic information in their $s$ variables. We say two CSGs $p$ and $p'$ are semantic-equivalent if there exists a homeomorphism¹¹ $\Phi$ on $\mathcal{S} \times \mathcal{V}$ such that (i) it is semantic-preserving: its output dimensions in $\mathcal{S}$ are constant in $v$, $\Phi^{\mathcal{S}}(s,v) = \Phi^{\mathcal{S}}(s)$ for any $v \in \mathcal{V}$, and (ii) it acts as a reparameterization from $p$ to $p'$: $\Phi_{\#}[p_{s,v}] = p'_{s,v}$, $p(x|s,v) = p'(x|\Phi(s,v))$ and $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s))$. A.1 below shows that the defined binary relation is indeed an equivalence relation in common cases. As a reparameterization, $\Phi$ allows the two models to have different latent-variable parameterizations while inducing the same distribution on the observed data variables $(x,y)$ (Lemma 9). This definition of semantic-equivalence can be rephrased as the existence of a semantic-preserving reparameterization. With proper model assumptions, we can show that any reparameterization between two CSGs is semantic-preserving, so that semantic-preserving CSGs cannot be converted to each other by a reparameterization that mixes $s$ with $v$.

Lemma 11. For two CSGs $p$ and $p'$, if $p'(y|s)$ has a statistics $M'(s)$ that is an injective function of $s$, then any reparameterization $\Phi$ from $p$ to $p'$, if it exists, has its $\Phi^{\mathcal{S}}$ constant in $v$.

Proof. Let $M(s)$ denote the corresponding statistics of $p(y|s)$. Then the condition that $p(y|s) = p'(y|\Phi^{\mathcal{S}}(s,v))$ for any $v \in \mathcal{V}$ indicates that $M(s) = M'(\Phi^{\mathcal{S}}(s,v))$. If there exist $s \in \mathcal{S}$ and $v^{(1)} \neq v^{(2)} \in \mathcal{V}$ such that $\Phi^{\mathcal{S}}(s,v^{(1)}) \neq \Phi^{\mathcal{S}}(s,v^{(2)})$, then $M'(\Phi^{\mathcal{S}}(s,v^{(1)})) \neq M'(\Phi^{\mathcal{S}}(s,v^{(2)}))$ since $M'$ is injective. This violates $M(s) = M'(\Phi^{\mathcal{S}}(s,v))$, which requires both $M'(\Phi^{\mathcal{S}}(s,v^{(1)}))$ and $M'(\Phi^{\mathcal{S}}(s,v^{(2)}))$ to be equal to $M(s)$.

¹¹ A transformation is a homeomorphism if it is a continuous bijection with continuous inverse.

We then introduce two mathematical facts. Let $z$ be a random variable on a Euclidean space $\mathbb{R}^{d_Z}$ with density function $p_z(z)$, and let $\Phi$ be a homeomorphism on $\mathbb{R}^{d_Z}$ whose inverse $\Phi^{-1}$ is differentiable.
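As an illustration of Lemma 11, consider a hypothetical special case that is assumed here for exposition only and is not taken from the paper: $s$ is scalar and $p'(y|s')$ is a unit-variance Gaussian centered at $s'$, so that the conditional mean serves as the injective statistics $M'$.

```latex
\begin{align*}
p'(y \mid s') &= \mathcal{N}(y;\, s',\, 1)
  &&\Rightarrow\quad M'(s') = \mathbb{E}_{p'}[y \mid s'] = s' \quad \text{(injective)}, \\
p(y \mid s) &= p'\big(y \mid \Phi^{\mathcal{S}}(s,v)\big)
  &&\Rightarrow\quad M(s) = M'\big(\Phi^{\mathcal{S}}(s,v)\big) = \Phi^{\mathcal{S}}(s,v)
  \quad \forall v \in \mathcal{V}.
\end{align*}
```

Since $M(s) = \mathbb{E}_{p}[y \mid s]$ does not depend on $v$, neither does $\Phi^{\mathcal{S}}(s,v)$ in this case, which is exactly the conclusion of the lemma.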
35th Conference on Neural Information Processing Systems 2021.
… research. … yield the conclusion that J outperforms K, whereas searching another can entail the opposite. In short, the way we choose hyperparameters can deceive us. We provide a theoretical complement to this prior work, arguing that, to avoid such deception, the process of drawing conclusions from HPO should be made more rigorous. We call this process epistemic hyperparameter optimization (EHPO), and put forth a logical framework to capture its semantics and how it can lead to inconsistent conclusions about performance. Our framework enables us to prove EHPO methods that are guaranteed to be defended against deception, given bounded … We demonstrate our framework's utility by proving and …
Appendix: Remodel Self-Attention with Gaussian Kernel and Nyström Method
Figure 1 shows how the validation loss (y-axis: cross-entropy loss on the validation set) changes with training time over 50k steps, as supplementary results for the experiments in Section 5. In general, Skyformer converges faster and finishes the 50k steps earlier than vanilla Attention and Kernelized Attention on all tasks. We further remark that on Text Classification all models quickly overfit, so the validation losses rise quickly. On Pathfinder, due to the difficulty of training, vanilla Attention fails to reach the best long-time limit under a certain setting in the trial shown in the figure. Figure 2 shows the singular value distribution of the attention output from the second layer of a trained vanilla transformer.
A Label model and illustrations
A.1 Majority Voting
Majority Voting (MV) is the most intuitive algorithm for aggregating LFs' annotations. We omit this case for simplicity.

A.3 Snorkel MeTaL
For the parameters $\mu$ of Snorkel MeTaL [31], by Bayes' theorem we have: $p_\mu(y = c, \lambda = m) = p_\mu(\lambda = m \mid y = c)\, p(y = c) = \dots$

Consider a label model $g(L(x), x) \in \mathcal{F}$ in an arbitrary function class $\mathcal{F}$, e.g., neural networks, that additionally depends on the data feature $x$. We can still approximate such a complicated function with an identity-function-based label model $g_{W(x)}(L(x))$ similar to the aforementioned one, except that $W(x): \mathcal{X} \to \mathbb{R}^{M \times (C+1) \times C}$ is a similarly complicated function, e.g., a neural network, that maps each data point $x \in \mathcal{X}$ to a unique label model parameter $W(x)$. We leave the exploration of more complicated forms of label models to future work.

B.1 Case 1: identity function
We define the loss with reweighted samples as … Instead of employing the decomposed loss function, we introduce a more general influence estimation method, weight-moving influence, which gets rid of the loss decomposition and approximation and is agnostic to the choice of the $\sigma(\cdot)$ function.
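As an illustration of the Majority Voting label model from A.1, the sketch below aggregates the votes of several LFs on one data point into a distribution over classes. The paper's exact formulation is not shown in this excerpt, so the details are assumptions: the function name `majority_vote`, the abstain encoding `ABSTAIN = -1`, uniform tie-breaking, and the uniform fallback when every LF abstains are all hypothetical choices.

```python
from collections import Counter

ABSTAIN = -1  # assumed encoding for an LF that does not vote


def majority_vote(votes, num_classes):
    """Aggregate LF votes on one data point into a distribution over classes.

    `votes` holds one entry per LF: a class index in range(num_classes) or ABSTAIN.
    """
    counts = Counter(vote for vote in votes if vote != ABSTAIN)
    if not counts:
        # Every LF abstained: fall back to a uniform distribution (assumption).
        return [1.0 / num_classes] * num_classes
    top = max(counts.values())
    winners = [c for c in range(num_classes) if counts.get(c, 0) == top]
    # Ties are split uniformly among the most-voted classes (assumption).
    return [1.0 / len(winners) if c in winners else 0.0 for c in range(num_classes)]
```

For example, `majority_vote([0, 1, 1, ABSTAIN], num_classes=2)` puts all mass on class 1, while a one-to-one tie yields a uniform distribution over the two classes.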
Perturbation Towards Easy Samples Improves Targeted Adversarial Transferability
The transferability of adversarial perturbations provides an effective shortcut for black-box attacks. Targeted perturbations are more practical but more difficult to transfer between models. In this paper, we demonstrate experimentally and theoretically that neural networks trained on the same dataset perform more consistently in High-Sample-Density-Regions (HSDR) of each class than in low-sample-density regions. Therefore, in the targeted setting, adding perturbations towards the HSDR of the target class is more effective in improving transferability. However, density estimation is challenging in high-dimensional scenarios.