Supplement to ' Autoencoders that don't overfit towards the Identity '

Neural Information Processing Systems

In this supplement, we start from Eq. 1 in the paper (re-stated in Eq. 2 below) and show that it is equal to the objective function in the Theorem in the paper (see Eq. 8 below), up to a multiplicative factor. In the following, we provide the detailed steps. We start by re-stating Eq. 1 of the paper; the details are outlined in Sections 2.2 and 2.3 below. See Eq. 1 above for the definitions of X, which is multiplied by the dropout probability p, with q = 1 - p. In line 6, we change the sum over the m columns back to matrix notation. Finally, in line 8, we use the substitutions from Eq. 1 to obtain the result. In lines 11 and 12, the squared loss is expanded into its four terms.
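The derivation above concerns the expectation of a dropout training objective for a linear autoencoder. As a minimal numerical sketch (not the paper's own code), the standard dropout identity for this expectation can be checked by Monte Carlo: with the text's convention that p is the dropout probability and q = 1 - p, and with toy random matrices X and B standing in for the paper's quantities,

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 10
X = rng.normal(size=(n, m))        # toy input matrix (n samples, m features)
B = 0.1 * rng.normal(size=(m, m))  # toy weight matrix

p = 0.2          # dropout probability, as in the text
q = 1.0 - p      # keep probability

# Monte Carlo estimate of E[ ||X - Z B||_F^2 ], where Z is X with each
# entry independently zeroed out with probability p.
num_samples = 20_000
total = 0.0
for _ in range(num_samples):
    Z = X * (rng.random(size=(n, m)) < q)
    total += np.linalg.norm(X - Z @ B, "fro") ** 2
mc = total / num_samples

# Standard closed form of that expectation (textbook dropout identity):
#   ||X - q X B||_F^2 + p*q * sum_j (X^T X)_{jj} * ||B_{j,:}||^2
closed = (np.linalg.norm(X - q * X @ B, "fro") ** 2
          + p * q * np.sum(np.sum(X ** 2, axis=0) * np.sum(B ** 2, axis=1)))

print(mc, closed)  # the two values agree up to Monte Carlo error
```

The closed form follows from E[Z] = qX and Var(Z_ij) = pq X_ij^2 per entry; it shows why, in expectation, dropout training amounts to a (data-dependent) L2-type penalty on the rows of B.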


Review for NeurIPS paper: Dual Manifold Adversarial Robustness: Defense against Lp and non-Lp Adversarial Attacks

Neural Information Processing Systems

Additional Feedback: - From Table 2, it appears that the generalization of robustness to unseen attacks by adversarial training is boosted by the addition of OM-AT, although it is still mainly due to AT. How would the results on this metric look if AT were instead combined with other kinds of perturbations? Is the improvement due to on-manifold adversarial training, or just to more diverse adversarial attacks seen at training time? - Do the authors have an intuition about how the presented results can be useful in practice without the assumption of knowing the exact manifold? As mentioned above, it seems that the benefit of DMAT against unseen attacks decreases with out-of-manifold images. After reading the paper and the other reviews, I view the contribution/experiments on the artificial dataset positively. In particular: showing the benefit of OM-AT for clean accuracy and OM-robustness also in the domain of natural images is meaningful; combining AT and OM-AT also seems novel, although methodologically quite straightforward; and DMAT leads to better robustness against unforeseen attacks (Table 2) on the artificial dataset OM-ImageNet.


Is Cosine-Similarity of Embeddings Really About Similarity?

Steck, Harald, Ekanadham, Chaitanya, Kallus, Nathan

arXiv.org Artificial Intelligence

Cosine-similarity is the cosine of the angle between two vectors, or equivalently the dot product between their normalizations. A popular application is to quantify semantic similarity between high-dimensional objects by applying cosine-similarity to a learned low-dimensional feature embedding. In practice, this can work better, but sometimes also worse, than the unnormalized dot-product between embedded vectors. To gain insight into this empirical observation, we study embeddings derived from regularized linear models, where closed-form solutions facilitate analytical insights. We derive analytically how cosine-similarity can yield arbitrary and therefore meaningless `similarities.' For some linear models the similarities are not even unique, while for others they are implicitly controlled by the regularization. We discuss implications beyond linear models: a combination of different regularizations is employed when learning deep models; these have implicit and unintended effects when taking cosine-similarities of the resulting embeddings, rendering results opaque and possibly arbitrary. Based on these insights, we caution against blindly using cosine-similarity and outline alternatives.
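As a minimal illustration of the definition in the abstract (toy vectors of my own, not from the paper), the following sketch shows cosine-similarity as the dot product of normalized vectors, and how normalization can flip which item looks "most similar" relative to the raw dot product:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: dot product of their normalizations."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical 2-d embeddings: two items compared against a query.
query  = np.array([1.0, 0.0])
item_a = np.array([4.0, 1.0])   # large norm, slightly off-axis
item_b = np.array([0.5, 0.0])   # small norm, exactly aligned with the query

# The unnormalized dot product ranks item_a first (its large norm dominates),
# while cosine-similarity ranks item_b first (it points in the same direction).
print(query @ item_a, query @ item_b)
print(cosine_similarity(query, item_a), cosine_similarity(query, item_b))
```

This rank reversal is exactly the kind of discrepancy the abstract refers to: whether normalizing away vector norms helps or hurts depends on whether those norms carry meaningful information, which in turn depends on how the embedding was regularized.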