Chen, Ning
Rethinking Softmax Cross-Entropy Loss for Adversarial Robustness
Pang, Tianyu, Xu, Kun, Dong, Yinpeng, Du, Chao, Chen, Ning, Zhu, Jun
Previous work shows that adversarially robust generalization requires larger sample complexity, and a dataset such as CIFAR-10, which suffices for good standard accuracy, may not suffice to train robust models. Since collecting new training data could be costly, we instead focus on inducing locally dense sample distributions, i.e., high sample density in the feature space, which could provide locally sufficient samples for robust learning. We first formally show that the softmax cross-entropy (SCE) loss and its variants induce inappropriate sample density distributions in the feature space, which inspires us to design more appropriate training objectives. Specifically, we propose the Max-Mahalanobis center (MMC) loss to create high-density regions for better robustness. It encourages the learned features to gather around preset class centers with optimal inter-class dispersion. Compared with the SCE loss and its variants, we empirically demonstrate that applying the MMC loss can significantly improve robustness even under strong adaptive attacks, while keeping state-of-the-art accuracy on clean inputs with little extra computation.
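A minimal sketch of the idea, assuming a regular-simplex construction of the preset Max-Mahalanobis centers (unit norm, pairwise inner product -1/(L-1), which matches the optimal inter-class dispersion) and the squared-distance form of the center loss; the function names and the `radius` scale are illustrative, not the paper's exact implementation:

```python
import numpy as np

def mm_centers(num_classes, dim, radius=10.0):
    """Preset Max-Mahalanobis centers: vertices of a regular simplex, i.e.
    unit-norm vectors with pairwise inner product -1/(L-1), giving maximal
    inter-class dispersion. Requires dim >= num_classes."""
    L = num_classes
    gram = np.full((L, L), -1.0 / (L - 1))
    np.fill_diagonal(gram, 1.0)
    vals, vecs = np.linalg.eigh(gram)                  # gram = V V^T, rank L-1
    V = vecs * np.sqrt(np.clip(vals, 0.0, None))
    centers = np.zeros((L, dim))
    centers[:, :L] = V
    return radius * centers

def mmc_loss(features, labels, centers):
    """Average squared distance of each feature to its preset class center."""
    diff = features - centers[labels]
    return 0.5 * np.mean(np.sum(diff ** 2, axis=1))
```

Minimizing this loss pulls the features of each class toward a fixed, well-separated center, which is the mechanism for creating the high-density regions described above.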
Improving Adversarial Robustness via Promoting Ensemble Diversity
Pang, Tianyu, Xu, Kun, Du, Chao, Chen, Ning, Zhu, Jun
Though deep neural networks have achieved significant progress on various tasks, often enhanced by model ensembles, existing high-performance models can be vulnerable to adversarial attacks. Many efforts have been devoted to enhancing the robustness of individual networks and then constructing a straightforward ensemble, e.g., by directly averaging the outputs, which ignores the interaction among networks. This paper presents a new method that exploits the interaction among individual networks to improve the robustness of ensemble models. Technically, we define a new notion of ensemble diversity in the adversarial setting as the diversity among the non-maximal predictions of individual members, and present an adaptive diversity promoting (ADP) regularizer to encourage this diversity, which leads to globally better robustness for the ensemble by making adversarial examples difficult to transfer among individual members. Our method is computationally efficient and compatible with defense methods acting on individual networks. Empirical results on various datasets verify that our method can improve adversarial robustness while maintaining state-of-the-art accuracy on normal examples.
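A hedged sketch of the ADP idea for a single sample, assuming the regularizer combines the entropy of the averaged prediction with the log-determinant of the Gram matrix of normalized non-maximal predictions (the probability entries other than the true class); the hyperparameter names `alpha` and `beta` are illustrative. During training this quantity would typically be subtracted from the ensemble cross-entropy loss, so that maximizing it encourages diversity:

```python
import numpy as np

def adp_regularizer(probs, label, alpha=2.0, beta=0.5, eps=1e-12):
    """probs: (K, L) softmax outputs of K ensemble members for one sample;
    label: index of the true class. Requires K <= L - 1 for the determinant
    to be nonzero. Larger values mean a more uncertain ensemble prediction
    and more mutually diverse non-maximal predictions."""
    mean_p = probs.mean(axis=0)
    entropy = -np.sum(mean_p * np.log(mean_p + eps))     # ensemble entropy
    nonmax = np.delete(probs, label, axis=1)             # drop true-class column
    nonmax /= np.linalg.norm(nonmax, axis=1, keepdims=True) + eps
    diversity = np.linalg.det(nonmax @ nonmax.T)         # squared spanned volume
    return alpha * entropy + beta * np.log(diversity + eps)
```

The determinant is the squared volume spanned by the members' normalized non-maximal prediction vectors, so it is large exactly when the members disagree on the wrong classes while still agreeing on the right one.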
Analyzing and Improving Stein Variational Gradient Descent for High-dimensional Marginal Inference
Zhuo, Jingwei, Liu, Chang, Chen, Ning, Zhang, Bo
Stein variational gradient descent (SVGD) is a nonparametric inference method, which iteratively transports a set of randomly initialized particles to approximate a differentiable target distribution, along the direction that maximally decreases the KL divergence within a vector-valued reproducing kernel Hilbert space (RKHS). Compared to Monte Carlo methods, SVGD is particle-efficient because of the repulsive force induced by kernels. In this paper, we develop the first analysis of the high-dimensional performance of SVGD and demonstrate that the repulsive force drops at least polynomially with increasing dimension, which results in poor marginal approximation. To improve the marginal inference of SVGD, we propose Marginal SVGD (M-SVGD), which incorporates structural information described by a Markov random field (MRF) into the kernels. M-SVGD inherits the particle efficiency of SVGD and can be used as a general-purpose marginal inference tool for MRFs. Experimental results on grid-based Markov random fields show the effectiveness of our method.
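For context, a minimal numpy sketch of one vanilla SVGD update with an RBF kernel and the median-heuristic bandwidth; the second term of `phi` is the kernel-induced repulsive force whose high-dimensional decay the paper analyzes. This is plain SVGD, not the proposed M-SVGD, which would further adapt the kernel to the MRF structure:

```python
import numpy as np

def svgd_step(X, grad_logp, stepsize=0.1):
    """One SVGD update. X: (n, d) particles; grad_logp: (n, d) -> (n, d),
    the score of the target distribution evaluated at each particle."""
    n = X.shape[0]
    D = X[:, None, :] - X[None, :, :]             # D[j, i] = x_j - x_i
    sq = np.sum(D ** 2, axis=-1)
    h = np.median(sq) / np.log(n + 1) + 1e-8      # median-heuristic bandwidth
    K = np.exp(-sq / h)                           # RBF kernel matrix
    # phi(x_i) = (1/n) sum_j [k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i)]
    repulsion = (-2.0 / h) * (K[:, :, None] * D).sum(axis=0)
    phi = (K @ grad_logp(X) + repulsion) / n
    return X + stepsize * phi
```

The first term of `phi` drives particles toward high-density regions; the `repulsion` term pushes them apart, and it is this term whose magnitude the paper shows decays at least polynomially with the dimension.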
SAM: Semantic Attribute Modulation for Language Modeling and Style Variation
Hu, Wenbo, Hua, Lifeng, Li, Lei, Su, Hang, Wang, Tian, Chen, Ning, Zhang, Bo
This paper presents Semantic Attribute Modulation (SAM) for language modeling and style variation. The modulation incorporates various document attributes, such as titles, authors, and document categories. We consider two types of attributes (title attributes and category attributes) and a flexible attribute selection scheme that automatically scores them via an attribute attention mechanism. The semantic attributes are embedded into the hidden semantic space as generation inputs. With the attributes properly harnessed, SAM can generate texts that are interpretable with regard to the input attributes. Qualitative analysis, including word semantic analysis and attention values, shows the interpretability of SAM. On several typical text datasets, we empirically demonstrate the superiority of the Semantic Attribute Modulated language model with different combinations of document attributes. Moreover, we present a style-variation application for lyric generation using SAM, which shows a strong connection between the style variation and the semantic attributes.
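A hedged sketch of the attribute attention step, assuming a generic bilinear scoring of attribute embeddings against the model's hidden state; the matrix `W` and the softmax mixing are illustrative stand-ins, not the paper's exact parameterization:

```python
import numpy as np

def attribute_attention(h, attr_embs, W):
    """h: (d,) decoder hidden state; attr_embs: (A, d) embeddings of the
    available attributes (titles, authors, categories); W: (d, d) bilinear
    scoring matrix. Returns the attended attribute vector that is fed to
    the generator as an extra input."""
    scores = attr_embs @ (W @ h)                  # one relevance score per attribute
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax attention weights
    return weights @ attr_embs                    # convex mix of attribute embeddings
```

Inspecting `weights` per generated token is what yields the attention-value interpretability analysis mentioned above.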
Learning Attributes from the Crowdsourced Relative Labels
Tian, Tian (Tsinghua University) | Chen, Ning (Tsinghua University) | Zhu, Jun (Tsinghua University)
Finding semantic attributes to describe related concepts is typically a hard problem. The attributes commonly used in most fields are designed by domain experts, which is expensive and time-consuming. In this paper we propose an efficient method to learn human-comprehensible attributes with crowdsourcing. We first design an analogical interface to collect relative labels from the crowd. Then we propose a hierarchical Bayesian model, together with an efficient initialization strategy, to aggregate the labels and extract concise attributes. Our experimental results demonstrate promise in discovering diverse and convincing attributes, which significantly improve performance on challenging zero-shot learning tasks.
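As a simplified stand-in for the paper's hierarchical Bayesian aggregation, the sketch below aggregates crowdsourced relative labels of the form "item w exhibits the attribute more strongly than item l" with a Bradley-Terry model fit by gradient ascent; the paper's actual model and its initialization strategy are richer than this:

```python
import numpy as np

def aggregate_relative_labels(n_items, comparisons, iters=200, lr=0.1):
    """comparisons: list of (w, l) pairs meaning the crowd judged item w to
    exhibit the attribute more strongly than item l. Returns one latent
    attribute strength per item (identifiable up to an additive constant)."""
    s = np.zeros(n_items)
    for _ in range(iters):
        grad = np.zeros(n_items)
        for w, l in comparisons:
            p = 1.0 / (1.0 + np.exp(-(s[w] - s[l])))   # P(w beats l)
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        s += lr * grad                                 # ascend the BT likelihood
    return s
```

The recovered strengths give each concept a position along the discovered attribute, which is the kind of concise, human-comprehensible axis used downstream for zero-shot learning.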
Incentives for Strategic Behavior in Fisher Market Games
Chen, Ning (Nanyang Technological University) | Deng, Xiaotie (Shanghai Jiao Tong University) | Tang, Bo (University of Oxford) | Zhang, Hongyang (Stanford University)
In a Fisher market game, a market equilibrium is computed from the utility functions and money endowments that agents report. As a consequence, an individual buyer may misreport his private information to obtain a utility gain. We investigate the extent to which an agent's utility can be increased by unilateral strategic play and prove that this improvement is at most a factor of 2 for markets with weak gross substitute utilities. Equivalently, we show that truthful reporting is a 0.5-approximate Nash equilibrium in this game. To identify sufficient conditions under which truthful reporting is close to a Nash equilibrium, we conduct a parameterized study of strategic behavior and further show that the ratio of utility gain decreases linearly as a buyer's initial endowment increases or his maximum share of an item decreases. Finally, we consider the collusive behavior of a coalition and prove that the utility gain is bounded by 1/(1 - maximum share of the collusion). Our findings justify the truthful-reporting assumption in Fisher markets through a quantitative study of participants' incentives, and imply that under the large-market assumption, the utility gain of a buyer from manipulation diminishes to 0.
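For context, in the classical linear-utility case the market equilibrium that agents' reports feed into is the solution of the Eisenberg-Gale convex program, stated below; misreporting the valuations changes this optimization and hence the resulting allocation and prices, which is the manipulation channel the paper quantifies:

```latex
\max_{x \ge 0} \;\; \sum_i m_i \log\!\Big(\sum_j v_{ij}\, x_{ij}\Big)
\qquad \text{s.t.} \qquad \sum_i x_{ij} \le 1 \;\; \text{for every item } j,
```

where $m_i$ is buyer $i$'s money endowment and $v_{ij}$ his reported valuation of item $j$; the optimal allocation $x$, together with the dual variables of the supply constraints as prices, constitutes the market equilibrium.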
Discriminative Nonparametric Latent Feature Relational Models with Data Augmentation
Chen, Bei (Tsinghua University) | Chen, Ning (Tsinghua University) | Zhu, Jun (Tsinghua University) | Song, Jiaming (Tsinghua University) | Zhang, Bo (Tsinghua University)
We present a discriminative nonparametric latent feature relational model (LFRM) for link prediction to automatically infer the dimensionality of latent features. Under the generic RegBayes (regularized Bayesian inference) framework, we handily incorporate the prediction loss with probabilistic inference of a Bayesian model; set distinct regularization parameters for different types of links to handle the imbalance issue in real networks; and unify the analysis of both the smooth logistic log-loss and the piecewise linear hinge loss. For the nonconjugate posterior inference, we present a simple Gibbs sampler via data augmentation, without making the restrictive assumptions used in variational methods. We further develop an approximate sampler using stochastic gradient Langevin dynamics to handle large networks with hundreds of thousands of entities and millions of links, orders of magnitude larger than what existing LFRM models can process. Extensive studies on various real networks show promising performance.
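A minimal sketch of the stochastic gradient Langevin dynamics update that enables scaling to large networks, following the standard SGLD form; the function arguments are illustrative, and the LFRM-specific gradients and data-augmentation steps are not shown:

```python
import numpy as np

def sgld_step(theta, grad_log_prior, grad_log_lik_batch, n_total, n_batch, eps):
    """One SGLD update. The minibatch likelihood gradient is rescaled by
    n_total / n_batch to give an unbiased estimate of the full-data gradient;
    the injected Gaussian noise (variance eps) turns noisy gradient ascent
    into approximate posterior sampling."""
    grad = grad_log_prior(theta) + (n_total / n_batch) * grad_log_lik_batch(theta)
    noise = np.random.normal(0.0, np.sqrt(eps), size=theta.shape)
    return theta + 0.5 * eps * grad + noise
```

Because each step touches only a minibatch of links, the sampler's per-iteration cost is independent of the total number of links, which is what makes networks with millions of links tractable.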
Nonlinear Feature Extraction with Max-Margin Data Shifting
Wangni, Jianqiao (Tsinghua University) | Chen, Ning (Tsinghua University)
Feature extraction is an important task in machine learning. In this paper, we present a simple and efficient method, named max-margin data shifting (MMDS), to process the data before feature extraction. By relying on a large-margin classifier, MMDS helps enhance the discriminative ability of subsequent feature extractors. The kernel trick can be applied to extract nonlinear features from the input data. We further analyze in detail the example of principal component analysis (PCA). Empirical results on multiple linear and nonlinear models demonstrate that MMDS can efficiently improve the performance of unsupervised extractors.
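One plausible reading of MMDS as a preprocessing step, sketched below under explicit assumptions: fit a large-margin linear classifier, shift each sample away from the decision boundary along the classifier normal, then run an unsupervised extractor (PCA here) on the shifted data. The paper's exact shifting rule and its kernelized variant may differ from this illustration:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.decomposition import PCA

def max_margin_data_shift(X, y, shift=1.0, n_components=2):
    """Fit a large-margin linear classifier, then push each sample away from
    the decision boundary along the (unit) classifier normal before PCA.
    `shift` controls how far samples move; binary labels are assumed."""
    clf = LinearSVC().fit(X, y)
    w = clf.coef_[0] / np.linalg.norm(clf.coef_[0])
    signs = np.where(y == clf.classes_[1], 1.0, -1.0)  # +1 toward positive side
    X_shifted = X + shift * signs[:, None] * w
    return PCA(n_components=n_components).fit_transform(X_shifted)
```

Shifting enlarges the between-class gap before the unsupervised extractor sees the data, so the leading principal directions become more class-discriminative.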
Dropout Training for Support Vector Machines
Chen, Ning (Tsinghua University) | Zhu, Jun (Tsinghua University) | Chen, Jianfei (Tsinghua University) | Zhang, Bo (Tsinghua University)
Dropout and other feature noising schemes have shown promising results in controlling over-fitting by artificially corrupting the training data. Though extensive theoretical and empirical studies have been performed for generalized linear models, little work has been done for support vector machines (SVMs), one of the most successful approaches to supervised learning. This paper presents dropout training for linear SVMs. To deal with the intractable expectation of the non-smooth hinge loss under corrupting distributions, we develop an iteratively re-weighted least squares (IRLS) algorithm by exploiting data augmentation techniques. Our algorithm iteratively minimizes the expectation of a re-weighted least squares problem, where the re-weights have closed-form solutions. Similar ideas are applied to develop a new IRLS algorithm for the expected logistic loss under corrupting distributions. Our algorithms offer insights into the connection and difference between the hinge loss and the logistic loss in dropout training. Empirical results on several real datasets demonstrate that dropout training significantly boosts the classification accuracy of linear SVMs.
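The quantity being minimized is the expectation of the hinge loss under the corrupting (dropout) distribution. The paper handles this expectation in closed form inside an IRLS loop via data augmentation; as a hedged illustration of the objective itself, the sketch below estimates the same expectation by Monte-Carlo sampling of dropout masks:

```python
import numpy as np

def expected_hinge_loss(w, X, y, drop_rate=0.5, n_draws=50, rng=None):
    """Monte-Carlo estimate of the expected hinge loss under dropout.
    y is in {-1, +1}; kept features are rescaled by 1/(1 - drop_rate) so the
    corrupted inputs are unbiased estimates of the clean ones."""
    rng = np.random.default_rng() if rng is None else rng
    total = 0.0
    for _ in range(n_draws):
        mask = rng.random(X.shape) >= drop_rate        # sample a dropout mask
        X_tilde = X * mask / (1.0 - drop_rate)
        margins = y * (X_tilde @ w)
        total += np.mean(np.maximum(0.0, 1.0 - margins))
    return total / n_draws
```

Sampling makes the expectation concrete but noisy and slow; the appeal of the paper's IRLS algorithm is that the re-weights have closed-form solutions, avoiding this sampling entirely.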