
 dnn


A Composite Activation Function for Learning Stable Binary Representations

arXiv.org Machine Learning

Activation functions play a central role in neural networks by shaping internal representations. Recently, learning binary activation representations has attracted significant attention due to their advantages in computational and memory efficiency, as well as interpretability. However, training neural networks with Heaviside activations remains challenging, as their non-differentiability obstructs standard gradient-based optimization. In this paper, we propose the Heavy Tailed Activation Function (HTAF), a smooth approximation to the Heaviside function that enables stable training with gradient-based optimization. We construct HTAF as a composite of the sigmoid and hyperbolic tangent functions and theoretically show that it maintains a large gradient mass around zero inputs while exhibiting slower gradient decay in the tail regions. We show that Spiking Neural Networks, Binary Neural Networks, and Deep Heaviside Neural Networks can be trained stably using HTAF with gradient-based optimization. Finally, we introduce Implicit Concept Bottleneck Models (ICBMs), an interpretable image model that leverages HTAF to induce discrete feature representations. Extensive experiments across various architectures and image datasets demonstrate that ICBM enables stable discretization while achieving prediction performance comparable to or better than standard models.


Universality in Deep Neural Networks: An approach via the Lindeberg exchange principle

arXiv.org Machine Learning

We consider the infinite-width limit of a fully connected deep neural network with general weights, and we prove quantitative general bounds on the $2$-Wasserstein distance between the network and its infinite-width Gaussian limit, under appropriate regularity assumptions on the activation function. Our main tool is a Lindeberg principle for Deep Neural Networks, which we use to successively replace the weights on each layer by Gaussian random variables.



Rethinking Calibration of Deep Neural Networks: Do Not Be Afraid of Overconfidence

Neural Information Processing Systems

Capturing accurate uncertainty quantification of the predictions from deep neural networks is important in many real-world decision-making applications. A reliable predictor is expected to be accurate when it is confident about its predictions and to indicate high uncertainty when it is likely to be inaccurate. However, modern neural networks have been found to be poorly calibrated, primarily in the direction of overconfidence. In recent years, there has been a surge of research on model calibration that leverages implicit or explicit regularization techniques during training, which achieve good calibration performance by avoiding overconfident outputs. In our study, we empirically found that although the predictions obtained from these regularized models are better calibrated, they are less calibratable; namely, it is harder to further calibrate these predictions with post-hoc calibration methods such as temperature scaling and histogram binning. We conduct a series of empirical studies showing that overconfidence may not hurt final calibration performance if post-hoc calibration is allowed; rather, penalizing confident outputs compresses the room for potential improvement in the post-hoc calibration phase. Our experimental findings point to a new direction for improving the calibration of DNNs by considering main training and post-hoc calibration as a unified framework.
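For readers unfamiliar with the post-hoc step mentioned above, the following is a minimal sketch of temperature scaling, not the authors' code. It assumes held-out logits and labels are available, and it fits a single scalar temperature by a simple grid search over the negative log-likelihood. The function names, the grid range, and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, temperature):
    # Mean negative log-likelihood of the true labels after dividing logits by T.
    probs = softmax(logits / temperature)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=None):
    # Assumption: a coarse grid search is enough for illustration;
    # practical implementations usually use a gradient-based optimizer.
    if grid is None:
        grid = np.linspace(0.05, 10.0, 200)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])

# Synthetic check: labels are drawn from softmax(calibrated), so scaling the
# logits by 5 makes the model overconfident by construction.
rng = np.random.default_rng(0)
calibrated = rng.normal(size=(2000, 3))
labels = np.array([rng.choice(3, p=p) for p in softmax(calibrated)])
overconfident = 5.0 * calibrated
fitted_t = fit_temperature(overconfident, labels)
```

Because the overconfident logits here are an exact rescaling of calibrated ones, a fitted temperature above 1 undoes most of the overconfidence. Dividing by a positive scalar never changes the predicted class, so accuracy is unaffected.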


Interpreting Representation Quality of DNNs for 3D Point Cloud Processing: Supplementary Materials

Wen Shen (Tongji University); Qihan Ren, Dongrui Liu, Quanshi Zhang (Shanghai Jiao Tong University)

Neural Information Processing Systems

This section provides more details about Shapley values in Section 3 of the paper. Linearity: If two independent games $v$ and $w$ can be merged into one game $u(S) = v(S) + w(S)$, then the Shapley values of player $i$ in game $v$ and game $w$ can also be merged, i.e. $\phi_u(i) = \phi_v(i) + \phi_w(i)$. Nullity: A dummy player $i$ satisfies $\forall S \subseteq N \setminus \{i\},\ v(S \cup \{i\}) = v(S) + v(\{i\})$, which indicates that player $i$ has no interaction with other players, i.e. $\phi(i) = v(\{i\})$. Efficiency: The overall reward can be allocated to all players in the game, i.e. $\sum_{i \in N} \phi(i) = v(N) - v(\emptyset)$. This section provides more details about multi-order interactions [8] in Section 3.3 of the paper.
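As a small check of the efficiency property, consider a hypothetical two-player game. The reward values are made up for illustration, and the calculation assumes the standard Shapley value formula, which for two players averages each player's marginal contribution over the two possible join orders:

```latex
\begin{aligned}
&v(\emptyset) = 0,\quad v(\{1\}) = 1,\quad v(\{2\}) = 2,\quad v(\{1,2\}) = 5,\\
&\phi(1) = \tfrac12\,[v(\{1\}) - v(\emptyset)] + \tfrac12\,[v(\{1,2\}) - v(\{2\})] = \tfrac12(1) + \tfrac12(3) = 2,\\
&\phi(2) = \tfrac12\,[v(\{2\}) - v(\emptyset)] + \tfrac12\,[v(\{1,2\}) - v(\{1\})] = \tfrac12(2) + \tfrac12(4) = 3,\\
&\phi(1) + \phi(2) = 5 = v(\{1,2\}) - v(\emptyset).
\end{aligned}
```

Neither player is a dummy in this game, since $v(\{1,2\}) \neq v(\{1\}) + v(\{2\})$; the extra reward of 2 from cooperation is split equally between them.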


Interpreting Representation Quality of DNNs for 3D Point Cloud Processing

Neural Information Processing Systems

In this paper, we evaluate the quality of knowledge representations encoded in deep neural networks (DNNs) for 3D point cloud processing. We propose a method to disentangle the overall model vulnerability into the sensitivity to rotation, translation, scale, and local 3D structures. We also propose metrics to evaluate the spatial smoothness of encoding 3D structures and the representation complexity of the DNN. Based on this analysis, experiments expose representation problems with classic DNNs and explain the utility of adversarial training. The code will be released when this paper is accepted.


Polyhedron Attention Module: Learning Adaptive-order Interactions

Neural Information Processing Systems

Learning feature interactions can be the key for multivariate predictive modeling. ReLU-activated neural networks create piecewise linear prediction models. Other nonlinear activation functions lead to models with only high-order feature interactions, which therefore lack interpretability. Recent methods either incorporate candidate polynomial terms of fixed orders into deep learning, which is subject to combinatorial explosion, or learn orders that are difficult to adapt to different regions of the feature space. We propose a Polyhedron Attention Module (PAM) to create piecewise polynomial models: the input space is split into polyhedrons that define the different pieces, and on each piece the hyperplanes that define the polyhedron boundary multiply to form the interactive terms, resulting in interactions whose order adapts to each piece. PAM is interpretable and can identify important interactions in predicting a target. Theoretical analysis shows that PAM has stronger expression capability than ReLU-activated networks. Extensive experimental results demonstrate the superior classification performance of PAM on massive click-through rate prediction datasets and show that PAM can learn meaningful interaction effects in a medical problem.




A Unified Game-Theoretic Interpretation of Adversarial Robustness: Supplementary Material

Neural Information Processing Systems

In this section, to help readers understand the metric in the paper, we first revisit the definition of the Shapley value [14], which is widely considered an unbiased estimation of the numerical importance of each input variable. In game theory, a complex system is usually represented as a game, where each input variable is taken as a player, and the output of the system is regarded as the total reward of all players. Given a game with multiple players (input variables) $N = \{1, 2, \dots, n\}$, some players cooperate to pursue a high reward. Thus, the task is to divide the total reward and fairly assign the divided elementary reward to each individual player. In this way, the elementary reward can be considered as the numerical importance of the corresponding variable to the complex system. Let $2^N \overset{\text{def}}{=} \{S \mid S \subseteq N\}$ denote all potential subsets of $N$. The game $v: 2^N \to \mathbb{R}$ is a function that estimates the overall reward $v(S)$ earned by each specific subset of players $S \subseteq N$. In this way, the Shapley value, denoted by $\phi(i)$, represents the numerical importance of player $i$ to the game $v$: $\phi(i) = \sum_{S \subseteq N \setminus \{i\}} \frac{(n - |S| - 1)!\,|S|!}{n!}\,\big[v(S \cup \{i\}) - v(S)\big]$. Using Shapley values to explain DNNs.
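The definition can be made concrete with a short sketch that computes exact Shapley values by enumerating all subsets, using the standard weighting $(n - |S| - 1)!\,|S|!/n!$ over the marginal contributions $v(S \cup \{i\}) - v(S)$. This is an illustration only, not the paper's code: `shapley_values` and `toy_game` are hypothetical names, players are indexed from 0 rather than 1, and the enumeration is exponential in $n$, so it is only practical for small games.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1.

    v is a callable mapping a frozenset of players to a real-valued reward.
    """
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):  # subsets S of the other players have sizes 0..n-1
            weight = factorial(n - size - 1) * factorial(size) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i when joining coalition S.
                total += weight * (v(s | {i}) - v(s))
        phi.append(total)
    return phi

def toy_game(s):
    # Hypothetical reward: players 0 and 1 interact, player 2 acts independently.
    reward = 0.0
    if 0 in s:
        reward += 1.0
    if 1 in s:
        reward += 2.0
    if 0 in s and 1 in s:
        reward += 2.0  # extra reward only when players 0 and 1 cooperate
    if 2 in s:
        reward += 0.5
    return reward

phi = shapley_values(3, toy_game)
# The values sum to v(N) - v(empty set), and the independent player 2
# receives exactly its standalone reward v({2}).
```

In this toy game the cooperation bonus between players 0 and 1 is split equally between them, while player 2, which never interacts with the others, keeps only its own contribution.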