Metric as Transform: Exploring beyond Affine Transform for Interpretable Neural Network
arXiv.org Artificial Intelligence
Artificial Neural Networks of varying architectures are generally built around an affine transformation at their core. However, we find dot-product neurons, with their global influence, less interpretable than the local influence of Euclidean distance (as used in Radial Basis Function Networks). We find that metrics used as transforms perform similarly to affine transforms in a MultiLayer Perceptron or Convolutional Neural Network. Moreover, we explore various properties of metrics, compare them with affine transforms, and present multiple cases where metrics seem to provide better interpretability. We develop interpretable local-dictionary-based neural networks and use them to understand and reject adversarial examples.
Artificial Neural Networks (ANN) are used end-to-end, generally as black-box function approximators. This is partly due to the vast number of parameters, the underlying function used, and the high dimension of the input and hidden neurons. The backbone of Deep Networks including MLP, CNN Krizhevsky et al. (2012), Transformers Vaswani et al. (2017), and MLP-Mixers Tolstikhin et al. (2021) has been a linear transform of the form y = Wx + b (or per neuron: y ...
However, matrix multiplication has won both the software and hardware lottery Hooker (2021); it is highly optimized and tightly integrated into current Deep Neural Networks.
Since a metric has a minimum, its lower contour sets are generally bounded, and the minimum indicates that the two points are similar. We can also generalize metrics by relaxing different axioms (see Section 2.9). In this paper, we explore various properties of metrics such as Voronoi partitioning, metrics as transform, their generalization, invertibility, application in low-dimensional embeddings, dictionary learning, adversarial rejection, and noisy optimization.
First, we analyze various properties of metrics and how they could help with interpretability compared to the linear transforms used in most neural network architectures.
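The contrast between the affine form y = Wx + b and a metric used as a transform can be illustrated with a small sketch. It is not the paper's implementation: the function names, the weights, and the centers are hypothetical, and the Euclidean distance is assumed as the metric, so each output unit reports how far the input lies from that unit's center.

```python
import math

def affine_transform(x, W, b):
    """Affine layer: y_i = w_i . x + b_i, one output per row of W."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def metric_transform(x, centers):
    """Metric layer (assumed Euclidean): y_i = ||x - c_i||_2, one output per center."""
    return [math.dist(x, c) for c in centers]

# Hypothetical input, weights, and centers, chosen only for illustration.
x = [1.0, 2.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [0.0, 0.0, -1.0]
centers = [[1.0, 2.0], [4.0, 6.0], [1.0, 0.0]]

affine_out = affine_transform(x, W, b)      # [1.0, 2.0, 2.0]
metric_out = metric_transform(x, centers)   # [0.0, 5.0, 2.0]
```

In this sketch the metric outputs are non-negative and reach zero exactly where the input coincides with a center, which is the point of minimum the text refers to.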
The dot product is simple to understand, but we find that the dot-product neuron, with a non-linear activation, is difficult to interpret. It represents a planar neuron rather than a local neuron, as shown in Figure 1.
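The planar versus local distinction can be made concrete with a small sketch. The ReLU activation on the dot-product neuron and the Gaussian of the Euclidean distance on the local neuron (in the style of a Radial Basis Function unit) are assumptions chosen for illustration, as are all names and values; they are not taken from the paper.

```python
import math

def relu_dot_neuron(x, w, b):
    """Planar neuron: ReLU(w . x + b). Constant along directions orthogonal to w."""
    return max(0.0, sum(wi * xi for wi, xi in zip(w, x)) + b)

def local_distance_neuron(x, c, scale=1.0):
    """Local neuron (assumed RBF-style): exp(-(||x - c|| / scale)^2)."""
    return math.exp(-(math.dist(x, c) / scale) ** 2)

# Hypothetical parameters: weight vector w with bias b, and a center c.
w, b = [1.0, 0.0], 0.0
c = [1.0, 0.0]

near = [1.0, 0.0]         # at the center
far_along = [100.0, 0.0]  # far away in the direction of w
far_side = [1.0, 100.0]   # far away, orthogonal to w

points = (near, far_along, far_side)
planar = [relu_dot_neuron(p, w, b) for p in points]
local = [local_distance_neuron(p, c) for p in points]
```

Here the planar neuron stays active arbitrarily far from `near` (its response grows along `w` and is unchanged orthogonal to `w`), while the local neuron responds only in a neighborhood of `c`.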
Oct-21-2024