
Collaborating Authors

 Zhang, Quanshi


Revisiting Generalization Power of a DNN in Terms of Symbolic Interactions

arXiv.org Artificial Intelligence

This paper aims to analyze the generalization power of deep neural networks (DNNs) from the perspective of interactions. Unlike previous analyses of a DNN's generalization power in a high-dimensional feature space, we find that the generalization power of a DNN can be explained as the generalization power of its interactions. We find that generalizable interactions follow a decay-shaped distribution, while non-generalizable interactions follow a spindle-shaped distribution. Furthermore, our theory can effectively disentangle these two types of interactions from a DNN. Experiments verify that our theory closely matches the real interactions encoded by a DNN.
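
The interactions referred to throughout these papers are Harsanyi (AND) interactions, computed by an inclusion-exclusion transform over a scalar model output v(S) on masked inputs. Below is a minimal sketch, not the paper's implementation: a toy scoring function v stands in for a DNN, and the interaction strengths are grouped by order |S|, the quantity whose decay-shaped vs. spindle-shaped distribution the paper analyzes.

```python
# Minimal sketch: Harsanyi (AND) interactions I(S) = sum_{T subseteq S} (-1)^{|S|-|T|} v(T)
# for a toy scoring function v, followed by a per-order summary of interaction strength.
# Requires 2**n evaluations of v, so it is only practical for small n.
from itertools import chain, combinations

def subsets(s):
    s = tuple(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def harsanyi_interactions(v, n):
    """Return {S: I(S)} for all subsets S of {0, ..., n-1}."""
    N = tuple(range(n))
    cache = {S: v(S) for S in map(frozenset, subsets(N))}
    return {S: sum((-1) ** (len(S) - len(T)) * cache[frozenset(T)] for T in subsets(S))
            for S in map(frozenset, subsets(N))}

def strength_by_order(I):
    """Mean |I(S)| per order |S| -- the distribution whose shape is analyzed."""
    by_order = {}
    for S, val in I.items():
        by_order.setdefault(len(S), []).append(abs(val))
    return {k: sum(v) / len(v) for k, v in sorted(by_order.items())}

# Toy v: an AND pattern on variables {0, 1} plus a linear term on variable 2.
v = lambda S: 2.0 * (0 in S and 1 in S) + 0.5 * (2 in S)
I = harsanyi_interactions(v, n=3)
print(strength_by_order(I))   # nonzero strength only at orders 1 and 2
```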


Randomness of Low-Layer Parameters Determines Confusing Samples in Terms of Interaction Representations of a DNN

arXiv.org Artificial Intelligence

We also discover that the confusing samples of a DNN, which are represented by non-generalizable interactions, are determined by its low-layer parameters. In comparison, other factors, such as high-layer parameters and network architecture, have much less impact on the composition of confusing samples. Two DNNs with different low-layer parameters usually have fully different sets of confusing samples, even though they have similar performance. The above theory serves as a mathematical guarantee that lets the AND-OR interactions in the logical model be roughly considered as primitive inference patterns equivalently used by the DNN for inference. For example, as Figure 1 shows, given the input prompt x = "A red apple falls to the ground because of the pull of," the LLM generates the next token "gravity," and its inference score for this token generation can ...
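
To make the role of the inference score concrete, here is a hedged sketch of how a token-generation score can play the role of the scalar v(S) that the AND-OR interaction theory decomposes. The function `next_token_logprob` is a hypothetical stand-in for an LLM returning log p(target | prompt), not a real library call, and the baseline token used for masking is a design choice of the explanation method rather than something fixed by the paper.

```python
# Sketch: the token-generation score as the scalar v(S) decomposed by interaction theory.
# `next_token_logprob` is a hypothetical stand-in for an LLM; it is not a real API.
def v(S, tokens, target, next_token_logprob, baseline="<mask>"):
    """Keep only the tokens indexed by S, mask the rest with a baseline token,
    and read out the model's log-probability of generating the target token."""
    masked_prompt = [tok if i in S else baseline for i, tok in enumerate(tokens)]
    return next_token_logprob(masked_prompt, target)

tokens = "A red apple falls to the ground because of the pull of".split()
target = "gravity"
# Plugging this v into the Harsanyi transform sketched earlier yields the AND
# interactions that the theory treats as the model's primitive inference patterns.
```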


Disentangling Regional Primitives for Image Generation

arXiv.org Artificial Intelligence

This paper presents a method to explain the internal representation structure of a neural network for image generation. Specifically, our method disentangles primitive feature components from an intermediate-layer feature of the neural network, such that each feature component is exclusively used to generate a specific set of image regions. In this way, the generation of the entire image can be considered as the superposition of different pre-encoded primitive regional patterns, each generated by a feature component. We find that each feature component can be represented as an OR relationship among the demands for generating different image regions, which is encoded by the neural network. Therefore, we extend the Harsanyi interaction to represent such OR interactions and thereby disentangle the feature components. Experiments show a clear correspondence between each feature component and the generation of specific image regions.
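
For intuition, the sketch below shows one common formulation of the OR interaction used in this line of work, obtained by mirroring the Harsanyi (AND) transform on the score of removing variables: I_or(S) = -sum over T subseteq S of (-1)^(|S|-|T|) v(N \ T). This is an illustration of the general OR-interaction idea with a toy scoring function v standing in for the region-generation demands, not necessarily the exact estimator used in the paper.

```python
# Sketch of an OR interaction: a single OR primitive captures a pattern that the
# plain AND (Harsanyi) transform would have to spread over several terms.
from itertools import chain, combinations

def subsets(s):
    s = tuple(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def or_interaction(S, N, v):
    """I_or(S) = -sum_{T subseteq S} (-1)^{|S|-|T|} v(N \\ T)."""
    S, N = frozenset(S), frozenset(N)
    return -sum((-1) ** (len(S) - len(T)) * v(N - frozenset(T)) for T in subsets(S))

# Toy check: the score fires if region a OR region b is demanded.
N = {"a", "b"}
v = lambda S: float(len(S & {"a", "b"}) > 0)
print(or_interaction({"a", "b"}, N, v))   # 1.0: one OR primitive explains v
```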


Alignment Between the Decision-Making Logic of LLMs and Human Cognition: A Case Study on Legal LLMs

arXiv.org Artificial Intelligence

This paper presents a method to evaluate the alignment between the decision-making logic of Large Language Models (LLMs) and human cognition in a case study on legal LLMs. Unlike traditional evaluations of language generation results, we propose to evaluate the correctness of the detailed decision-making logic of an LLM behind its seemingly correct outputs, which represents a core challenge for an LLM to earn human trust. To this end, we quantify the interactions encoded by the LLM as its primitive decision-making logic, because recent theoretical achievements have proven several mathematical guarantees of the faithfulness of the interaction-based explanation. We design a set of metrics to evaluate the detailed decision-making logic of LLMs. Experiments show that even when the language generation results appear correct, a significant portion of the internal inference logic contains notable issues.
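
As a purely hypothetical illustration of the kind of metric that can be built on top of extracted interaction effects (the paper's actual metrics differ), one can measure how much of the salient interaction strength rests on tokens a human expert labels as relevant, versus on irrelevant or forbidden tokens. The function and threshold below are illustrative names, not the paper's definitions.

```python
# Hypothetical alignment metric over interaction effects {S: I(S)}, where each S
# is a tuple of token indices. Not the paper's metric; an illustration only.
def relevant_strength_ratio(interactions, relevant_tokens, tau=0.05):
    """Share of salient interaction strength whose variables all lie in
    `relevant_tokens`; `tau` is a saliency threshold relative to the maximum."""
    if not interactions:
        return 0.0
    max_abs = max(abs(v) for v in interactions.values())
    salient = {S: v for S, v in interactions.items() if S and abs(v) >= tau * max_abs}
    total = sum(abs(v) for v in salient.values())
    aligned = sum(abs(v) for S, v in salient.items() if set(S) <= set(relevant_tokens))
    return aligned / total if total else 0.0

interactions = {(0, 3): 1.2, (5,): -0.8, (2, 7): 0.05}
print(relevant_strength_ratio(interactions, relevant_tokens={0, 3, 5}))
```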


Quantifying In-Context Reasoning Effects and Memorization Effects in LLMs

arXiv.org Artificial Intelligence

In this study, we propose an axiomatic system to define and quantify the precise memorization and in-context reasoning effects used by a large language model (LLM) for language generation. These effects are formulated as non-linear interactions between tokens/words encoded by the LLM. Specifically, the axiomatic system enables us to categorize the memorization effects into foundational memorization effects and chaotic memorization effects, and further classify in-context reasoning effects into enhanced inference patterns, eliminated inference patterns, and reversed inference patterns. Moreover, the decomposed effects satisfy the sparsity property and the universal matching property, which mathematically guarantee that the LLM's confidence score can be faithfully decomposed into the memorization effects and in-context reasoning effects. Experiments show that the clear disentanglement of memorization effects and in-context reasoning effects enables a straightforward examination of detailed inference patterns encoded by LLMs.
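
The sketch below is a deliberately simplified illustration of this kind of disentanglement, not the paper's axiomatic definitions: interaction effects computed without the context are taken as a proxy for memorization, effects computed with the context reflect in-context reasoning, and each pattern is labeled by how the context changes it. The labels, threshold, and comparison rule are assumptions for illustration.

```python
# Simplified illustration: classify how in-context information changes each
# interaction pattern S, given effects computed without and with the context.
def classify_context_effects(I_no_ctx, I_ctx, eps=1e-3):
    """Return a label per pattern S: negligible / enhanced / eliminated / reversed / weakened."""
    labels = {}
    for S in set(I_no_ctx) | set(I_ctx):
        a, b = I_no_ctx.get(S, 0.0), I_ctx.get(S, 0.0)
        if abs(a) < eps and abs(b) < eps:
            labels[S] = "negligible"
        elif abs(a) < eps:
            labels[S] = "enhanced"     # pattern introduced by the context
        elif abs(b) < eps:
            labels[S] = "eliminated"   # memorized pattern suppressed by the context
        elif a * b < 0:
            labels[S] = "reversed"     # the context flips the pattern's effect
        elif abs(b) > abs(a):
            labels[S] = "enhanced"
        else:
            labels[S] = "weakened"
    return labels
```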


Two-Phase Dynamics of Interactions Explains the Starting Point of a DNN Learning Over-Fitted Features

arXiv.org Artificial Intelligence

This paper investigates the dynamics of a deep neural network (DNN) learning interactions. Previous studies have discovered and mathematically proven that, given each input sample, a well-trained DNN usually encodes only a small number of interactions (non-linear relationships) between the input variables of the sample. A series of theorems has been derived to prove that the DNN's inference can be considered equivalent to using these interactions as primitive inference patterns. In this paper, we discover that a DNN learns interactions in two phases. The first phase mainly penalizes interactions of medium and high orders, and the second phase mainly learns interactions of gradually increasing orders. The two-phase phenomenon can be considered the starting point at which a DNN begins to learn over-fitted features, and it is widely shared by DNNs with various architectures trained for different tasks. Therefore, the discovery of the two-phase dynamics provides a detailed mechanism for how a DNN gradually learns different inference patterns (interactions). In particular, we also verify the claim that high-order interactions have weaker generalization power than low-order interactions. Thus, the discovered two-phase dynamics also explains how the generalization power of a DNN changes during training.
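
The two-phase claim can be examined empirically by logging, at each checkpoint, the distribution of interaction strength over orders and watching high-order strength first shrink and then grow back. The sketch below only shows that bookkeeping; `extract_interactions` is a placeholder for any interaction estimator (e.g., the Harsanyi transform sketched earlier), not a specific library function.

```python
# Sketch of the bookkeeping used to observe the two phases of interaction learning.
def order_curve(interactions):
    """Mean |I(S)| per order |S| for one checkpoint."""
    by_order = {}
    for S, val in interactions.items():
        by_order.setdefault(len(S), []).append(abs(val))
    return {k: sum(v) / len(v) for k, v in sorted(by_order.items())}

def track_two_phase(checkpoints, sample, extract_interactions):
    """Return {checkpoint_name: {order: mean |I(S)|}} across training.
    `checkpoints` maps names to saved models; `extract_interactions` is a
    placeholder for an interaction estimator."""
    return {name: order_curve(extract_interactions(model, sample))
            for name, model in checkpoints.items()}
```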


Defining and Extracting Generalizable Interaction Primitives from DNNs

arXiv.org Artificial Intelligence

Faithfully summarizing the knowledge encoded by a deep neural network (DNN) into a few symbolic primitive patterns, without losing much information, represents a core challenge in explainable AI. To this end, Ren et al. (2023c) have derived a series of theorems to prove that the inference score of a DNN can be explained as a small set of interactions between input variables. However, the lack of generalization power makes it hard to consider such interactions as faithful primitive patterns encoded by the DNN. Therefore, given different DNNs trained for the same task, we develop a new method to extract interactions that are shared by these DNNs. Experiments show that the extracted interactions better reflect the common knowledge shared by different DNNs.

Explaining and quantifying the exact knowledge encoded by a DNN presents a new challenge in explainable AI. Previous studies mainly visualized patterns encoded by DNNs (Bau et al., 2017; Kim et al., 2018) or estimated saliency maps on input variables (Simonyan et al., 2013; Selvaraju et al., 2017). However, a new question arises: can we formulate the implicit knowledge encoded by the DNN as explicit and symbolic primitive patterns? Ideally, these primitive patterns would serve as elementary units for inference, just like concepts in human cognition. However, there is no widely accepted way to define the concepts encoded by a DNN, because we cannot mathematically define/formulate the exact concepts in human cognition. Nevertheless, if we put aside cognitive issues, Ren et al. (2023c) and Li & Zhang (2023b) have derived a series of theorems as convincing evidence that interactions can be taken as symbolic primitives encoded by a DNN.
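
As a crude proxy for the idea of interactions shared across models (not the paper's extraction algorithm), one can keep, for each pattern, only the sign-consistent component that all models agree on; everything else is treated as model-specific noise. The sketch below illustrates that notion with toy interaction dictionaries.

```python
# Crude proxy for "shared" interactions across DNNs trained for the same task.
# Not the paper's method; an illustration of the notion of common knowledge.
def shared_interactions(per_model_interactions):
    """Keep, for each pattern S present in all models, the sign-consistent
    component with the smallest shared magnitude; drop disagreements."""
    keys = set.intersection(*(set(d) for d in per_model_interactions))
    shared = {}
    for S in keys:
        vals = [d[S] for d in per_model_interactions]
        if all(v > 0 for v in vals) or all(v < 0 for v in vals):
            sign = 1.0 if vals[0] > 0 else -1.0
            shared[S] = sign * min(abs(v) for v in vals)
    return shared

dnn_a = {("x1", "x2"): 0.9, ("x3",): 0.4, ("x2", "x3"): -0.2}
dnn_b = {("x1", "x2"): 0.7, ("x3",): -0.1, ("x2", "x3"): -0.3}
print(shared_interactions([dnn_a, dnn_b]))  # keeps ("x1","x2") and ("x2","x3"); drops ("x3",)
```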


Bayesian Neural Networks Avoid Encoding Complex and Perturbation-Sensitive Concepts

arXiv.org Artificial Intelligence

In this paper, we focus on mean-field variational Bayesian Neural Networks (BNNs) and explore the representation capacity of such BNNs by investigating which types of concepts are less likely to be encoded by a BNN. It has been observed and studied that a relatively small set of interactive concepts usually emerges in the knowledge representation of a sufficiently trained neural network, and such concepts can faithfully explain the network output. Based on this, our study proves that, compared to standard deep neural networks (DNNs), BNNs are less likely to encode complex concepts. Experiments verify our theoretical proofs. Note that the tendency to encode less complex concepts does not necessarily imply weak representation power, considering that complex concepts exhibit low generalization power and high adversarial vulnerability. The code is available at https://github.com/sjtu-xai-lab/BNN-concepts.
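
For readers unfamiliar with the setting, the sketch below shows a generic mean-field variational Bayesian linear layer in PyTorch, not the paper's code: each weight has a learned mean and log-variance and is re-sampled on every forward pass via the reparameterization trick. This per-pass weight perturbation is the property the paper connects to the suppression of complex (high-order) concepts.

```python
# Minimal mean-field variational Bayesian linear layer (generic sketch).
import torch
import torch.nn as nn

class MeanFieldLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.w_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.w_logvar = nn.Parameter(torch.full((out_features, in_features), -6.0))
        self.b = nn.Parameter(torch.zeros(out_features))
        nn.init.xavier_uniform_(self.w_mu)

    def forward(self, x):
        std = torch.exp(0.5 * self.w_logvar)
        w = self.w_mu + std * torch.randn_like(std)   # reparameterized weight sample
        return torch.nn.functional.linear(x, w, self.b)

layer = MeanFieldLinear(8, 4)
out = layer(torch.randn(2, 8))   # a fresh weight sample is drawn on every call
```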


Defects of Convolutional Decoder Networks in Frequency Representation

arXiv.org Artificial Intelligence

In this paper, we prove the representation defects of a cascaded convolutional decoder network, considering its capacity to represent different frequency components of an input sample. We conduct the discrete Fourier transform on each channel of the feature map in an intermediate layer of the decoder network. Then, we extend the 2D circular convolution theorem to represent the forward and backward propagations through convolutional layers in the frequency domain. Based on this, we prove three defects in representing feature spectra. First, we prove that the convolution operation, the zero-padding operation, and a set of other settings all make a convolutional decoder network more likely to weaken high-frequency components. Second, we prove that the upsampling operation generates a feature spectrum in which strong signals repetitively appear at certain frequencies. Third, we prove that if there is a small shift between the frequency components of the input sample and those of the target output for regression, the decoder usually cannot be effectively learned.
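
The two frequency-domain facts underlying these proofs are easy to check numerically: 2D circular convolution corresponds to an element-wise product of 2D DFTs, and zero-insertion upsampling replicates the input spectrum at fixed frequency offsets. The NumPy sketch below verifies both on random data; it is a sanity check, not the paper's derivation.

```python
# NumPy check: (1) the 2D circular convolution theorem, and (2) spectrum
# replication under zero-insertion upsampling by a factor of 2.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))

# Direct 2D circular convolution of x with k.
y = np.zeros_like(x)
for i in range(k.shape[0]):
    for j in range(k.shape[1]):
        y += k[i, j] * np.roll(np.roll(x, i, axis=0), j, axis=1)

# (1) FFT(y) equals FFT(x) times FFT(k zero-padded to x's size).
assert np.allclose(np.fft.fft2(y), np.fft.fft2(x) * np.fft.fft2(k, s=x.shape))

# (2) Zero-insertion upsampling: the spectrum of z is the spectrum of x tiled,
# i.e. strong signals repeat at fixed frequency offsets.
z = np.zeros((16, 16))
z[::2, ::2] = x
assert np.allclose(np.fft.fft2(z), np.tile(np.fft.fft2(x), (2, 2)))
```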


Does a Neural Network Really Encode Symbolic Concepts?

arXiv.org Artificial Intelligence

Recently, a series of studies have tried to extract interactions between input variables modeled by a DNN and to define such interactions as concepts encoded by the DNN. However, strictly speaking, there is still no solid guarantee that such interactions indeed represent meaningful concepts. Therefore, in this paper, we examine the trustworthiness of interaction concepts from four perspectives. Extensive empirical studies have verified that a well-trained DNN usually encodes sparse, transferable, and discriminative concepts, which is partially aligned with human intuition.
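
The sparsity property examined here can be quantified with a simple criterion (an illustrative choice, not the paper's exact metric): count how many interactions have strength above a small fraction of the strongest one, and compare that count to the 2^n possible interactions over n input variables.

```python
# Illustrative sparsity check over extracted interaction effects {S: I(S)}.
def count_salient(interactions, tau=0.05):
    """Number of interactions whose strength exceeds tau * max strength."""
    if not interactions:
        return 0
    max_abs = max(abs(v) for v in interactions.values())
    return sum(abs(v) >= tau * max_abs for v in interactions.values())

# For a well-trained DNN, the reported salient count is far below 2**n.
```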