Goto

Collaborating Authors

 interpretable direction


DiffEx: Explaining a Classifier with Diffusion Models to Identify Microscopic Cellular Variations

arXiv.org Artificial Intelligence

In recent years, deep learning models have been extensively applied to biological data across various modalities. Discriminative deep learning models have excelled at classifying images into categories (e.g., healthy versus diseased, treated versus untreated). However, these models are often perceived as black boxes due to their complexity and lack of interpretability, limiting their application in real-world biological contexts. In biological research, explainability is essential: understanding classifier decisions and identifying subtle differences between conditions are critical for elucidating the effects of treatments, disease progression, and biological processes. To address this challenge, we propose DiffEx, a method for generating visually interpretable attributes to explain classifiers and identify microscopic cellular variations between different conditions. We demonstrate the effectiveness of DiffEx in explaining classifiers trained on natural and biological images. Furthermore, we use DiffEx to uncover phenotypic differences within microscopy datasets. By offering insights into cellular variations through classifier explanations, DiffEx has the potential to advance the understanding of diseases and aid drug discovery by identifying novel biomarkers.


Unsupervised Panoptic Interpretation of Latent Spaces in GANs Using Space-Filling Vector Quantization

arXiv.org Artificial Intelligence

Generative adversarial networks (GANs) learn a latent space whose samples can be mapped to real-world images. Such latent spaces are difficult to interpret. Some earlier supervised methods aim to create an interpretable latent space or discover interpretable directions that require exploiting data labels or annotated synthesized samples for training. However, we propose using a modification of vector quantization called space-filling vector quantization (SFVQ), which quantizes the data on a piece-wise linear curve. SFVQ can capture the underlying morphological structure of the latent space and thus make it interpretable. We apply this technique to model the latent space of pretrained StyleGAN2 and BigGAN networks on various datasets. Our experiments show that the SFVQ curve yields a general interpretable model of the latent space that determines which part of the latent space corresponds to what specific generative factors. Furthermore, we demonstrate that each line of SFVQ's curve can potentially refer to an interpretable direction for applying intelligible image transformations. We also showed that the points located on an SFVQ line can be used for controllable data augmentation.


Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models

arXiv.org Artificial Intelligence

We propose the first unsupervised and learning-based method to identify interpretable directions in h-space of pre-trained diffusion models. Our method is derived from an existing technique that operates on the GAN latent space. Specifically, we employ a shift control module that works on h-space of pre-trained diffusion models to manipulate a sample into a shifted version of itself, followed by a reconstructor to reproduce both the type and the strength of the manipulation. By jointly optimizing them, the model will spontaneously discover disentangled and interpretable directions. To prevent the discovery of meaningless and destructive directions, we employ a discriminator to maintain the fidelity of shifted sample. Due to the iterative generative process of diffusion models, our training requires a substantial amount of GPU VRAM to store numerous intermediate tensors for back-propagating gradient. To address this issue, we propose a general VRAM-efficient training algorithm based on gradient checkpointing technique to back-propagate any gradient through the whole generative process, with acceptable occupancy of VRAM and sacrifice of training efficiency. Compared with existing related works on diffusion models, our method inherently identifies global and scalable directions, without necessitating any other complicated procedures. Extensive experiments on various datasets demonstrate the effectiveness of our method.


Identifying Interpretable Visual Features in Artificial and Biological Neural Systems

arXiv.org Machine Learning

Single neurons in neural networks are often interpretable in that they represent individual, intuitively meaningful features. However, many neurons exhibit mixed selectivity, i.e., they represent multiple unrelated features. A recent hypothesis proposes that features in deep networks may be represented in superposition, i.e., on non-orthogonal axes by multiple neurons, since the number of possible interpretable features in natural data is generally larger than the number of neurons in a given network. Accordingly, we should be able to find meaningful directions in activation space that are not aligned with individual neurons. Here, we propose (1) an automated method for quantifying visual interpretability that is validated against a large database of human psychophysics judgments of neuron interpretability, and (2) an approach for finding meaningful directions in network activation space. We leverage these methods to discover directions in convolutional neural networks that are more intuitively meaningful than individual neurons, as we confirm and investigate in a series of analyses. Moreover, we apply the same method to three recent datasets of visual neural responses in the brain and find that our conclusions largely transfer to real neural data, suggesting that superposition might be deployed by the brain. This also provides a link with disentanglement and raises fundamental questions about robust, efficient and factorized representations in both artificial and biological neural systems. One of the oldest ideas in neuroscience is Cajal's single neuron doctrine (Finger, 2001) and its application to perception (Barlow, 1972), i.e., the hypothesis that individual sensory neurons encode individually meaningful features. The idea dates back to the early 1950s, when researchers began to find evidence of neurons that reliably and selectively fire in response to particular stimuli, such as dots on a contrasting background (Barlow, 1953) and lines of particular orientation and width (Hubel & Wiesel, 1959). These findings gave rise to the standard model of the ventral visual stream as a process of hierarchical feature extraction and pooling (Hubel & Wiesel, 1968; Gross et al., 1972; In this work, we adopt a pragmatic definition of feature based on human discernability, measured through psychophysics experiments (see below). For an attempt at a more formal definition see Elhage et al. (2022). Neurons in the early stages extract simple features, such as oriented lines, while neurons at later stages combine simple features to construct more complex composite features. In the highest stages, complex features are combined to yield representations of entire objects encoded by single neurons--the shape of a hand, or the face of a friend.


Orthogonal SVD Covariance Conditioning and Latent Disentanglement

arXiv.org Artificial Intelligence

Inserting an SVD meta-layer into neural networks is prone to make the covariance ill-conditioned, which could harm the model in the training stability and generalization abilities. In this paper, we systematically study how to improve the covariance conditioning by enforcing orthogonality to the Pre-SVD layer. Existing orthogonal treatments on the weights are first investigated. However, these techniques can improve the conditioning but would hurt the performance. To avoid such a side effect, we propose the Nearest Orthogonal Gradient (NOG) and Optimal Learning Rate (OLR). The effectiveness of our methods is validated in two applications: decorrelated Batch Normalization (BN) and Global Covariance Pooling (GCP). Extensive experiments on visual recognition demonstrate that our methods can simultaneously improve covariance conditioning and generalization. The combinations with orthogonal weight can further boost the performance. Moreover, we show that our orthogonality techniques can benefit generative models for better latent disentanglement through a series of experiments on various benchmarks. Code is available at: \href{https://github.com/KingJamesSong/OrthoImproveCond}{https://github.com/KingJamesSong/OrthoImproveCond}.


Interpreting Latent Spaces of Generative Models for Medical Images using Unsupervised Methods

arXiv.org Artificial Intelligence

Generative models such as Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs) play an increasingly important role in medical image analysis. The latent spaces of these models often show semantically meaningful directions corresponding to human-interpretable image transformations. However, until now, their exploration for medical images has been limited due to the requirement of supervised data. Several methods for unsupervised discovery of interpretable directions in GAN latent spaces have shown interesting results on natural images. This work explores the potential of applying these techniques on medical images by training a GAN and a VAE on thoracic CT scans and using an unsupervised method to discover interpretable directions in the resulting latent space. We find several directions corresponding to non-trivial image transformations, such as rotation or breast size. Furthermore, the directions show that the generative models capture 3D structure despite being presented only with 2D data. The results show that unsupervised methods to discover interpretable directions in GANs generalize to VAEs and can be applied to medical images. This opens a wide array of future work using these methods in medical image analysis.


Unsupervised Discovery of Interpretable Directions in the GAN Latent Space

arXiv.org Machine Learning

The latent spaces of typical GAN models often have semantically meaningful directions. Moving in these directions corresponds to human-interpretable image transformations, such as zooming or recoloring, enabling a more controllable generation process. However, the discovery of such directions is currently performed in a supervised manner, requiring human labels, pretrained models, or some form of self-supervision. These requirements can severely limit a range of directions existing approaches can discover. In this paper, we introduce an unsupervised method to identify interpretable directions in the latent space of a pretrained GAN model. By a simple model-agnostic procedure, we find directions corresponding to sensible semantic manipulations without any form of (self-)supervision. Furthermore, we reveal several non-trivial findings, which would be difficult to obtain by existing methods, e.g., a direction corresponding to background removal. As an immediate practical benefit of our work, we show how to exploit this finding to achieve a new state-of-the-art for the problem of saliency detection.