Collaborating Authors

Li, Chun-Liang

DISSECT: Disentangled Simultaneous Explanations via Concept Traversals Artificial Intelligence

Explaining deep learning model inferences is a promising avenue for scientific understanding, improving safety, uncovering hidden biases, evaluating fairness, and beyond, as argued by many scholars. One of the principal benefits of counterfactual explanations is allowing users to explore "what-if" scenarios through what does not and cannot exist in the data, a capability that many other forms of explanation, such as heatmaps and influence functions, inherently lack. However, most previous work on generative explainability cannot disentangle important concepts effectively, produces unrealistic examples, or fails to retain relevant information. We propose a novel approach, DISSECT, that jointly trains a generator, a discriminator, and a concept disentangler to overcome such challenges using little supervision. DISSECT generates Concept Traversals (CTs), defined as a sequence of generated examples with increasing degrees of concepts that influence a classifier's decision. By training a generative model from a classifier's signal, DISSECT offers a way to discover a classifier's inherent "notion" of distinct concepts automatically rather than relying on user-predefined concepts. We show that DISSECT produces CTs that (1) disentangle several concepts, (2) are influential to a classifier's decision and are coupled to its reasoning due to joint training, (3) are realistic, (4) preserve relevant information, and (5) are stable across similar inputs. We validate DISSECT on several challenging synthetic and realistic datasets where previous methods fall short of satisfying desirable criteria for interpretability and show that it performs consistently well and better than existing methods.

i-Mix: A Strategy for Regularizing Contrastive Representation Learning Machine Learning

Contrastive representation learning has been shown to be an effective way of learning representations from unlabeled data. However, much of the progress has been made in vision domains and relies on data augmentations carefully designed using domain knowledge. In this work, we propose i-Mix, a simple yet effective regularization strategy for improving contrastive representation learning in both vision and non-vision domains. We cast contrastive learning as training a non-parametric classifier by assigning a unique virtual class to each data instance in a batch. Then, data instances are mixed in both the input and virtual label spaces, providing more augmented data during training. In experiments, we demonstrate that i-Mix consistently improves the quality of self-supervised representations across domains, resulting in significant performance gains on downstream tasks. Furthermore, we confirm its regularization effect via extensive ablation studies across model and dataset sizes.
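The mixing step described above can be illustrated with a minimal sketch. It assumes a MixUp-style convex combination with a single Beta-distributed coefficient per batch; the function name `i_mix_batch`, the `alpha` parameter, and the choice of one coefficient per batch are illustrative assumptions, not details taken from the abstract.

```python
import random

def i_mix_batch(batch, alpha=1.0, rng=None):
    """Mix each instance with a randomly chosen partner from the same batch.

    batch: list of equal-length feature vectors (lists of floats).
    Returns (mixed_inputs, mixed_virtual_labels, lam).
    """
    rng = rng or random.Random()
    n = len(batch)
    # Each instance i gets its own virtual class i (a one-hot row of length n).
    perm = list(range(n))
    rng.shuffle(perm)
    # Assumption: one mixing coefficient shared by the whole batch.
    lam = rng.betavariate(alpha, alpha)
    mixed_inputs, mixed_labels = [], []
    for i, j in enumerate(perm):
        # Convex combination in input space.
        mixed_inputs.append(
            [lam * a + (1 - lam) * b for a, b in zip(batch[i], batch[j])]
        )
        # The same convex combination in virtual label space.
        label = [0.0] * n
        label[i] += lam
        label[j] += 1 - lam
        mixed_labels.append(label)
    return mixed_inputs, mixed_labels, lam
```

Each mixed label row is a soft target over the batch's virtual classes and sums to one, so it can be fed to an ordinary cross-entropy loss over in-batch similarities.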

Interpretable Sequence Learning for COVID-19 Forecasting Machine Learning

We propose a novel approach that integrates machine learning into compartmental disease modeling to predict the progression of COVID-19. Our model is explainable by design as it explicitly shows how different compartments evolve and it uses interpretable encoders to incorporate covariates and improve performance. Explainability is valuable to ensure that the model's forecasts are credible to epidemiologists and to instill confidence in end-users such as policy makers and healthcare institutions. Our model can be applied at different geographic resolutions, and here we demonstrate it for states and counties in the United States. We show that our model provides more accurate forecasts, in metrics averaged across the entire US, than state-of-the-art alternatives, and that it provides qualitatively meaningful explanatory insights. Lastly, we analyze the performance of our model for different subgroups based on the subgroup distributions within the counties.
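For readers unfamiliar with compartmental modeling, the following is a minimal sketch of a textbook discrete-time SIR model, not the model proposed in the paper. The compartments, the function names, and `beta_fn` (a stand-in for a learned, covariate-dependent transmission rate) are all assumptions made for illustration.

```python
def sir_step(s, i, r, beta, gamma, dt=1.0):
    """Advance susceptible/infected/recovered counts by one time step."""
    n = s + i + r
    new_infections = beta * s * i / n * dt
    new_recoveries = gamma * i * dt
    return (
        s - new_infections,
        i + new_infections - new_recoveries,
        r + new_recoveries,
    )

def simulate(s, i, r, beta_fn, gamma, days):
    """Roll the model forward; beta_fn(t) is a hypothetical stand-in for an
    encoder that maps covariates at time t to a transmission rate."""
    trajectory = [(s, i, r)]
    for t in range(days):
        s, i, r = sir_step(s, i, r, beta_fn(t), gamma)
        trajectory.append((s, i, r))
    return trajectory
```

Because every flow leaves one compartment and enters another, the total population is conserved, and each compartment's trajectory can be inspected directly, which is the sense in which such models are explainable by construction.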

Kernel Stein Generative Modeling Machine Learning

We are interested in gradient-based Explicit Generative Modeling where samples can be derived from iterative gradient updates based on an estimate of the score function of the data distribution. Recent advances in Stochastic Gradient Langevin Dynamics (SGLD) demonstrate impressive results with energy-based models on high-dimensional and complex data distributions. Stein Variational Gradient Descent (SVGD) is a deterministic sampling algorithm that iteratively transports a set of particles to approximate a given distribution, based on functional gradient descent that decreases the KL divergence. SVGD has shown promising results on several Bayesian inference applications. However, applying SVGD to high-dimensional problems is still under-explored. The goal of this work is to study high-dimensional inference with SVGD. We first identify key challenges in practical kernel SVGD inference in high dimensions. We propose noise conditional kernel SVGD (NCK-SVGD), which works in tandem with the recently introduced Noise Conditional Score Network estimator. NCK is crucial for successful inference with SVGD in high dimensions, as it adapts the kernel to the noise level of the score estimate. As we anneal the noise, NCK-SVGD targets the real data distribution. We then extend the annealed SVGD with an entropic regularization. We show that this offers a flexible control between sample quality and diversity, and verify it empirically by precision and recall evaluations. NCK-SVGD produces samples comparable to GANs and annealed SGLD on computer vision benchmarks, including MNIST and CIFAR-10.
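As a point of reference, the basic SVGD particle update can be sketched in one dimension with a fixed RBF kernel. This is plain SVGD only: the noise-conditional kernel, score network, annealing, and entropic regularization from the abstract are not modeled, and the function names, bandwidth `h`, and step size are illustrative assumptions.

```python
import math

def rbf(x, y, h):
    """Fixed-bandwidth RBF kernel on scalars."""
    return math.exp(-((x - y) ** 2) / (2 * h * h))

def svgd_step(particles, score, step_size=0.1, h=1.0):
    """One SVGD update for scalar particles.

    score(x) must return d/dx log p(x) for the target density p.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = rbf(xj, xi, h)
            # Derivative of k(xj, xi) with respect to xj: the repulsive term
            # that keeps particles from collapsing onto a single mode.
            grad_k = (xi - xj) / (h * h) * k
            # Kernel-weighted score (attraction) plus repulsion.
            phi += k * score(xj) + grad_k
        updated.append(xi + step_size * phi / n)
    return updated
```

For a standard normal target the score is simply `lambda x: -x`; repeatedly applying `svgd_step` to particles initialized far from the origin moves their mean toward zero while the repulsive term keeps them spread out.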

Nonparametric Density Estimation under Adversarial Losses

Neural Information Processing Systems

We study minimax convergence rates of nonparametric density estimation under a large class of loss functions called "adversarial losses", which, besides classical L^p losses, includes maximum mean discrepancy (MMD), Wasserstein distance, and total variation distance. These losses are closely related to the losses encoded by discriminator networks in generative adversarial networks (GANs). In a general framework, we study how the choice of loss and the assumed smoothness of the underlying density together determine the minimax rate. We also discuss implications for training GANs based on deep ReLU networks, and more general connections to learning implicit generative models in a minimax statistical sense. Papers published at the Neural Information Processing Systems Conference.

MMD GAN: Towards Deeper Understanding of Moment Matching Network

Neural Information Processing Systems

Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as a replacement for the fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN.
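To make the two-sample statistic concrete, here is a minimal sketch of the biased empirical estimate of squared MMD with a fixed Gaussian kernel, the setting the abstract attributes to the original GMMN. The function names and the bandwidth `sigma` are illustrative assumptions; the adversarially learned kernel of MMD-GAN is not shown.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel between two equal-length vectors."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))

def mmd2_biased(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD between samples xs and ys.

    MMD^2 = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)],
    with each expectation replaced by an average over all sample pairs.
    """
    def mean_k(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)
```

The estimate is zero when both samples are identical and grows as the samples separate. Every pair of points enters the averages, which gives a sense of why batch size matters for this statistic.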

Unsupervised Program Synthesis for Images using Tree-Structured LSTM Machine Learning

Program synthesis has recently emerged as a promising approach to the image parsing task. However, most prior works have relied on supervised learning methods, which require ground truth programs for each training image. We present an unsupervised learning algorithm that can parse constructive solid geometry (CSG) images into context-free grammar with a non-differentiable renderer. We propose a grammar-encoded tree LSTM to effectively constrain our search space by leveraging the structure of the context-free grammar while handling the non-differentiable renderer via REINFORCE and encouraging exploration by regularizing the objective with an entropy term. Instead of using simple Monte Carlo sampling, we propose a lower-variance entropy estimator with sampling without replacement for effective exploration. We demonstrate the effectiveness of the proposed algorithm on a synthetic 2D CSG dataset, where it outperforms baseline models by a large margin.

FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence Machine Learning

Semi-supervised learning (SSL) provides an effective means of leveraging unlabeled data to improve a model's performance. In this paper, we demonstrate the power of a simple combination of two common SSL methods: consistency regularization and pseudo-labeling. Our algorithm, FixMatch, first generates pseudo-labels using the model's predictions on weakly-augmented unlabeled images. For a given image, the pseudo-label is only retained if the model produces a high-confidence prediction. The model is then trained to predict the pseudo-label when fed a strongly-augmented version of the same image. Despite its simplicity, we show that FixMatch achieves state-of-the-art performance across a variety of standard semi-supervised learning benchmarks, including 94.93% accuracy on CIFAR-10 with 250 labels and 88.61% accuracy with 40 -- just 4 labels per class. Since FixMatch bears many similarities to existing SSL methods that achieve worse performance, we carry out an extensive ablation study to tease apart the experimental factors that are most important to FixMatch's success. We make our code available at
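The pseudo-labeling step described above can be sketched as follows, operating on class-probability vectors that a model is assumed to have already produced for the weakly and strongly augmented views. The function name, the 0.95 threshold, the clamp before the logarithm, and averaging over all unlabeled examples (including discarded ones) are illustrative assumptions rather than details stated in the abstract.

```python
import math

def fixmatch_unlabeled_loss(weak_probs, strong_probs, threshold=0.95):
    """Confidence-masked cross-entropy on unlabeled examples.

    weak_probs[i]:   predicted class probabilities for the weak augmentation.
    strong_probs[i]: predicted class probabilities for the strong augmentation
                     of the same image.
    """
    total = 0.0
    for p_weak, p_strong in zip(weak_probs, strong_probs):
        confidence = max(p_weak)
        # Keep the pseudo-label only when the model is confident enough.
        if confidence >= threshold:
            pseudo_label = p_weak.index(confidence)
            # Cross-entropy of the strong-view prediction against the
            # hard pseudo-label; clamp to avoid log(0).
            total += -math.log(max(p_strong[pseudo_label], 1e-12))
    # Assumption: average over the whole batch, not only retained examples.
    return total / len(weak_probs)
```

With a high threshold, few examples contribute early in training, and the contribution grows as predictions on weakly augmented images become more confident.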

Learned Interpolation for 3D Generation Machine Learning

In order to generate novel 3D shapes with machine learning, one must allow for interpolation. The typical approach for incorporating this creative process is to interpolate in a learned latent space so as to avoid the problem of generating unrealistic instances by exploiting the model's learned structure. The process of the interpolation is supposed to form a semantically smooth morphing. While this approach is sound for synthesizing realistic media such as lifelike portraits or new designs for everyday objects, it subjectively fails to directly model the unexpected, unrealistic, or creative. In this work, we present a method for learning how to interpolate point clouds. By encoding prior knowledge about real-world objects, the intermediate forms are both realistic and unlike any existing forms. We show not only how this method can be used to generate "creative" point clouds, but how the method can also be leveraged to generate 3D models suitable for sculpture.

Getting Topology and Point Cloud Generation to Mesh Machine Learning

In this work, we explore the idea that effective generative models for point clouds under the autoencoding framework must acknowledge the relationship between a continuous surface, a discretized mesh, and a set of points sampled from the surface. This view motivates a generative model that works by progressively deforming a uniform sphere until it approximates the goal point cloud. We review the underlying concepts leading to this conclusion from computer graphics and topology in differential geometry, and model the generation process as deformation via deep neural network parameterization. Finally, we show that this view of the problem produces a model that can generate quality meshes efficiently.