Efficient Iterative Amortized Inference for Learning Symmetric and Disentangled Multi-Object Representations
Emami, Patrick, He, Pan, Ranka, Sanjay, Rangarajan, Anand
Unsupervised multi-object representation learning depends on inductive biases to guide the discovery of object-centric representations that generalize. However, we observe that methods for learning these representations are either impractical due to long training times and large memory consumption or forego key inductive biases. In this work, we introduce EfficientMORL, an efficient framework for the unsupervised learning of object-centric representations. We show that the optimization challenges caused by requiring both symmetry and disentanglement can in fact be addressed by high-cost iterative amortized inference, provided the framework is designed to minimize its dependence on that refinement. We take a two-stage approach to inference: first, a hierarchical variational autoencoder extracts symmetric and disentangled representations through bottom-up inference, and second, a lightweight network refines the representations with top-down feedback. The number of refinement steps taken during training is reduced following a curriculum, so that at test time with zero steps the model achieves 99.1% of the refined decomposition performance. We demonstrate strong object decomposition and disentanglement on the standard multi-object benchmark while achieving nearly an order of magnitude faster training and test-time inference than the previous state-of-the-art model.
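A minimal sketch of the two-stage inference scheme described above; the names (`hvae`, `refiner`, `num_refine_steps`) are illustrative assumptions, and the paper's actual architecture differs in detail:

```python
# Hedged sketch of EfficientMORL-style two-stage inference. All names here
# (hvae, refiner, num_refine_steps) are illustrative assumptions.
def infer(image, hvae, refiner, num_refine_steps):
    # Stage 1: bottom-up inference with a hierarchical VAE yields a set of
    # symmetric, disentangled object latents.
    latents = hvae.encode(image)
    # Stage 2: a lightweight network refines the latents using top-down
    # feedback from the current reconstruction.
    for _ in range(num_refine_steps):
        recon = hvae.decode(latents)
        latents = refiner(latents, image, recon)
    return latents

# Training follows a curriculum that anneals num_refine_steps toward zero,
# so that test-time inference can run the bottom-up stage alone.
```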
Visual Explanations From Deep 3D Convolutional Neural Networks for Alzheimer's Disease Classification
Yang, Chengliang, Rangarajan, Anand, Ranka, Sanjay
We develop three efficient approaches for generating visual explanations from 3D convolutional neural networks (3D-CNNs) for Alzheimer's disease classification. One approach conducts sensitivity analysis on a hierarchical 3D image segmentation, and the other two visualize network activations on a spatial map. Visual checks and a quantitative localization benchmark indicate that all approaches identify important brain parts for Alzheimer's disease diagnosis. Comparative analysis shows that the sensitivity-analysis-based approach has difficulty handling the loosely distributed cerebral cortex, while the approaches based on visualizing activations are constrained by the resolution of the convolutional layer. Taken together, these complementary methods improve the understanding of 3D-CNNs in Alzheimer's disease classification from different perspectives.
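A hedged sketch of the segmentation-based sensitivity analysis (the first of the three approaches); `model` and the occlusion-by-zeroing choice are illustrative assumptions, not details from the paper:

```python
import numpy as np

def region_sensitivity(model, volume, segmentation):
    """Score each segmented brain region by the drop in the 3D-CNN's
    predicted Alzheimer's probability when that region is occluded."""
    base = model.predict(volume[None])[0]          # baseline AD score
    importance = {}
    for region in np.unique(segmentation):
        occluded = volume.copy()
        occluded[segmentation == region] = 0.0     # occlude one region
        importance[region] = base - model.predict(occluded[None])[0]
    return importance                              # larger drop => more important
```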
Global Model Interpretation via Recursive Partitioning
Yang, Chengliang, Rangarajan, Anand, Ranka, Sanjay
In this work, we propose a simple but effective method to interpret black-box machine learning models globally. That is, we use a compact binary tree, the interpretation tree, to explicitly represent the most important decision rules that are implicitly contained in the black-box model. This tree is learned from the contribution matrix, which consists of the contributions of the input variables to the predicted score of each individual prediction. To generate the interpretation tree, a unified process recursively partitions the input variable space by maximizing the difference in the average contribution of the split variable between the divided subspaces. We demonstrate the effectiveness of our method in diagnosing machine learning models on multiple tasks. The method is also useful for discovering new knowledge, since such insights are not easily identifiable from single predictions alone. In general, our work makes it easier and more efficient for humans to understand machine learning models.
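A sketch of the recursive partitioning described above, assuming `X` holds the input variables and `C` the contribution matrix (one row per prediction, one column per variable); the candidate thresholds and stopping rule are illustrative assumptions:

```python
import numpy as np

def build_interpretation_tree(X, C, depth=0, max_depth=3, min_samples=20):
    n, d = X.shape
    if depth >= max_depth or n < min_samples:
        return {"leaf": True, "mean_contribution": C.mean(axis=0)}
    best = None
    for j in range(d):                                    # candidate split variable
        for t in np.percentile(X[:, j], [25, 50, 75]):    # candidate thresholds
            left, right = X[:, j] <= t, X[:, j] > t
            if left.sum() == 0 or right.sum() == 0:
                continue
            # Split quality: difference in the average contribution of the
            # split variable between the two sides, as in the abstract.
            gap = abs(C[left, j].mean() - C[right, j].mean())
            if best is None or gap > best[0]:
                best = (gap, j, t, left, right)
    if best is None:
        return {"leaf": True, "mean_contribution": C.mean(axis=0)}
    _, j, t, left, right = best
    return {"leaf": False, "var": j, "threshold": t,
            "left": build_interpretation_tree(X[left], C[left], depth + 1,
                                              max_depth, min_samples),
            "right": build_interpretation_tree(X[right], C[right], depth + 1,
                                               max_depth, min_samples)}
```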
A Category Space Approach to Supervised Dimensionality Reduction
Smith, Anthony O., Rangarajan, Anand
Supervised dimensionality reduction has emerged as an important theme in the last decade. Despite the plethora of models and formulations, there is a lack of a simple model that aims to project the set of patterns into a space defined by the classes (or categories). To this end, we set up a model in which each class is represented as a 1D subspace of the vector space formed by the features. Assuming the number of classes does not exceed the number of features, the model results in multi-class supervised learning in which the features of each class are projected into the class subspace. Class discrimination is automatically guaranteed by imposing orthogonality on the 1D class subspaces. The resulting optimization problem, formulated as the minimization of a sum of quadratic functions on a Stiefel manifold, is non-convex (due to the constraints), yet it has enough structure for us to recognize when a global minimum has been reached. After formulating a version with standard inner products, we extend the formulation to reproducing kernel Hilbert spaces in a straightforward manner. The optimization approach also extends in a similar fashion to the kernel version. Results and comparisons with the multi-class Fisher linear (and kernel) discriminants and principal component analysis (linear and kernel) showcase the relative merits of this approach to dimensionality reduction.
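One plausible instantiation of the model consistent with the abstract (the notation below is assumed, not taken from the paper): represent class $c$ by a unit vector $w_c$, force the class vectors to be mutually orthonormal, and project each pattern onto its class subspace.

```latex
% Patterns x_i with labels y_i in {1, ..., K}; W collects the K class vectors.
\min_{W = [w_1, \ldots, w_K],\; W^\top W = I_K}
  \sum_{c=1}^{K} \sum_{i \,:\, y_i = c}
  \left\| x_i - w_c \left( w_c^\top x_i \right) \right\|^2
```

Each term is quadratic in $W$, and $W^\top W = I_K$ is precisely the Stiefel-manifold constraint mentioned in the abstract.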
Gradient density estimation in arbitrary finite dimensions using the method of stationary phase
Gurumoorthy, Karthik S., Rangarajan, Anand, Corring, John
We prove that the density function of the gradient of a sufficiently smooth function $S : \Omega \subset \mathbb{R}^d \rightarrow \mathbb{R}$, obtained via a random variable transformation of a uniformly distributed random variable, is increasingly closely approximated by the normalized power spectrum of $\phi=\exp\left(\frac{iS}{\tau}\right)$ as the free parameter $\tau \rightarrow 0$. The result is shown using the stationary phase approximation and standard integration techniques and requires proper ordering of limits. We highlight a relationship with the well-known characteristic function approach to density estimation, and detail why our result is distinct from this approach.
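A small 1D numerical illustration of the result (our own sketch, not from the paper): by stationary phase, the frequency $u$ corresponds to the gradient value $2\pi u \tau$, so the normalized power spectrum of $\phi$ acts as a gradient histogram.

```python
import numpy as np

tau = 1e-3
x = np.linspace(0.0, 1.0, 2**16, endpoint=False)
S = np.sin(2 * np.pi * x)                    # smooth, periodic test function
phi = np.exp(1j * S / tau)                   # the wave function exp(iS/tau)

# Normalized power spectrum; frequency u corresponds to gradient 2*pi*u*tau,
# so grad_bins are the gradient-histogram bin centers.
P = np.abs(np.fft.fft(phi))**2
P /= P.sum()
grad_bins = 2 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0]) * tau

# Sanity check: the second moment of the spectral "density" should approach
# the mean squared gradient E[(2*pi*cos(2*pi*x))**2] = 2*pi**2 as tau -> 0.
print(np.sum(P * grad_bins**2),
      np.mean((2 * np.pi * np.cos(2 * np.pi * x))**2))
```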
Distance Transform Gradient Density Estimation using the Stationary Phase Approximation
Gurumoorthy, Karthik S., Rangarajan, Anand
The complex wave representation (CWR) converts unsigned 2D distance transforms into their corresponding wave functions. Here, the distance transform $S(X)$ appears as the phase of the wave function $\phi(X)$; specifically, $\phi(X) = \exp\left(\frac{iS(X)}{\tau}\right)$, where $\tau$ is a free parameter. In this work, we prove a novel result using the higher-order stationary phase approximation: we show convergence of the normalized power spectrum (squared magnitude of the Fourier transform) of the wave function to the density function of the distance transform gradients as the free parameter $\tau \rightarrow 0$. In colloquial terms, spatial frequencies are gradient histogram bins. Since the distance transform gradients carry only orientation information (their magnitudes are identically equal to one almost everywhere), as $\tau \rightarrow 0$ the 2D Fourier transform values mainly lie on the unit circle in the spatial frequency domain. The proof of the result involves standard integration techniques and requires proper ordering of limits. Our mathematical relation indicates that the CWR of distance transforms is an intriguing new representation.
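A 2D companion to the previous sketch (again our own illustration): for a distance transform, the gradient magnitude is 1 almost everywhere, so as $\tau \rightarrow 0$ the spectral mass of the CWR should concentrate near the circle $|u| = 1/(2\pi\tau)$.

```python
import numpy as np

tau = 0.02
n = 512
xs = np.linspace(-1.0, 1.0, n)
X, Y = np.meshgrid(xs, xs)
S = np.sqrt(X**2 + Y**2)             # distance transform of a single point
phi = np.exp(1j * S / tau)

P = np.abs(np.fft.fftshift(np.fft.fft2(phi)))**2
P /= P.sum()

# Radial coordinate in gradient units: 2*pi*|frequency|*tau should cluster
# near 1, i.e. on the unit circle of gradient orientations.
f = np.fft.fftshift(np.fft.fftfreq(n, d=xs[1] - xs[0]))
FX, FY = np.meshgrid(f, f)
R = 2 * np.pi * np.sqrt(FX**2 + FY**2) * tau
print("spectral mass with |gradient| in [0.9, 1.1]:",
      P[(R > 0.9) & (R < 1.1)].sum())   # should hold most of the mass
```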
The Concave-Convex Procedure (CCCP)
Yuille, Alan L., Rangarajan, Anand
This paper describes a simple geometrical Concave-Convex procedure (CCCP) for constructing discrete-time dynamical systems that can be guaranteed to decrease almost any global optimization/energy function (see the technical conditions in section (2)). We prove that there is a relationship between CCCP and optimization techniques based on introducing auxiliary variables using Legendre transforms. We distinguish between Legendre min-max and Legendre minimization. In the former, see [6], the introduction of auxiliary variables converts the problem to a min-max problem where the goal is to find a saddle point. By contrast, in Legendre minimization, see [8], the problem remains a minimization one (and so it becomes easier to analyze convergence).
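For reference, the construction the abstract alludes to can be stated compactly as the standard CCCP split and update rule:

```latex
% Split the energy into convex and concave parts, then obtain the next
% iterate implicitly from the update rule.
E(\vec{x}) = E_{\mathrm{vex}}(\vec{x}) + E_{\mathrm{cave}}(\vec{x}),
\qquad
\vec{\nabla} E_{\mathrm{vex}}\big(\vec{x}^{\,t+1}\big)
  = -\vec{\nabla} E_{\mathrm{cave}}\big(\vec{x}^{\,t}\big).
% Each update is guaranteed not to increase E (under the technical
% conditions referenced in the abstract).
```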
MIME: Mutual Information Minimization and Entropy Maximization for Bayesian Belief Propagation
Rangarajan, Anand, Yuille, Alan L.
Bayesian belief propagation in graphical models has recently been shown to have very close ties to inference methods rooted in statistical physics. After Yedidia et al. demonstrated that belief propagation fixed points correspond to extrema of the so-called Bethe free energy, Yuille derived a double-loop algorithm that is guaranteed to converge to a local minimum of the Bethe free energy. Yuille's algorithm is based on a certain decomposition of the Bethe free energy, and he mentions that other decompositions are possible and may even be fruitful. In the present work, we begin with the Bethe free energy and show that it has a principled interpretation as pairwise mutual information minimization and marginal entropy maximization (MIME). Next, we construct a family of free energy functions from a spectrum of decompositions of the original Bethe free energy. For each free energy in this family, we develop a new algorithm that is guaranteed to converge to a local minimum. Preliminary computer simulations are in agreement with this theoretical development.
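A sketch of the rearrangement behind the MIME reading (notation assumed here; the paper's conventions may differ). With $U$ the average energy, $q_i$ the degree of node $i$, $b_{ij}$ the pairwise beliefs, and $b_i$ their singleton marginals:

```latex
% Standard entropic form of the Bethe free energy:
F_{\mathrm{Bethe}} = U - \sum_{(ij)} H(b_{ij}) + \sum_i (q_i - 1)\, H(b_i).
% Substituting the joint-entropy identity
% H(b_{ij}) = H(b_i) + H(b_j) - I(b_{ij}) and collecting terms gives
F_{\mathrm{Bethe}} = U + \sum_{(ij)} I(b_{ij}) - \sum_i H(b_i).
```

Minimizing the Bethe free energy therefore simultaneously minimizes pairwise mutual information and maximizes marginal entropies, which is the MIME interpretation named in the abstract.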
The Concave-Convex Procedure (CCCP)
Yuille, Alan L., Rangarajan, Anand
We introduce the Concave-Convex procedure (CCCP), which constructs discrete-time iterative dynamical systems that are guaranteed to monotonically decrease global optimization/energy functions. It can be applied to (almost) any optimization problem, and many existing algorithms can be interpreted in terms of CCCP. In particular, we prove relationships to some applications of Legendre transform techniques. We then illustrate CCCP with applications to Potts models, linear assignment, EM algorithms, and Generalized Iterative Scaling (GIS). CCCP can be used both as a new way to understand existing optimization algorithms and as a procedure for generating new algorithms.
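A toy run of the update rule shown after the earlier CCCP entry (the function and its convex/concave split below are our own illustration, not an example from the paper):

```python
import numpy as np

# Minimize E(x) = x**4 - 3*x**2 + x with CCCP by splitting it into
#   E_vex(x)  = x**4           (convex part)
#   E_cave(x) = -3*x**2 + x    (concave part)
# The CCCP update solves E_vex'(x_new) = -E_cave'(x_old), i.e.
#   4*x_new**3 = 6*x_old - 1.
E = lambda x: x**4 - 3 * x**2 + x

x = 2.0
for t in range(20):
    x = np.cbrt((6 * x - 1) / 4)     # closed-form solve of the implicit update
    print(t, x, E(x))                # E(x) decreases monotonically
```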