Sreekumar, Gautam
Compositional World Knowledge leads to High Utility Synthetic data
Gaudi, Sachit, Sreekumar, Gautam, Boddeti, Vishnu
Machine learning systems struggle with robustness under subpopulation shifts. This problem becomes especially pronounced when only a subset of attribute combinations is observed during training, a severe form of subpopulation shift referred to as compositional shift. To address this problem, we ask the following question: can we improve robustness by training on synthetic data that spans all possible attribute combinations? We first show that training conditional diffusion models on limited data leads to an incorrect underlying distribution. Consequently, synthetic data sampled from such models is unfaithful and does not improve the performance of downstream machine learning systems. To address this problem, we propose CoInD, which reflects the compositional nature of the world by enforcing conditional independence through minimizing Fisher's divergence between joint and marginal distributions. We demonstrate that synthetic data generated by CoInD is faithful, and that this translates to state-of-the-art worst-group accuracy on compositional shift tasks on CelebA.
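As a concrete illustration of the compositional structure the abstract appeals to, consider the two-attribute case; the notation below is assumed for illustration and is not taken from the paper. If attributes $c_1$ and $c_2$ are conditionally independent given the image $x$, the score of the joint conditional distribution decomposes into the scores of the conditional marginals, so samples for attribute combinations never seen together during training can, in principle, be drawn by composing scores learned from the observed combinations:

\[
\nabla_x \log p(x \mid c_1, c_2) \;=\; \nabla_x \log p(x \mid c_1) \;+\; \nabla_x \log p(x \mid c_2) \;-\; \nabla_x \log p(x).
\]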
CoInD: Enabling Logical Compositions in Diffusion Models
Gaudi, Sachit, Sreekumar, Gautam, Boddeti, Vishnu
How can we learn generative models that sample data with arbitrary logical compositions of statistically independent attributes? The prevailing solution is to sample from distributions expressed as a composition of the attributes' conditional marginal distributions under the assumption that they are statistically independent. This paper shows that standard conditional diffusion models violate this assumption, even when all attribute compositions are observed during training, and that the violation is significantly more severe when only a subset of the compositions is observed. We propose CoInD to address this problem. It explicitly enforces statistical independence between the conditional marginal distributions by minimizing Fisher's divergence between the joint and marginal distributions. The theoretical advantages of CoInD are reflected in both qualitative and quantitative experiments, demonstrating significantly more faithful and controlled generation of samples for arbitrary logical compositions of attributes. The benefit is most pronounced in scenarios that current solutions relying on conditionally independent marginals struggle with, namely logical compositions involving the NOT operation and settings where only a subset of compositions is observed during training.
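To make the independence constraint concrete, here is a minimal sketch for the two-attribute case, writing the diffusion model's score estimates as $s_\theta$; the exact parameterization and weighting are assumptions rather than the paper's formulation. Fisher's divergence between the learned joint and the composition of the learned marginals penalizes the squared mismatch between their scores, and can be added to the usual score-matching objective as a regularizer:

\[
\mathcal{L}_{\mathrm{indep}} \;=\; \mathbb{E}_{x}\Big[ \big\| \, s_\theta(x \mid c_1, c_2) - s_\theta(x \mid c_1) - s_\theta(x \mid c_2) + s_\theta(x) \, \big\|_2^2 \Big].
\]

When this term vanishes, sampling from the learned joint and sampling with the composed marginal scores coincide, which is what allows faithful generation of arbitrary logical compositions.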
OASIS Uncovers: High-Quality T2I Models, Same Old Stereotypes
Dehdashtian, Sepehr, Sreekumar, Gautam, Boddeti, Vishnu Naresh
Images generated by text-to-image (T2I) models often exhibit visual biases and stereotypes of concepts such as culture and profession. Existing quantitative measures of stereotypes are based on statistical parity, which does not align with the sociological definition of stereotypes and therefore incorrectly categorizes biases as stereotypes. Instead of oversimplifying stereotypes as biases, we propose a quantitative measure of stereotypes that aligns with their sociological definition. We then propose OASIS to measure the stereotypes in a generated dataset and understand their origins within the T2I model. OASIS includes a score that measures the spectral variance of the generated images along a stereotypical attribute. OASIS also includes two methods to understand the origins of stereotypes in T2I models: (U1) StOP, to discover attributes that the T2I model internally associates with a given concept, and (U2) SPI, to quantify the emergence of stereotypical attributes in the latent space of the T2I model during image generation. Despite the considerable progress in image fidelity, using OASIS we conclude that newer T2I models such as FLUX.1 and SDv3 contain strong stereotypical predispositions about concepts and still generate images with widespread stereotypical attributes.
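To make the idea of measuring variance along a stereotypical attribute concrete, below is a minimal Python sketch of one way such a quantity could be computed; the CLIP-style embeddings, the function name, and the text-difference attribute direction are illustrative assumptions, not OASIS's actual implementation.

import numpy as np

def variance_along_attribute(image_embeddings: np.ndarray, attribute_direction: np.ndarray) -> float:
    # image_embeddings: (N, D) embeddings of images generated for one concept.
    # attribute_direction: (D,) direction for a candidate stereotypical attribute,
    # e.g. emb("a wealthy person") - emb("a poor person") in the same embedding space.
    direction = attribute_direction / np.linalg.norm(attribute_direction)
    projections = image_embeddings @ direction  # position of each image along the attribute axis
    # Low variance means the images cluster at one end of the attribute, a stereotype-like collapse.
    return float(np.var(projections))

In this toy reading, a concept whose generated images all land at the same end of an attribute axis has low variance along that attribute, which is the kind of signal a stereotype measure can act on.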
Spurious Correlations and Where to Find Them
Sreekumar, Gautam, Boddeti, Vishnu Naresh
Spurious correlations occur when a model learns unreliable features from the data and are a well-known drawback of data-driven learning. Although several algorithms have been proposed to mitigate them, we have yet to jointly derive the indicators of spurious correlations. As a result, solutions built upon standalone hypotheses fail to beat simple ERM baselines. We collect some of the commonly studied hypotheses behind the occurrence of spurious correlations and investigate their influence on standard ERM baselines using synthetic datasets generated from causal graphs. Subsequently, we observe patterns connecting these hypotheses and model design choices.
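As an illustration of the kind of synthetic data a causal graph can produce, here is a toy Python sketch (not the paper's datasets): a binary label causes a reliable core feature, while a spurious feature agrees with the label only with probability rho, so varying rho between training and test simulates the spurious correlation breaking.

import numpy as np

def sample_spurious_dataset(n: int, rho: float = 0.9, seed: int = 0):
    # Toy causal graph: y -> x_core (reliable), y -> x_spur (spurious, strength rho).
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)                      # binary label
    x_core = y + 0.5 * rng.standard_normal(n)           # feature causally tied to y
    agree = rng.random(n) < rho                          # spurious feature follows y only rho of the time
    x_spur = np.where(agree, y, 1 - y) + 0.5 * rng.standard_normal(n)
    return np.stack([x_core, x_spur], axis=1), y

# Train with a strong correlation, test with it removed (hypothetical split).
X_train, y_train = sample_spurious_dataset(10_000, rho=0.95)
X_test, y_test = sample_spurious_dataset(2_000, rho=0.5, seed=1)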
On the Biometric Capacity of Generative Face Models
Boddeti, Vishnu Naresh, Sreekumar, Gautam, Ross, Arun
There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: "Given a generative face model, how many unique identities can it generate?" In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit the evaluation and comparison of different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We apply our approach to multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and "Generated Photos," as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under the ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have capacity upper bounds of $1.43\times10^6$ and $1.190\times10^4$, respectively; (b) the capacity reduces drastically as we lower the desired FAR, with estimates of $1.796\times10^4$ and $562$ at FARs of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t. gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t. age. Code is available at https://github.com/human-analysis/capacity-generative-face-models.
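For intuition about what a capacity estimate in a hyperspherical feature space can look like, below is a toy Python sketch that counts how many identity-sized hyperspherical caps fit, by surface area, inside the cap spanned by the whole population; the cap half-angles, the 512-dimensional ArcFace-style space, and the area-ratio packing argument are illustrative assumptions rather than the paper's exact estimator.

import numpy as np
from scipy.special import betainc

def cap_area_fraction(theta: float, d: int) -> float:
    # Fraction of the unit hypersphere in R^d covered by a cap of half-angle theta (0 < theta <= pi/2),
    # via the regularized incomplete beta function.
    return 0.5 * betainc((d - 1) / 2.0, 0.5, np.sin(theta) ** 2)

def toy_capacity(theta_population: float, theta_identity: float, d: int = 512) -> float:
    # How many identity-sized caps fit, by area, inside the population cap.
    return cap_area_fraction(theta_population, d) / cap_area_fraction(theta_identity, d)

Because cap area shrinks roughly like $\sin^{d-1}\theta$, this ratio is extremely sensitive to the feature dimension and to how the two angles are estimated from genuine and impostor score distributions, which helps explain why capacity estimates can change sharply with the operating FAR.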
Neuro-DynaStress: Predicting Dynamic Stress Distributions in Structural Components
Bolandi, Hamed, Sreekumar, Gautam, Li, Xuyang, Lajnef, Nizar, Boddeti, Vishnu Naresh
Numerical analysis methods, such as Finite Element Analysis (FEA), are typically used to conduct stress analysis of structures and systems for which it is impractical or hard to determine an analytical solution. Researchers commonly use FEA methods to evaluate the design, safety, and maintenance of structures in various fields, including aerospace, automotive, architecture, and civil structural systems. The current workflow for FEA applications includes: (i) modeling the geometry and its components, (ii) specifying material properties, boundary conditions, meshing, and loading, and (iii) dynamic analysis, which may be time-consuming depending on the complexity of the model. These time requirements and the complexity of the current FEA workflow make it impractical for real-time or near-real-time applications, such as in the aftermath of a disaster or during extreme disruptive events that require immediate corrections to avoid catastrophic failures. Given the steps of FEA described above, performing a complete stress analysis with conventional FEA has a high computational cost.
Physics Informed Neural Network for Dynamic Stress Prediction
Bolandi, Hamed, Sreekumar, Gautam, Li, Xuyang, Lajnef, Nizar, Boddeti, Vishnu Naresh
Structural failures are often caused by catastrophic events such as earthquakes and winds. As a result, it is crucial to predict dynamic stress distributions in real time during such highly disruptive events. Currently available high-fidelity methods, such as Finite Element Models (FEMs), suffer from inherently high computational complexity. Therefore, to reduce computational cost while maintaining accuracy, we propose a Physics-Informed Neural Network (PINN), the PINN-Stress model, to predict the entire sequence of stress distributions based on Finite Element simulations using a partial differential equation (PDE) solver. Using automatic differentiation, we embed a PDE into the deep neural network's loss function to incorporate information from both measurements and the PDE. The PINN-Stress model can predict the sequence of stress distributions in almost real time and generalizes better than the same model trained without the PINN loss.
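As a generic illustration of how automatic differentiation embeds a PDE into a network's loss, here is a minimal PyTorch sketch; the 1D wave equation, the tiny network, and the collocation points are stand-ins chosen for brevity, not the PINN-Stress model's actual elastodynamics formulation.

import torch
import torch.nn as nn

c = 1.0  # wave speed in the illustrative PDE u_tt = c^2 * u_xx
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))

def pde_residual(xt: torch.Tensor) -> torch.Tensor:
    # xt holds (x, t) pairs; autograd supplies the derivatives appearing in the PDE.
    xt = xt.requires_grad_(True)
    u = net(xt)
    du = torch.autograd.grad(u.sum(), xt, create_graph=True)[0]   # columns: [u_x, u_t]
    u_x, u_t = du[:, 0:1], du[:, 1:2]
    u_xx = torch.autograd.grad(u_x.sum(), xt, create_graph=True)[0][:, 0:1]
    u_tt = torch.autograd.grad(u_t.sum(), xt, create_graph=True)[0][:, 1:2]
    return u_tt - c**2 * u_xx

xt_data, u_data = torch.rand(128, 2), torch.zeros(128, 1)  # placeholder "measurement" pairs
xt_colloc = torch.rand(256, 2)                             # collocation points for the physics term

# Total loss = data-fitting term + PDE-residual term, both differentiable end to end.
loss = nn.functional.mse_loss(net(xt_data), u_data) + pde_residual(xt_colloc).pow(2).mean()
loss.backward()

The same pattern carries over to the dynamic-stress setting: the data term fits Finite Element snapshots, while the residual term penalizes violations of the governing PDE at collocation points.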