
Collaborating Authors

 Gupta, Debayan


Improving text-conditioned latent diffusion for cancer pathology

arXiv.org Artificial Intelligence

The development of generative models in the past decade has allowed for hyperrealistic data synthesis. While potentially beneficial, this synthetic data generation process has been relatively underexplored in cancer histopathology. One algorithm for synthesising realistic images is diffusion: it iteratively converts an image to noise and learns to recover the image from this noise [Wang and Vastola, 2023]. While effective, it is highly computationally expensive for high-resolution images, rendering it infeasible for histopathology. The development of Variational Autoencoders (VAEs) has allowed us to learn representations of complex high-resolution images in a latent space. A vital by-product of this is the ability to compress high-resolution images into a compact latent space and recover them with minimal loss. The marriage of diffusion and VAEs allows us to carry out diffusion in the latent space of an autoencoder, leveraging the realistic generative capabilities of diffusion while keeping computational requirements reasonable. Rombach et al. [2021b] and Yellapragada et al. [2023] build foundational models for this task, paving the way to generate realistic histopathology images. In this paper, we discuss the pitfalls of current methods, namely Yellapragada et al. [2023], and resolve critical errors while proposing improvements along the way. Our methods achieve an FID score of 21.11, beating the SOTA counterpart in Yellapragada et al. [2023] by 1.2 FID while reducing train-time GPU memory usage by 7%.
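
To make the latent-diffusion idea concrete, here is a minimal, self-contained sketch of the forward (noising) process run in a VAE's latent space rather than in pixel space. The tiny encoder/decoder, the linear noise schedule, and all shapes are placeholder assumptions for illustration; they are not the models or hyperparameters used in the paper.

```python
# Minimal sketch of diffusion in a VAE latent space (illustrative only; the
# encoder/decoder and noise schedule below are stand-ins, not the paper's models).
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):          # stand-in for a pretrained VAE encoder
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 4, kernel_size=8, stride=8)  # 256x256x3 -> 32x32x4
    def forward(self, x):
        return self.net(x)

class TinyDecoder(nn.Module):          # stand-in for the matching VAE decoder
    def __init__(self):
        super().__init__()
        self.net = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)
    def forward(self, z):
        return self.net(z)

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)      # cumulative product \bar{alpha}_t

def q_sample(z0, t, noise):
    """Forward process: z_t = sqrt(abar_t) * z_0 + sqrt(1 - abar_t) * eps."""
    ab = alphas_bar[t].view(-1, 1, 1, 1)
    return ab.sqrt() * z0 + (1.0 - ab).sqrt() * noise

enc, dec = TinyEncoder(), TinyDecoder()
x = torch.randn(2, 3, 256, 256)                     # a batch of (fake) images
z0 = enc(x)                                         # diffuse in latent space, not pixel space
t = torch.randint(0, T, (2,))
eps = torch.randn_like(z0)
zt = q_sample(z0, t, eps)
# A denoiser (e.g. a text-conditioned U-Net) would be trained to predict `eps`
# from (z_t, t, caption); generation then runs the reverse chain and decodes.
x_rec = dec(z0)
print(zt.shape, x_rec.shape)
```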


Visual Concept Networks: A Graph-Based Approach to Detecting Anomalous Data in Deep Neural Networks

arXiv.org Artificial Intelligence

Deep neural networks (DNNs), while increasingly deployed in many applications, struggle with robustness against anomalous and out-of-distribution (OOD) data. Current OOD benchmarks often oversimplify the problem, focusing on single-object tasks and not fully representing complex real-world anomalies. This paper introduces a new, straightforward method employing graph structures and topological features to effectively detect both far-OOD and near-OOD data. We convert images into networks of interconnected, human-understandable features, or visual concepts. Through extensive testing on two novel tasks, including ablation studies with large vocabularies and diverse tasks, we demonstrate the method's effectiveness. This approach enhances DNN resilience to OOD data and promises improved performance in various applications.
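
As an illustration of the general idea (not the paper's actual pipeline), the sketch below builds a small graph whose nodes are detected visual concepts and whose edges encode co-occurrence, then scores an image by comparing simple topological features against an in-distribution reference. The concept lists, edge rule, feature set, and threshold are all assumptions made for the example.

```python
# Illustrative sketch: represent an image as a network of visual concepts and
# score it with simple topological features. Concept detector, edge rule, and
# threshold are placeholders, not the paper's method.
import networkx as nx
import numpy as np

def concept_graph(concepts, co_occurrence):
    """concepts: list of concept names; co_occurrence: list of (i, j) index pairs."""
    g = nx.Graph()
    g.add_nodes_from(concepts)
    g.add_edges_from((concepts[i], concepts[j]) for i, j in co_occurrence)
    return g

def topo_features(g):
    """A tiny topological signature: density, mean degree, clustering."""
    degrees = [d for _, d in g.degree()]
    return np.array([
        nx.density(g),
        float(np.mean(degrees)) if degrees else 0.0,
        nx.average_clustering(g),
    ])

# In-distribution image: concepts that usually co-occur (e.g. a street scene).
g_in = concept_graph(["car", "road", "traffic light", "pedestrian"],
                     [(0, 1), (1, 2), (1, 3), (0, 3)])
# Anomalous image: concepts with an unusual, sparse co-occurrence pattern.
g_ood = concept_graph(["car", "coral reef", "snow", "pedestrian"], [(0, 3)])

reference = topo_features(g_in)
score = np.linalg.norm(topo_features(g_ood) - reference)  # distance to reference signature
print("OOD" if score > 0.5 else "in-distribution", score)  # 0.5 is an arbitrary threshold
```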


Developmental Pretraining (DPT) for Image Classification Networks

arXiv.org Artificial Intelligence

The advent of Deep Learning (DL) has massively aided the Artificial Intelligence community, especially in the realm of object recognition. One of the critical reasons for the success of DL has been the availability of massive image datasets [1] and the computational power offered by modern Graphics Processing Units (GPUs), which are able to accommodate the large amounts of data required by deep networks. State-of-the-art image recognition networks like the ResNet family [2], VGG networks [3], EfficientNet models [4], and the recently introduced Vision Transformers [5] require extremely large amounts of data compared to their classical Machine Learning (ML) counterparts [6]. This requirement for large amounts of data becomes a problem in fields where data availability is low, such as medicine [7]. A common approach to this problem is transfer learning [8], which consists of pre-training a network on a large dataset like ImageNet [1] and fine-tuning it on a smaller dataset relevant to the recognition problem at hand.
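
For concreteness, the snippet below sketches the standard transfer-learning recipe mentioned above: load an ImageNet-pretrained backbone, replace the classification head, freeze the rest, and fine-tune on a small dataset. The class count, fake batch, and hyperparameters are placeholders, and the `weights=` argument assumes a reasonably recent torchvision (0.13 or later).

```python
# Minimal transfer-learning sketch: ImageNet-pretrained backbone, new head,
# fine-tune only the head. Dataset and hyperparameters are placeholders.
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5                                   # e.g. a small medical dataset
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # ImageNet weights

for p in model.parameters():                      # freeze the pretrained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, num_classes)  # new task-specific head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a fake batch standing in for the small dataset.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
print(float(loss))
```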


Synthpop++: A Hybrid Framework for Generating A Country-scale Synthetic Population

arXiv.org Artificial Intelligence

Population censuses are vital to public policy decision-making. They provide insight into human resources, demography, culture, and economic structure at local, regional, and national levels. However, such surveys are very expensive (especially for low- and middle-income countries with large populations, such as India), time-consuming, and may also raise privacy concerns, depending upon the type of data collected. In light of these issues, we introduce SynthPop++, a novel hybrid framework which can combine data from multiple real-world surveys (with different, partially overlapping sets of attributes) to produce a real-scale synthetic population of humans. Critically, our population maintains family structures comprising individuals with demographic, socioeconomic, health, and geolocation attributes: this means that our "fake" people live in realistic locations, have realistic families, etc. Such data can be used for a variety of purposes; we explore one such use case, agent-based modelling of infectious disease in India. To gauge the quality of our synthetic population, we use machine learning and statistical metrics. Our experimental results show that the synthetic population can realistically simulate the population of various administrative units of India, producing real-scale, detailed data at the desired level of zoom, from cities to districts to states, eventually combining to form a country-scale synthetic population. Financial institutions, government agencies, think tanks, etc., are using techniques like agent-based modelling (ABM) [Bonabeau, 2002] to simulate increasingly complex scenarios for decision-making.
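
To illustrate the kind of downstream use described above, here is a toy agent-based SIR simulation over a made-up synthetic population with household structure. The attributes, household sizes, and disease parameters are invented for the sketch and are unrelated to SynthPop++'s actual outputs or to any real epidemiological model.

```python
# Illustrative agent-based SIR run over a toy synthetic population, showing the
# kind of downstream use described above. All attributes and parameters are
# made up for the sketch, not SynthPop++ output.
import random

random.seed(0)
# Toy synthetic people: (person_id, household_id, age); a real synthetic
# population would also carry socioeconomic, health, and geolocation attributes.
people = [(i, i // 4, random.randint(1, 80)) for i in range(1000)]  # 4 per household

state = {pid: "S" for pid, _, _ in people}        # S / I / R
for pid in random.sample([p[0] for p in people], 5):
    state[pid] = "I"                              # seed a few infections

households = {}
for pid, hid, _ in people:
    households.setdefault(hid, []).append(pid)

beta_household, gamma = 0.15, 0.1                 # per-day infection / recovery probabilities
for day in range(60):
    newly_infected, newly_recovered = [], []
    for members in households.values():
        if any(state[m] == "I" for m in members):  # within-household transmission
            for m in members:
                if state[m] == "S" and random.random() < beta_household:
                    newly_infected.append(m)
    for pid in state:
        if state[pid] == "I" and random.random() < gamma:
            newly_recovered.append(pid)
    for pid in newly_infected:
        state[pid] = "I"
    for pid in newly_recovered:
        state[pid] = "R"

print({s: sum(v == s for v in state.values()) for s in "SIR"})
```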


S++: A Fast and Deployable Secure-Computation Framework for Privacy-Preserving Neural Network Training

arXiv.org Artificial Intelligence

We introduce S++, a simple, robust, and deployable framework for training a neural network (NN) using private data from multiple sources via secret-shared secure function evaluation. In short, consider a virtual third party to whom every data-holder sends their inputs and which computes the neural network: in our case, this virtual third party is actually a set of servers which individually learn nothing, even in the presence of a malicious (but non-colluding) adversary. Previous work in this area has been limited to just one specific activation function, ReLU, rendering the approach impractical for many use cases. For the first time, we provide fast and verifiable protocols for all common activation functions and optimize them for running in a secret-shared manner. The ability to quickly, verifiably, and robustly compute exponentiation, softmax, sigmoid, etc., allows us to use previously written NNs without modification, vastly reducing developer effort and code complexity. In recent times, ReLU has been found to converge much faster and to be more computationally efficient than saturating non-linearities like sigmoid or tanh. However, we argue that it would be remiss not to extend the mechanism to non-linear functions such as the logistic sigmoid, tanh, and softmax, which are fundamental due to their ability to express outputs as probabilities and their universal approximation property. Their role in RNNs, along with a few recent advancements, also makes them highly relevant.
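
The sketch below illustrates additive secret sharing, the basic primitive underlying secret-shared secure function evaluation: each server holds a random-looking share, and linear operations can be performed locally on shares. It is not the S++ protocol itself (whose contribution is the secure evaluation of non-linear activations such as sigmoid and softmax); the modulus and server count here are arbitrary choices for the example.

```python
# Minimal sketch of additive secret sharing among non-colluding servers.
# Illustrates the basic primitive only; it is not the S++ protocol.
import secrets

P = 2**61 - 1                                      # a public prime modulus

def share(x, n_servers=3):
    """Split x into n additive shares that sum to x mod P."""
    shares = [secrets.randbelow(P) for _ in range(n_servers - 1)]
    shares.append((x - sum(shares)) % P)
    return shares

def reconstruct(shares):
    return sum(shares) % P

a, b = 12345, 67890
sa, sb = share(a), share(b)

# Each server adds its own shares locally; no single server learns a or b.
sum_shares = [(x + y) % P for x, y in zip(sa, sb)]
assert reconstruct(sum_shares) == (a + b) % P
print(reconstruct(sum_shares))
```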