
Collaborating Authors

 Abrol, Vinayak


On Characterizing the Evolution of Embedding Space of Neural Networks using Algebraic Topology

arXiv.org Artificial Intelligence

We study how the topology of the feature embedding space changes as it passes through the layers of a well-trained deep neural network (DNN), using Betti numbers. Motivated by existing studies using simplicial complexes on shallow fully connected networks (FCN), we present an extended analysis using cubical homology instead, with a variety of popular deep architectures and real image datasets. We demonstrate that as depth increases, a topologically complicated dataset is transformed into a simple one, resulting in Betti numbers attaining their lowest possible value. The rate of decay in topological complexity (as a metric) helps quantify the impact of architectural choices on the generalization ability. Interestingly, from a representation learning perspective, we highlight several invariances, such as the topological invariance of (1) an architecture on similar datasets; (2) the embedding space of a dataset for architectures of variable depth; (3) the embedding space to input resolution/size; and (4) data sub-sampling. To further demonstrate the link between the expressivity and the generalization capability of a network, we consider the task of ranking pre-trained models for a downstream classification task (transfer learning). Compared to existing approaches, the proposed metric correlates better with the accuracy actually achievable by fine-tuning the pre-trained model.
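The analysis summarized in this abstract can be illustrated with a minimal sketch: probe the layer-wise embeddings of a trained network, build a cubical complex on each embedding, read off Betti numbers, and summarize how topological complexity decays with depth. This is not the authors' code. The choice of probed layers, the batch-averaging of activations, and the sub-level-set threshold (the median activation value) are illustrative assumptions, and the GUDHI library is assumed for the cubical homology computation.

```python
# Minimal sketch of the pipeline described in the abstract (illustrative, not the authors' code).
import numpy as np
import torch
import gudhi


def betti_numbers_of_embedding(grid, threshold):
    """Betti numbers of the sub-level set of a grid of activations at `threshold`."""
    # GUDHI interprets the array entries as filtration values of top-dimensional cells.
    cc = gudhi.CubicalComplex(top_dimensional_cells=grid)
    cc.persistence()  # persistence must be computed before querying Betti numbers
    return cc.persistent_betti_numbers(threshold, threshold)


def topological_complexity_per_layer(model, layers_to_probe, batch):
    """Sum of Betti numbers of each probed layer's activations for one input batch.

    `layers_to_probe` is an (assumed) list of (name, module) pairs.
    """
    activations = {}

    def make_hook(name):
        def hook(_module, _inputs, output):
            activations[name] = output.detach().cpu().numpy()
        return hook

    handles = [m.register_forward_hook(make_hook(n)) for n, m in layers_to_probe]
    with torch.no_grad():
        model(batch)
    for h in handles:
        h.remove()

    complexities = []
    for name, _ in layers_to_probe:
        act = activations[name]
        # Average over the batch dimension so each layer yields one grid of values
        # (an illustrative simplification, not the paper's exact construction).
        grid = act.mean(axis=0)
        threshold = float(np.median(grid))
        complexities.append(sum(betti_numbers_of_embedding(grid, threshold)))
    return complexities


def decay_rate(complexities):
    """Slope of log-complexity vs. depth: a rough 'rate of decay' summary."""
    depth = np.arange(len(complexities))
    log_c = np.log(np.asarray(complexities, dtype=float) + 1.0)
    slope, _intercept = np.polyfit(depth, log_c, deg=1)
    return slope  # more negative => faster topological simplification with depth
```

Under these assumptions, a more negative slope corresponds to faster topological simplification with depth, which the abstract relates to generalization ability and uses to rank pre-trained models for transfer learning.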


Data Encoding For Healthcare Data Democratisation and Information Leakage Prevention

arXiv.org Artificial Intelligence

In recent years, deep learning has demonstrated remarkable success in a wide variety of fields [1], and it is expected to have a significant impact on healthcare as well [2]. Many attempts have been made to achieve this breakthrough in healthcare informatics, which often deals with noisy, heterogeneous, and non-standardized electronic health records (EHRs) [3]. However, most clinical deep learning tools are either not robust enough or have not been tested in real-world scenarios [4, 5]. Deep learning solutions approved by regulatory bodies are less common in healthcare informatics, which shows that deep learning has not had the same level of success as in other fields such as speech and image processing [6]. Along with the well-known explainability challenges of deep learning models [7], the lack of data democratization [8] and latent information leakage (information leakage from trained models) [9, 10] can also be regarded as major hindrances to the development and acceptance of robust clinical deep learning solutions. In the current context, data democratization and information leakage can be described as: Data democratization: It involves making digital healthcare data available to a wider cohort of AI researchers.