Anand, Saket
Learning and Evaluating Hierarchical Feature Representations
Sani, Depanshu, Anand, Saket
Hierarchy-aware representations ensure that the semantically closer classes are mapped closer in the feature space, thereby reducing the severity of mistakes while enabling consistent coarse-level class predictions. Towards this end, we propose a novel framework, Hierarchical Composition of Orthogonal Subspaces (Hier-COS), which learns to map deep feature embeddings into a vector space that is, by design, consistent with the structure of a given taxonomy tree. Our approach augments neural network backbones with a simple transformation module that maps learned discriminative features to subspaces defined using a fixed orthogonal frame. This construction naturally improves the severity of mistakes and promotes hierarchical consistency. Furthermore, we highlight the fundamental limitations of existing hierarchical evaluation metrics popularly used by the vision community and introduce a preference-based metric, Hierarchically Ordered Preference Score (HOPS), to overcome these limitations. We benchmark our method on multiple large and challenging datasets having deep label hierarchies (ranging from 3 - 12 levels) and compare with several baselines and SOTA. Through extensive experiments, we demonstrate that Hier-COS achieves state-of-the-art hierarchical performance across all the datasets while simultaneously beating top-1 accuracy in all but one case. We also demonstrate the performance of a Vision Transformer (ViT) backbone and show that learning a transformation module alone can map the learned features from a pre-trained ViT to Hier-COS and yield substantial performance benefits.
Graph-Based Multi-Modal Sensor Fusion for Autonomous Driving
Sani, Depanshu, Anand, Saket
The growing demand for robust scene understanding in mobile robotics and autonomous driving has highlighted the importance of integrating multiple sensing modalities. By combining data from diverse sensors like cameras and LIDARs, fusion techniques can overcome the limitations of individual sensors, enabling a more complete and accurate perception of the environment. We introduce a novel approach to multi-modal sensor fusion, focusing on developing a graph-based state representation that supports critical decision-making processes in autonomous driving. We present a Sensor-Agnostic Graph-Aware Kalman Filter [3], the first online state estimation technique designed to fuse multi-modal graphs derived from noisy multi-sensor data. The estimated graph-based state representations serve as a foundation for advanced applications like Multi-Object Tracking (MOT), offering a comprehensive framework for enhancing the situational awareness and safety of autonomous systems. We validate the effectiveness of our proposed framework through extensive experiments conducted on both synthetic and real-world driving datasets (nuScenes). Our results showcase an improvement in MOTA and a reduction in estimated position errors (MOTP) and identity switches (IDS) for tracked objects using the SAGA-KF. Furthermore, we highlight the capability of such a framework to develop methods that can leverage heterogeneous information (like semantic objects and geometric structures) from various sensing modalities, enabling a more holistic approach to scene understanding and enhancing the safety and effectiveness of autonomous systems.
Army of Thieves: Enhancing Black-Box Model Extraction via Ensemble based sample selection
Jindal, Akshit, Goyal, Vikram, Anand, Saket, Arora, Chetan
Machine Learning (ML) models become vulnerable to Model Stealing Attacks (MSA) when they are deployed as a service. In such attacks, the deployed model is queried repeatedly to build a labelled dataset. This dataset allows the attacker to train a thief model that mimics the original model. To maximize query efficiency, the attacker has to select the most informative subset of data points from the pool of available data. Existing attack strategies utilize approaches like Active Learning and Semi-Supervised learning to minimize costs. However, in the black-box setting, these approaches may select sub-optimal samples as they train only one thief model. Depending on the thief model's capacity and the data it was pretrained on, the model might even select noisy samples that harm the learning process. In this work, we explore the usage of an ensemble of deep learning models as our thief model. We call our attack Army of Thieves(AOT) as we train multiple models with varying complexities to leverage the crowd's wisdom. Based on the ensemble's collective decision, uncertain samples are selected for querying, while the most confident samples are directly included in the training data. Our approach is the first one to utilize an ensemble of thief models to perform model extraction. We outperform the base approaches of existing state-of-the-art methods by at least 3% and achieve a 21% higher adversarial sample transferability than previous work for models trained on the CIFAR-10 dataset.
Dissecting Deep Networks into an Ensemble of Generative Classifiers for Robust Predictions
Tiwari, Lokender, Madan, Anish, Anand, Saket, Banerjee, Subhashis
Deep Neural Networks (DNNs) are often criticized for being susceptible to adversarial attacks. Most successful defense strategies adopt adversarial training or random input transformations that typically require retraining or fine-tuning the model to achieve reasonable performance. In this work, our investigations of intermediate representations of a pre-trained DNN lead to an interesting discovery pointing to intrinsic robustness to adversarial attacks. We find that we can learn a generative classifier by statistically characterizing the neural response of an intermediate layer to clean training samples. The predictions of multiple such intermediate-layer based classifiers, when aggregated, show unexpected robustness to adversarial attacks. Specifically, we devise an ensemble of these generative classifiers that rank-aggregates their predictions via a Borda count-based consensus. Our proposed approach uses a subset of the clean training data and a pre-trained model, and yet is agnostic to network architectures or the adversarial attack generation method. We show extensive experiments to establish that our defense strategy achieves state-of-the-art performance on the ImageNet validation set.
ClusterNet : Semi-Supervised Clustering using Neural Networks
Shukla, Ankita, Cheema, Gullal Singh, Anand, Saket
Clustering using neural networks has recently demon- strated promising performance in machine learning and computer vision applications. However, the performance of current approaches is limited either by unsupervised learn- ing or their dependence on large set of labeled data sam- ples. In this paper, we propose ClusterNet that uses pair- wise semantic constraints from very few labeled data sam- ples (< 5% of total data) and exploits the abundant un- labeled data to drive the clustering approach. We define a new loss function that uses pairwise semantic similarity between objects combined with constrained k-means clus- tering to efficiently utilize both labeled and unlabeled data in the same framework. The proposed network uses con- volution autoencoder to learn a latent representation that groups data into k specified clusters, while also learning the cluster centers simultaneously. We evaluate and com- pare the performance of ClusterNet on several datasets and state of the art deep clustering approaches.