
Collaborating Authors: Das, Arnav


Deep Submodular Peripteral Networks

arXiv.org Artificial Intelligence

Submodular functions, crucial for various applications, often lack practical learning methods for their acquisition. Seemingly unrelated, learning a scaling from oracles offering graded pairwise preferences (GPC) is underexplored, despite a rich history in psychometrics. In this paper, we introduce deep submodular peripteral networks (DSPNs), a novel parametric family of submodular functions, and methods for their training using a contrastive-learning-inspired, GPC-ready strategy that connects and then tackles both of the above challenges. We introduce a newly devised GPC-style "peripteral" loss that leverages numerically graded relationships between pairs of objects (sets, in our case). Unlike traditional contrastive learning, our method utilizes graded comparisons, extracting more nuanced information than binary-outcome comparisons alone, and contrasts sets of any size (not just two). We also define a novel suite of automatic sampling strategies for training, including active-learning-inspired submodular feedback. We demonstrate DSPNs' efficacy in learning submodularity from a costly target submodular function and show their superiority in downstream tasks such as experimental design and streaming applications.
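
To make the GPC training signal concrete, below is a minimal PyTorch sketch of a graded pairwise-comparison loss over a learned set function. The DeepSetFunction surrogate and the tanh-squared loss form are illustrative assumptions for exposition only, not the paper's actual DSPN architecture or peripteral loss:

```python
import torch
import torch.nn as nn

class DeepSetFunction(nn.Module):
    """Toy monotone set function: per-element features are pooled additively,
    then passed through concave square-root heads (a common recipe for
    submodular-like behavior). A stand-in for a DSPN, not the real thing."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.w = nn.Parameter(torch.rand(hidden))  # nonnegative mixture weights

    def forward(self, X):  # X: (set_size, dim)
        pooled = self.phi(X).sum(dim=0)  # modular aggregation over the set
        return (self.w.abs() * torch.sqrt(pooled + 1e-8)).sum()

def peripteral_style_loss(f, A, B, grade):
    """Graded pairwise-comparison loss (assumed form): push the normalized
    margin f(A) - f(B) toward the oracle's numeric grade in [-1, 1]."""
    return (torch.tanh(f(A) - f(B)) - grade) ** 2

# Usage: query a (costly) target oracle for a graded preference between two
# sampled sets, then take a gradient step on the cheap learned surrogate.
f = DeepSetFunction(dim=8)
A, B = torch.randn(5, 8), torch.randn(3, 8)
grade = torch.tensor(0.7)  # oracle: "A is substantially preferred to B"
peripteral_style_loss(f, A, B, grade).backward()
```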


Accelerating Batch Active Learning Using Continual Learning Techniques

arXiv.org Artificial Intelligence

A major problem with Active Learning (AL) is its high training cost, since models are typically retrained from scratch after every query round. We start by demonstrating that standard AL on neural networks with warm starting fails both to accelerate training and to avoid catastrophic forgetting when fine-tuning over AL query rounds. We then develop a new class of techniques that circumvents this problem by biasing further training towards previously labeled sets. We accomplish this by employing existing, and developing novel, replay-based Continual Learning (CL) algorithms that are effective at quickly learning the new without forgetting the old, especially when data come from an evolving distribution. We call this paradigm "Continual Active Learning" (CAL). We show that CAL achieves significant speedups using a plethora of replay schemes that use model distillation and that select diverse and uncertain points from the history. We conduct experiments across many data domains, including natural language, vision, medical imaging, and computational biology, each with different neural architectures and dataset sizes. CAL consistently provides a 3x reduction in training time while retaining performance and out-of-distribution robustness, showing its wide applicability.
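
As a rough illustration of the paradigm, the sketch below runs one CAL round: the model is fine-tuned on the newly labeled batch while replaying a uniformly sampled subset of the labeled history instead of retraining from scratch. The function name, uniform replay policy, and hyperparameters are assumptions; the paper develops more sophisticated distillation- and diversity/uncertainty-based replay schemes:

```python
import random
import torch.nn.functional as F

def cal_round(model, optimizer, new_batch, history, replay_frac=0.5, epochs=3):
    """One Continual Active Learning round (illustrative): fine-tune on newly
    labeled (x, y) pairs plus a replayed subset of previously labeled data,
    resuming from the current weights rather than from scratch."""
    k = min(len(history), int(replay_frac * len(new_batch)))
    replay = random.sample(history, k)
    model.train()
    for _ in range(epochs):
        for x, y in new_batch + replay:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x.unsqueeze(0)), y.unsqueeze(0))
            loss.backward()
            optimizer.step()
    history.extend(new_batch)  # grow the replay pool for future query rounds
```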


High Resolution Point Clouds from mmWave Radar

arXiv.org Artificial Intelligence

This paper explores a machine learning approach for generating high-resolution point clouds from a single-chip mmWave radar. Unlike lidar- and vision-based systems, mmWave radar can operate in harsh environments and see through occlusions like smoke, fog, and dust. Unfortunately, current mmWave processing techniques offer poor spatial resolution compared to lidar point clouds. This paper presents RadarHD, an end-to-end neural network that constructs lidar-like point clouds from low-resolution radar input. Enhancing radar images is challenging due to the presence of specular and spurious reflections. Radar data also does not map well to traditional image processing techniques due to the signal's sinc-like spreading pattern. We overcome these challenges by training RadarHD on a large volume of raw I/Q radar data paired with lidar point clouds across diverse indoor settings. Our experiments show the ability to generate rich point clouds even in scenes unobserved during training and in the presence of heavy smoke occlusion. Further, RadarHD's point clouds are of high enough quality to work with existing lidar odometry and mapping workflows.
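
The core of such a system is an image-to-image network that upsamples a low-resolution radar heatmap into a denser lidar-like occupancy grid. The sketch below is a minimal encoder-decoder in that spirit; the class name, channel sizes, and skip-free layout are assumptions, and the actual RadarHD model is a deeper U-Net trained on paired radar/lidar frames:

```python
import torch
import torch.nn as nn

class RadarUpsampler(nn.Module):
    """Minimal encoder-decoder sketch (not the RadarHD architecture):
    map a (B, 1, H, W) low-res radar intensity image to a same-size
    lidar-like occupancy grid with values in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, radar):
        return self.dec(self.enc(radar))

# Training would minimize, e.g., binary cross-entropy against rasterized
# lidar point clouds; the loss choice here is an assumption.
model = RadarUpsampler()
pred = model(torch.randn(1, 1, 64, 256))  # -> (1, 1, 64, 256)
```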


Physics-inspired deep learning to characterize the signal manifold of quasi-circular, spinning, non-precessing binary black hole mergers

arXiv.org Artificial Intelligence

It is expected, for instance, that inferring orbital eccentricity in GW observations [1-13] may provide the most conclusive evidence for the existence of compact binary systems in dense stellar environments [14-34]. Similarly, in the case of binary black hole (BBH) mergers, it is assumed that the spin distribution of BBHs formed in dense stellar environments may be distributed isotropically, whereas BBHs formed through massive stellar evolution in isolation may have spin distributions that are aligned with the binary's orbital angular momentum [35, 36]. As the number of GW observations of BBH mergers continues to grow in the years to come [37-39], it will be possible to infer the astrophysical properties of these sources and elucidate their formation history. The goal of this article is to explore how deep learning handles parameter-space degeneracies in the signal manifold of quasi-circular, spinning, non-precessing BBH mergers, and to quantify how accurately deep learning may constrain the individual spins, effective spin, and mass-ratio of these GW sources in the absence of noise. We then go on to discuss the computational grand challenges that naturally arise when one tries to address these problems: (1) the parameter space that needs to be sampled is very large, requiring the use of TB-size waveform datasets and thereby demanding the development of novel distributed algorithms that use many GPUs to fully train deep learning models in a reasonable amount of time; and (2) the need to incorporate domain knowledge into the optimization of deep learning algorithms to accelerate their convergence and to ensure that their predictions are physically consistent. In connection with the first challenge, we introduce distributed training algorithms that reduce the training stage of the neural network model used in this study from one month on a single V100 GPU to: (i) 12.4 hours using 64 V100 GPUs on the Hardware Accelerated Learning (HAL) cluster at the National Center for Supercomputing Applications (NCSA); and (ii) 1.2 hours using 1536 V100 GPUs on the Summit supercomputer at Oak Ridge National Laboratory. These results establish a record for the number of GPUs used to train these types of physics-inspired deep learning models. Regarding the second challenge, we found that naive methods for training deep learning architectures lead to rather sub-optimal results. However, we show that when we use physics-inspired optimization algorithms, which incorporate general relativistic constraints on the spins of BBHs, we are able to accurately recover the individual spins and mass-ratio of BBH signals across the mass-ratio range under consideration.
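
To illustrate the second point, the snippet below sketches a physics-constrained loss of the kind described: standard regression on (mass-ratio, spin1, spin2) plus a penalty that keeps predicted dimensionless spins inside the general-relativistic bound |chi| <= 1. The column layout, penalty form, and weight are assumptions, not the paper's exact optimization scheme:

```python
import torch
import torch.nn.functional as F

def physics_constrained_loss(pred, target, penalty_weight=10.0):
    """Regression loss with an assumed general-relativistic consistency term.
    pred, target: (B, 3) tensors with columns (q, chi1, chi2) (assumed layout).
    Dimensionless BH spins must satisfy |chi| <= 1, so out-of-bound
    predictions are penalized."""
    mse = F.mse_loss(pred, target)
    spins = pred[:, 1:]                          # (chi1, chi2)
    bound_violation = F.relu(spins.abs() - 1.0).mean()
    return mse + penalty_weight * bound_violation
```

Scaling to many GPUs would then wrap the model in a data-parallel framework such as PyTorch's DistributedDataParallel; the paper's own distributed training setup operates at far larger scale.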