Collaborating Authors

 Ganesan, Deepak


Aligned Vector Quantization for Edge-Cloud Collaborative Vision-Language Models

arXiv.org Artificial Intelligence

Vision Language Models (VLMs) are central to Visual Question Answering (VQA) systems and are typically deployed in the cloud due to their high computational demands. However, this cloud-only approach underutilizes edge computational resources and requires significant bandwidth for transmitting raw images. In this paper, we introduce an edge-cloud collaborative VQA system, called LLaVA-AlignedVQ, which features a novel Aligned Vector Quantization algorithm (AlignedVQ) that efficiently compresses intermediate features without compromising accuracy to support partitioned execution. Our experiments demonstrate that LLaVA-AlignedVQ achieves an approximately 1365x compression rate for intermediate features, reducing data transmission overhead by 96.8% compared to transmitting JPEG90-compressed images to the cloud. Compared to the cloud-only solution, LLaVA-AlignedVQ achieves an inference speedup of 2-15x while maintaining high accuracy, remaining within -2.23% to +1.6% of the original model's accuracy across eight VQA datasets.
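To make the idea of compressing intermediate features concrete, here is a minimal sketch of plain nearest-neighbor vector quantization. It is not the AlignedVQ algorithm, whose details the abstract does not give. The codebook, the feature vectors, and the function names `quantize`, `dequantize`, and `compression_ratio` are all illustrative assumptions.

```python
import math

def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry (squared L2)."""
    indices = []
    for v in vectors:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(v, codebook[k])),
        )
        indices.append(best)
    return indices

def dequantize(indices, codebook):
    """Recover approximate vectors on the receiving side from the indices."""
    return [codebook[i] for i in indices]

def compression_ratio(dim, bits_per_value, codebook_size):
    """Raw bits per vector divided by the bits needed for one codebook index."""
    raw_bits = dim * bits_per_value
    index_bits = math.ceil(math.log2(codebook_size))
    return raw_bits / index_bits

# Hypothetical toy data: a 4-entry codebook of 2-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

codes = quantize(features, codebook)      # only these indices would be sent
restored = dequantize(codes, codebook)    # the receiver looks them up again
ratio = compression_ratio(dim=2, bits_per_value=16, codebook_size=4)
```

In this toy setting each 2-dimensional, 16-bit feature vector (32 bits) is replaced by a 2-bit index. The ratio grows with the feature dimension and shrinks as the codebook gets larger.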


GDTM: An Indoor Geospatial Tracking Dataset with Distributed Multimodal Sensors

arXiv.org Artificial Intelligence

Constantly locating moving objects, i.e., geospatial tracking, is essential for autonomous building infrastructure. Accurate and robust geospatial tracking often leverages multimodal sensor fusion algorithms, which require large datasets with time-aligned, synchronized data from various sensor types. However, such datasets are not readily available. Hence, we propose GDTM, a nine-hour dataset for multimodal object tracking with distributed multimodal sensors and reconfigurable sensor node placements. Our dataset enables the exploration of several research problems, such as optimizing architectures for processing multimodal data, and investigating models' robustness to adverse sensing conditions and sensor placement variances. A GitHub repository containing the code, sample data, and checkpoints of this work is available at https://github.com/nesl/GDTM.


Efficient IoT Inference via Context-Awareness

arXiv.org Artificial Intelligence

While existing strategies for executing deep learning-based classification on low-power platforms assume that models are trained on all classes of interest, this paper posits that adopting context-awareness, i.e., narrowing down a classification task to the current deployment context consisting of only recent inference queries, can substantially enhance performance in resource-constrained environments. We propose a new paradigm, CACTUS, for scalable and efficient context-aware classification, in which a micro-classifier recognizes a small set of classes relevant to the current context and, when the context changes (e.g., a new class comes into the scene), rapidly switches to another suitable micro-classifier. CACTUS features several innovations, including optimizing the training cost of context-aware classifiers, enabling on-the-fly context-aware switching between classifiers, and balancing context switching costs and performance gains via simple yet effective switching policies. We show that CACTUS achieves significant benefits in accuracy, latency, and compute budget across a range of datasets and IoT platforms.
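The switching behavior described above can be illustrated with a small sketch. It is not the CACTUS implementation: the `ContextSwitcher` class, its window size, and the selection rule are hypothetical. It also assumes that some upstream mechanism (for example, a fallback classifier) supplies the label of an out-of-context input.

```python
from collections import deque

class ContextSwitcher:
    """Tracks which micro-classifier (identified by its class set) is active."""

    def __init__(self, class_sets, window=5):
        self.class_sets = [frozenset(s) for s in class_sets]
        self.recent = deque(maxlen=window)  # labels of recent inference queries
        self.active = 0                     # index of the active micro-classifier

    def observe(self, label):
        """Record a label and switch if the active class set does not cover it."""
        self.recent.append(label)
        if label not in self.class_sets[self.active]:
            self.active = self._best_match(label)
        return self.active

    def _best_match(self, label):
        # Consider only micro-classifiers that can recognize the new label,
        # and prefer the one that overlaps most with the recent context.
        candidates = [i for i, s in enumerate(self.class_sets) if label in s]
        if not candidates:
            return self.active  # nothing covers the label: keep the current one
        recent = set(self.recent)
        return max(candidates, key=lambda i: len(recent & self.class_sets[i]))

# Hypothetical contexts: an "animals" and a "vehicles" micro-classifier.
switcher = ContextSwitcher([{"cat", "dog"}, {"car", "bus"}], window=3)
trace = [switcher.observe(label) for label in ["cat", "car", "bus", "dog"]]
```

Here the active index changes only when an observed label falls outside the active class set. This is one simple way to trade switching cost against coverage.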


Heteroskedastic Geospatial Tracking with Distributed Camera Networks

arXiv.org Artificial Intelligence

Visual object tracking has seen significant progress in recent years. However, the vast majority of this work focuses on tracking objects within the image plane of a single camera and ignores the uncertainty associated with predicted object locations. In this work, we focus on the geospatial object tracking problem using data from a distributed camera network. The goal is to predict an object's track in geospatial coordinates, along with uncertainty over the object's location, while respecting communication constraints that prohibit centralizing raw image data. We present a novel single-object geospatial tracking dataset that includes high-accuracy ground truth object locations and video data from a network of four cameras. We present a modeling framework for addressing this task, including a novel backbone model, and explore how uncertainty calibration and fine-tuning through a differentiable tracker affect performance.
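One common way to score a location prediction that carries its own input-dependent (heteroskedastic) uncertainty is the Gaussian negative log-likelihood. The abstract does not say which objective the authors use, so the sketch below is a generic illustration only. The function name `gaussian_nll` and the axis-aligned (diagonal) covariance are assumptions.

```python
import math

def gaussian_nll(target, mean, var):
    """Negative log-likelihood of a point under an axis-aligned Gaussian.

    target, mean: coordinates (e.g., x and y in a geospatial frame)
    var: predicted per-coordinate variances (must be positive)
    """
    return sum(
        0.5 * (math.log(2 * math.pi * v) + (t - m) ** 2 / v)
        for t, m, v in zip(target, mean, var)
    )

# The same 1-unit error per axis, scored under two predicted uncertainties.
truth = (1.0, 1.0)
prediction = (0.0, 0.0)
calibrated = gaussian_nll(truth, prediction, var=(1.0, 1.0))
overconfident = gaussian_nll(truth, prediction, var=(0.01, 0.01))
```

With the same error, the prediction that claims a very small variance receives a much larger penalty. This is why such a score rewards well-calibrated uncertainty as well as accurate locations.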


Eulerian Phase-based Motion Magnification for High-Fidelity Vital Sign Estimation with Radar in Clinical Settings

arXiv.org Artificial Intelligence

Efficient and accurate detection of subtle motion generated from small objects in noisy environments, as needed for vital sign monitoring, is challenging, but can be substantially improved with magnification. We developed a complex Gabor filter-based decomposition method to amplify phases at different spatial wavelength levels to magnify motion and extract 1D motion signals for fundamental frequency estimation. The phase-based complex Gabor filter outputs are processed and then used to train machine learning models that predict respiration and heart rate with greater accuracy. We show that our proposed technique performs better than the conventional temporal FFT-based method in clinical settings, such as sleep laboratories and emergency departments, as well as for a variety of human postures.
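For orientation, the kind of temporal frequency-domain baseline mentioned above can be sketched as follows: take a 1D motion signal, compute its spectrum, and pick the strongest frequency inside a plausible physiological band. This is not the authors' Gabor-based method. The function name `dominant_frequency`, the sampling rate, the band limits, and the synthetic signal are illustrative assumptions, and a direct DFT is used in place of an FFT to keep the example dependency-free.

```python
import cmath
import math

def dominant_frequency(signal, fs, f_lo, f_hi):
    """Return the frequency (Hz) with the most spectral power in [f_lo, f_hi]."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC offset
    best_f, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if not (f_lo <= f <= f_hi):
            continue
        # Direct DFT coefficient for bin k (an FFT would give the same value).
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_f, best_power = f, power
    return best_f

# Synthetic chest-motion signal: 0.25 Hz breathing plus a weaker 1.2 Hz component.
fs = 10.0  # samples per second (assumed)
samples = [
    math.sin(2 * math.pi * 0.25 * t / fs) + 0.2 * math.sin(2 * math.pi * 1.2 * t / fs)
    for t in range(400)
]
breathing_hz = dominant_frequency(samples, fs, f_lo=0.1, f_hi=0.5)
breaths_per_minute = breathing_hz * 60
```

Restricting the search to a band keeps the weaker 1.2 Hz component from being mistaken for the respiration rate. The same routine with a higher band could be applied to heart rate.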