trivedi
Learnable Conformal Prediction with Context-Aware Nonconformity Functions for Robotic Planning and Perception
Kumar, Divake, Tayebati, Sina, Migliarba, Francesco, Krishnan, Ranganath, Trivedi, Amit Ranjan
Deep learning models in robotics often output point estimates with poorly calibrated confidences, offering no native mechanism to quantify predictive reliability under novel, noisy, or out-of-distribution inputs. Conformal prediction (CP) addresses this gap by providing distribution-free coverage guarantees, yet its reliance on fixed nonconformity scores ignores context and can yield intervals that are overly conservative or unsafe. We address this with Learnable Conformal Prediction (LCP), which replaces fixed scores with a lightweight neural function that leverages geometric, semantic, and task-specific features to produce context-aware uncertainty sets. LCP maintains CP's theoretical guarantees while reducing prediction set sizes by 18% in classification, tightening detection intervals by 52%, and improving path planning safety from 72% to 91% success with minimal overhead. Across three robotic tasks on seven benchmarks, LCP consistently outperforms Standard CP and ensemble baselines. In classification on CIFAR-100 and ImageNet, it achieves smaller set sizes (4.7-9.9% reduction) at target coverage. For object detection on COCO, BDD100K, and Cityscapes, it produces 46-54% tighter bounding boxes. In path planning through cluttered environments, it improves success to 91.5% with only 4.5% path inflation, compared to 12.2% for Standard CP. The method is lightweight (approximately 4.8% runtime overhead, 42 KB memory) and supports online adaptation, making it well suited to resource-constrained autonomous systems. Hardware evaluation shows LCP adds less than 1% memory and 15.9% inference overhead, yet sustains 39 FPS on detection tasks while being 7.4 times more energy-efficient than ensembles.
- Europe (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)
Intelligent Sensing-to-Action for Robust Autonomy at the Edge: Opportunities and Challenges
Trivedi, Amit Ranjan, Tayebati, Sina, Kumawat, Hemant, Darabi, Nastaran, Kumar, Divake, Kosta, Adarsh Kumar, Venkatesha, Yeshwanth, Jayasuriya, Dinithi, Jayasinghe, Nethmi, Panda, Priyadarshini, Mukhopadhyay, Saibal, Roy, Kaushik
Autonomous edge computing in robotics, smart cities, and autonomous vehicles relies on the seamless integration of sensing, processing, and actuation for real-time decision-making in dynamic environments. At its core is the sensing-to-action loop, which iteratively aligns sensor inputs with computational models to drive adaptive control strategies. These loops can adapt to hyper-local conditions, enhancing resource efficiency and responsiveness, but also face challenges such as resource constraints, synchronization delays in multi-modal data fusion, and the risk of cascading errors in feedback loops. This article explores how proactive, context-aware sensing-to-action and action-to-sensing adaptations can enhance efficiency by dynamically adjusting sensing and computation based on task demands, such as sensing a very limited part of the environment and predicting the rest. By guiding sensing through control actions, action-to-sensing pathways can improve task relevance and resource use, but they also require robust monitoring to prevent cascading errors and maintain reliability. Multi-agent sensing-action loops further extend these capabilities through coordinated sensing and actions across distributed agents, optimizing resource use via collaboration. Additionally, neuromorphic computing, inspired by biological systems, provides an efficient framework for spike-based, event-driven processing that conserves energy, reduces latency, and supports hierarchical control--making it ideal for multi-agent optimization. This article highlights the importance of end-to-end co-design strategies that align algorithmic models with hardware and environmental dynamics and improve cross-layer interdependencies to improve throughput, precision, and adaptability for energy-efficient edge autonomy in complex environments.
- Europe (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Information Technology (0.93)
- Energy (0.68)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Architecture > Real Time Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Neural Precision Polarization: Simplifying Neural Network Inference with Dual-Level Precision
Jayasuriya, Dinithi, Darabi, Nastaran, Hashem, Maeesha Binte, Trivedi, Amit Ranjan
We introduce a precision polarization scheme for DNN inference that utilizes only very low and very high precision levels, assigning low precision to the majority of network weights and activations while reserving high precision paths for targeted error compensation. This separation allows for distinct optimization of each precision level, thereby reducing memory and computation demands without compromising model accuracy. In the discussed approach, a floating-point model can be trained in the cloud and then downloaded to an edge device, where network weights and activations are directly quantized to meet the edge devices' desired level, such as NF4 or INT8. To address accuracy loss from quantization, surrogate paths are introduced, leveraging low-rank approximations on a layer-by-layer basis. These paths are trained with a sensitivity-based metric on minimal training data to recover accuracy loss under quantization as well as due to process variability, such as when the main prediction path is implemented using analog acceleration. Our simulation results show that neural precision polarization enables approximately 464 TOPS per Watt MAC efficiency and reliability by integrating rank-8 error recovery paths with highly efficient, though potentially unreliable, bit plane-wise compute-in-memory processing.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Colorado (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
Language-Driven Active Learning for Diverse Open-Set 3D Object Detection
Greer, Ross, Antoniussen, Bjørk, Møgelmose, Andreas, Trivedi, Mohan
Object detection is crucial for ensuring safe autonomous driving. However, data-driven approaches face challenges when encountering minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D Object Detection. Our method leverages active learning techniques to query diverse and informative data samples from an unlabeled pool, enhancing the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in both open-world exploring and closed-world mining settings. In open-world exploring, VisLED-Querying selects data points most novel relative to existing data, while in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset and demonstrate its efficiency compared to random sampling and entropy-querying methods. Our results show that VisLED-Querying consistently outperforms random sampling and offers competitive performance compared to entropy-querying despite the latter's model-optimality, highlighting the potential of VisLED for improving object detection in autonomous driving scenarios. We make our code publicly available at https://github.com/Bjork-crypto/VisLED-Querying
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
Perception Without Vision for Trajectory Prediction: Ego Vehicle Dynamics as Scene Representation for Efficient Active Learning in Autonomous Driving
This study investigates the use of trajectory and dynamic state information for efficient data curation in autonomous driving machine learning tasks. We propose methods for clustering trajectory-states and sampling strategies in an active learning framework, aiming to reduce annotation and data costs while maintaining model performance. Our approach leverages trajectory information to guide data selection, promoting diversity in the training data. We demonstrate the effectiveness of our methods on the trajectory prediction task using the nuScenes dataset, showing consistent performance gains over random sampling across different data pool sizes, and even reaching sub-baseline displacement errors at just 50% of the data cost. Our results suggest that sampling typical data initially helps overcome the ''cold start problem,'' while introducing novelty becomes more beneficial as the training pool size increases. By integrating trajectory-state-informed active learning, we demonstrate that more efficient and robust autonomous driving systems are possible and practical using low-cost data curation strategies.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
Driver Activity Classification Using Generalizable Representations from Vision-Language Models
Greer, Ross, Andersen, Mathias Viborg, Møgelmose, Andreas, Trivedi, Mohan
Driver activity classification is crucial for ensuring road safety, with applications ranging from driver assistance systems to autonomous vehicle control transitions. In this paper, we present a novel approach leveraging generalizable representations from vision-language models for driver activity classification. Our method employs a Semantic Representation Late Fusion Neural Network (SRLF-Net) to process synchronized video frames from multiple perspectives. Each frame is encoded using a pretrained vision-language encoder, and the resulting embeddings are fused to generate class probability predictions. By leveraging contrastively-learned vision-language representations, our approach achieves robust performance across diverse driver activities. We evaluate our method on the Naturalistic Driving Action Recognition Dataset, demonstrating strong accuracy across many classes. Our results suggest that vision-language representations offer a promising avenue for driver monitoring systems, providing both accuracy and interpretability through natural language descriptors.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
- Europe > Denmark > Central Jutland > Viborg (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.49)
Learning to Find Missing Video Frames with Synthetic Data Augmentation: A General Framework and Application in Generating Thermal Images Using RGB Cameras
Andersen, Mathias Viborg, Greer, Ross, Møgelmose, Andreas, Trivedi, Mohan
Advanced Driver Assistance Systems (ADAS) in intelligent vehicles rely on accurate driver perception within the vehicle cabin, often leveraging a combination of sensing modalities. However, these modalities operate at varying rates, posing challenges for real-time, comprehensive driver state monitoring. This paper addresses the issue of missing data due to sensor frame rate mismatches, introducing a generative model approach to create synthetic yet realistic thermal imagery. We propose using conditional generative adversarial networks (cGANs), specifically comparing the pix2pix and CycleGAN architectures. Experimental results demonstrate that pix2pix outperforms CycleGAN, and utilizing multi-view input styles, especially stacked views, enhances the accuracy of thermal image generation. Moreover, the study evaluates the model's generalizability across different subjects, revealing the importance of individualized training for optimal performance. The findings suggest the potential of generative models in addressing missing frames, advancing driver state monitoring for intelligent vehicles, and underscoring the need for continued research in model generalization and customization.
- North America > United States > California > San Diego County > San Diego (0.05)
- Europe > Denmark > North Jutland > Aalborg (0.04)
- Europe > Denmark > Central Jutland > Viborg (0.04)
- Health & Medicine (0.68)
- Automobiles & Trucks (0.68)
- Transportation (0.49)
Towards Explainable, Safe Autonomous Driving with Language Embeddings for Novelty Identification and Active Learning: Framework and Experimental Analysis with Real-World Data Sets
This research explores the integration of language embeddings for active learning in autonomous driving datasets, with a focus on novelty detection. Novelty arises from unexpected scenarios that autonomous vehicles struggle to navigate, necessitating higher-level reasoning abilities. Our proposed method employs language-based representations to identify novel scenes, emphasizing the dual purpose of safety takeover responses and active learning. The research presents a clustering experiment using Contrastive Language-Image Pretrained (CLIP) embeddings to organize datasets and detect novelties. We find that the proposed algorithm effectively isolates novel scenes from a collection of subsets derived from two real-world driving datasets, one vehicle-mounted and one infrastructure-mounted. From the generated clusters, we further present methods for generating textual explanations of elements which differentiate scenes classified as novel from other scenes in the data pool, presenting qualitative examples from the clustered results. Our results demonstrate the effectiveness of language-driven embeddings in identifying novel elements and generating explanations of data, and we further discuss potential applications in safe takeovers, data curation, and multi-task active learning.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Europe > Germany (0.04)
Navigating the Unknown: Uncertainty-Aware Compute-in-Memory Autonomy of Edge Robotics
Darabi, Nastaran, Shukla, Priyesh, Jayasuriya, Dinithi, Kumar, Divake, Stutts, Alex C., Trivedi, Amit Ranjan
This paper addresses the challenging problem of energy-efficient and uncertainty-aware pose estimation in insect-scale drones, which is crucial for tasks such as surveillance in constricted spaces and for enabling non-intrusive spatial intelligence in smart homes. Since tiny drones operate in highly dynamic environments, where factors like lighting and human movement impact their predictive accuracy, it is crucial to deploy uncertainty-aware prediction algorithms that can account for environmental variations and express not only the prediction but also confidence in the prediction. We address both of these challenges with Compute-in-Memory (CIM) which has become a pivotal technology for deep learning acceleration at the edge. While traditional CIM techniques are promising for energy-efficient deep learning, to bring in the robustness of uncertainty-aware predictions at the edge, we introduce a suite of novel techniques: First, we discuss CIM-based acceleration of Bayesian filtering methods uniquely by leveraging the Gaussian-like switching current of CMOS inverters along with co-design of kernel functions to operate with extreme parallelism and with extreme energy efficiency. Secondly, we discuss the CIM-based acceleration of variational inference of deep learning models through probabilistic processing while unfolding iterative computations of the method with a compute reuse strategy to significantly minimize the workload. Overall, our co-design methodologies demonstrate the potential of CIM to improve the processing efficiency of uncertainty-aware algorithms by orders of magnitude, thereby enabling edge robotics to access the robustness of sophisticated prediction frameworks within their extremely stringent area/power resources.
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
The Why, When, and How to Use Active Learning in Large-Data-Driven 3D Object Detection for Safe Autonomous Driving: An Empirical Exploration
Greer, Ross, Antoniussen, Bjørk, Andersen, Mathias V., Møgelmose, Andreas, Trivedi, Mohan M.
Abstract--Active learning strategies for 3D object detection in autonomous driving datasets may help to address challenges of data imbalance, redundancy, and high-dimensional data. We demonstrate the effectiveness of entropy querying to select informative samples, aiming to reduce annotation costs and improve model performance. We experiment using the BEVFusion model for 3D object detection on the nuScenes dataset, comparing active learning to random sampling and demonstrating that entropy querying outperforms in most cases. Many autonomous driving tasks rely on supervised learning, and task performance under such methods is heavily dependent on accurate, high-volume data annotation. The conventional A. Redundancy and Data Imbalance approach for most autonomous driving tasks, such as 3D object As a motivating example, consider a fleet which seeks to detection [1, 2, 3, 4, 5], is to ask humans to label (or supervise gather data in a particular region. By the nature of our roadway the labeling of) all data collected in driving, then train learning system, over time, vehicles will likely encounter the same roads machines using the labeled data.
- North America > United States > California > San Diego County > San Diego (0.04)
- North America > United States > California > San Diego County > La Jolla (0.04)
- Europe > Germany (0.04)
- Europe > Denmark > North Jutland > Aalborg (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)