semantic observation
ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors
Agrawal, Shivendra, Brawer, Jake, Naik, Ashutosh, Roncone, Alessandro, Hayes, Bradley
Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat vision-based localization. We present ShelfAware, a semantic particle filter for robust global localization that treats scene semantics as statistical evidence over object categories rather than fixed landmarks. ShelfAware fuses a depth likelihood with a category-centric semantic similarity and uses a precomputed bank of semantic viewpoints to perform inverse semantic proposals inside MCL, yielding fast, targeted hypothesis generation on low-cost, vision-only hardware. Across 100 global-localization trials spanning four conditions (cart-mounted, wearable, dynamic obstacles, and sparse semantics) in a semantically dense, retail environment, ShelfAware achieves a 96% success rate (vs. 22% MCL and 10% AMCL) with a mean time-to-convergence of 1.91s, attains the lowest translational RMSE in all conditions, and maintains stable tracking in 80% of tested sequences, all while running in real time on a consumer laptop-class platform. By modeling semantics distributionally at the category level and leveraging inverse proposals, ShelfAware resolves geometric aliasing and semantic drift common to quasi-static domains. Because the method requires only vision sensors and VIO, it integrates as an infrastructure-free building block for mobile robots in warehouses, labs, and retail settings; as a representative application, it also supports the creation of assistive devices providing start-anytime, shared-control assistive navigation for people with visual impairments.
Semantic-Aware Particle Filter for Reliable Vineyard Robot Localisation
de Silva, Rajitha, Cox, Jonathan, Heselden, James R., Popovic, Marija, Cadena, Cesar, Polvara, Riccardo
Abstract-- Accurate localisation is critical for mobile robots in structured outdoor environments, yet LiDAR-based methods often fail in vineyards due to repetitive row geometry and perceptual aliasing. We propose a semantic particle filter that incorporates stable object-level detections, specifically vine trunks and support poles into the likelihood estimation process. Detected landmarks are projected into a bird's eye view and fused with LiDAR scans to generate semantic observations. A key innovation is the use of semantic walls, which connect adjacent landmarks into pseudo-structural constraints that mitigate row aliasing. T o maintain global consistency in headland regions where semantics are sparse, we introduce a noisy GPS prior that adaptively supports the filter . Experiments in a real vineyard demonstrate that our approach maintains localisation within the correct row, recovers from deviations where AMCL fails, and outperforms vision-based SLAM methods such as RT AB-Map. Accurate localisation is a critical component of mobile robot navigation in outdoor environments [1].
Observation-Augmented Contextual Multi-Armed Bandits for Robotic Exploration with Uncertain Semantic Data
Wakayama, Shohei, Ahmed, Nisar
For robotic decision-making under uncertainty, the balance between exploitation and exploration of available options must be carefully taken into account. In this study, we introduce a new variant of contextual multi-armed bandits called observation-augmented CMABs (OA-CMABs) wherein a decision-making agent can utilize extra outcome observations from an external information source. CMABs model the expected option outcomes as a function of context features and hidden parameters, which are inferred from previous option outcomes. In OA-CMABs, external observations are also a function of context features and thus provide additional evidence about the hidden parameters. Yet, if an external information source is error-prone, the resulting posterior updates can harm decision-making performance unless the presence of errors is considered. To this end, we propose a robust Bayesian inference process for OA-CMABs that is based on the concept of probabilistic data validation. Our approach handles complex mixture model parameter priors and hybrid observation likelihoods for semantic data sources, allowing us to develop validation algorithms based on recently develop probabilistic semantic data association techniques. Furthermore, to more effectively cope with the combined sources of uncertainty in OA-CMABs, we derive a new active inference algorithm for option selection based on expected free energy minimization. This generalizes previous work on active inference for bandit-based robotic decision-making by accounting for faulty observations and non-Gaussian inference. Our approaches are demonstrated on a simulated asynchronous search site selection problem for space exploration. The results show that even if incorrect observations are provided by external information sources, efficient decision-making and robust parameter inference are still achieved in a wide variety of experimental conditions.
Probabilistic Semantic Data Association for Collaborative Human-Robot Sensing
Wakayama, Shohei, Ahmed, Nisar
Humans cannot always be treated as oracles for collaborative sensing. Robots thus need to maintain beliefs over unknown world states when receiving semantic data from humans, as well as account for possible discrepancies between human-provided data and these beliefs. To this end, this paper introduces the problem of semantic data association (SDA) in relation to conventional data association problems for sensor fusion. It then develops a novel probabilistic semantic data association (PSDA) algorithm to rigorously address SDA in general settings, unlike previous work on semantic data fusion which developed heuristic techniques for specific settings. PSDA is further incorporated into a recursive hybrid Bayesian data fusion scheme which uses Gaussian mixture priors for object states and softmax functions for semantic human sensor data likelihoods. Simulations of a multi-object search task show that PSDA enables robust collaborative state estimation under a wide range of conditions where semantic human sensor data can be erroneous or contain significant reference ambiguities.
Hybrid Belief Pruning with Guarantees for Viewpoint-Dependent Semantic SLAM
Lemberg, Tuvy, Indelman, Vadim
Semantic simultaneous localization and mapping is a subject of increasing interest in robotics and AI that directly influences the autonomous vehicles industry, the army industries, and more. One of the challenges in this field is to obtain object classification jointly with robot trajectory estimation. Considering view-dependent semantic measurements, there is a coupling between different classes, resulting in a combinatorial number of hypotheses. A common solution is to prune hypotheses that have a sufficiently low probability and to retain only a limited number of hypotheses. However, after pruning and renormalization, the updated probability is overconfident with respect to the original probability. This is especially problematic for systems that require high accuracy. If the prior probability of the classes is independent, the original normalization factor can be computed efficiently without pruning hypotheses. To the best of our knowledge, this is the first work to present these results. If the prior probability of the classes is dependent, we propose a lower bound on the normalization factor that ensures cautious results. The bound is calculated incrementally and with similar efficiency as in the independent case. After pruning and updating based on the bound, this belief is shown empirically to be close to the original belief.
Learning Navigation Costs from Demonstration with Semantic Observations
Wang, Tianyu, Dhiman, Vikas, Atanasov, Nikolay
This paper focuses on inverse reinforcement learning (IRL) for autonomous robot navigation using semantic observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder, which infers semantic class probabilities from the observation sequence, and a cost encoder, defined as deep neural network over the semantic features. Since the expert cost is not directly observable, the representation parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. The error is optimized using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. We show that our approach learns to follow traffic rules in the autonomous driving CARLA simulator by relying on semantic observations of cars, sidewalks and road lanes.
sZoom: A Framework for Automatic Zoom into High Resolution Surveillance Videos
Saini, Mukesh, Guthier, Benjamin, Kuang, Hao, Mahapatra, Dwarikanath, Saddik, Abdulmotaleb El
Current cameras are capable of recording high resolution video. While viewing on a mobile device, a user can manually zoom into this high resolution video to get more detailed view of objects and activities. However, manual zooming is not suitable for surveillance and monitoring. It is tiring to continuously keep zooming into various regions of the video. Also, while viewing one region, the operator may miss activities in other regions. In this paper, we propose sZoom, a framework to automatically zoom into a high resolution surveillance video. The proposed framework selectively zooms into the sensitive regions of the video to present details of the scene, while still preserving the overall context required for situation assessment. A multi-variate Gaussian penalty is introduced to ensure full coverage of the scene. The method achieves near real-time performance through a number of timing optimizations. An extensive user study shows that, while watching a full HD video on a mobile device, the system enhances the security operator's efficiency in understanding the details of the scene by 99% on the average compared to a scaled version of the original high resolution video. The produced video achieved 46% higher ratings for usefulness in a surveillance task.
Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach
Burks, Luke, Loefgren, Ian, Ahmed, Nisar
This work develops novel strategies for optimal planning with semantic observations using continuous state Partially Observable Markov Decision Processes (CPOMDPs). Two major innovations are presented in relation to Gaussian mixture (GM) CPOMDP policy approximation methods. While existing methods have many theoretically nice properties, they are hampered by the inability to efficiently represent and reason over hybrid continuous-discrete probabilistic models. The first major innovation is the derivation of closed-form variational Bayes GM approximations of Point-Based Value Iteration Bellman policy backups, using softmax models of continuous-discrete semantic observation probabilities. A key benefit of this approach is that dynamic decision-making tasks can be performed with complex non-Gaussian uncertainties, while also exploiting continuous dynamic state space models (thus avoiding cumbersome and costly discretization). The second major innovation is a new clustering-based technique for mixture condensation that scales well to very large GM policy functions and belief functions. Simulation results for a target search and interception task with semantic observations show that the GM policies resulting from these innovations are more effective than those produced by other state of the art GM and Monte Carlo based policy approximations, but require significantly less modeling overhead and runtime cost. Additional results demonstrate the robustness of this approach to model errors.
Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search
Burks, Luke, Loefgren, Ian, Barbier, Luke, Muesing, Jeremy, McGinley, Jamison, Vunnam, Sousheel, Ahmed, Nisar
In search applications, autonomous unmanned vehicles must be able to efficiently reacquire and localize mobile targets that can remain out of view for long periods of time in large spaces. As such, all available information sources must be actively leveraged -- including imprecise but readily available semantic observations provided by humans. To achieve this, this work develops and validates a novel collaborative human-machine sensing solution for dynamic target search. Our approach uses continuous partially observable Markov decision process (CPOMDP) planning to generate vehicle trajectories that optimally exploit imperfect detection data from onboard sensors, as well as semantic natural language observations that can be specifically requested from human sensors. The key innovation is a scalable hierarchical Gaussian mixture model formulation for efficiently solving CPOMDPs with semantic observations in continuous dynamic state spaces. The approach is demonstrated and validated with a real human-robot team engaged in dynamic indoor target search and capture scenarios on a custom testbed.