Spatial Language Likelihood Grounding Network for Bayesian Fusion of Human-Robot Observations
Supawich Sitdhipol, Waritwong Sukprasongdee, Ekapol Chuangsuwanich, Rina Tse
arXiv.org Artificial Intelligence
Fusing information from human observations can help robots overcome sensing limitations in collaborative tasks. However, an uncertainty-aware fusion framework requires a grounded likelihood representing the uncertainty of human inputs. This paper presents a Feature Pyramid Likelihood Grounding Network (FP-LGN) that grounds spatial language by learning relevant map image features and their relationships with spatial relation semantics. The model is trained as a probability estimator to capture aleatoric uncertainty in human language using three-stage curriculum learning. Results showed that FP-LGN matched expert-designed rules in mean Negative Log-Likelihood (NLL) and demonstrated greater robustness with lower standard deviation. Collaborative sensing results demonstrated that the grounded likelihood successfully enabled uncertainty-aware fusion of heterogeneous human language observations and robot sensor measurements, achieving significant improvements in human-robot collaborative task performance.
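The fusion step described in the abstract can be illustrated with a minimal grid-based Bayes update. This is only a sketch under stated assumptions, not the paper's implementation: the function names `fuse` and `nll`, the 1x4 map, and all likelihood values are hypothetical, and the language likelihood here is a hand-written array standing in for what a grounding network such as FP-LGN would output. It also assumes the human and robot observations are conditionally independent given the target location, so their likelihoods multiply.

```python
import numpy as np

def fuse(prior, *likelihoods):
    """Grid Bayes update: posterior is proportional to prior times each likelihood."""
    post = np.asarray(prior, dtype=float).copy()
    for lik in likelihoods:
        # Assumes observations are conditionally independent given the location.
        post = post * np.asarray(lik, dtype=float)
    total = post.sum()
    if total <= 0:
        raise ValueError("likelihoods assign zero mass to every cell")
    return post / total

def nll(prob_map, true_cell, eps=1e-12):
    """Negative log-likelihood of the true cell under a normalized map."""
    return -np.log(prob_map[true_cell] + eps)

# Hypothetical 1x4 map with a uniform prior over the target location.
prior = np.full((1, 4), 0.25)
# Made-up robot sensor likelihood: ambiguous between the two middle cells.
sensor_lik = np.array([[0.1, 0.4, 0.4, 0.1]])
# Made-up stand-in for a grounded spatial-language likelihood.
language_lik = np.array([[0.05, 0.15, 0.6, 0.2]])

sensor_only = fuse(prior, sensor_lik)
posterior = fuse(prior, sensor_lik, language_lik)
```

In this toy setup the sensor alone cannot separate the two middle cells, while the fused posterior concentrates on the cell favored by the language likelihood. Comparing `nll(posterior, (0, 2))` with `nll(sensor_only, (0, 2))` shows how a per-sample NLL of the kind reported in the abstract could be computed for a known target cell.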
Jul-31-2025
- Genre:
- Research Report > New Finding (1.00)
- Industry: