 grf model


Investigating Robotaxi Crash Severity with Geographical Random Forest and the Urban Environment

arXiv.org Artificial Intelligence

This paper quantitatively investigates the crash severity of Autonomous Vehicles (AVs) using spatially localized machine learning and macroscopic measures of the urban built environment. Extending beyond the microscopic effects of individual infrastructure elements, we focus on city-scale land use and behavioral patterns while addressing spatial heterogeneity and spatial autocorrelation. We applied a spatially localized machine learning technique called Geographical Random Forest (GRF) to the California AV collision dataset. Analyzing multiple urban measures, including Points of Interest (POIs), building footprints, and land use, we built a GRF model and visualized it as a crash severity risk map of San Francisco. This paper presents three findings. First, spatially localized machine learning outperformed conventional machine learning in predicting AV crash severity, and the bias-variance tradeoff was evident as we adjusted the localization weight hyperparameter. Second, land use was the most important predictor, compared to intersections, building footprints, public transit stops, and POIs. Third, AV crashes were more likely to result in low-severity incidents in city center areas with greater diversity and commercial activity than in residential neighborhoods. Counterintuitively, residential areas were thus associated with higher crash severity than more complex areas such as commercial and mixed-use areas, likely due to human behavior and less restrictive environments. When robotaxi operators train their AV systems, we recommend that they (1) consider where their fleet operates and develop localized algorithms for their perception systems, and (2) design safety measures specific to residential neighborhoods, such as slower driving speeds and more alert sensors.
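To make the role of the localization weight concrete, here is a minimal sketch of a GRF-style prediction, assuming the common formulation in which a global model trained on all samples is blended with a local model trained on the k nearest samples: ŷ = α·ŷ_local + (1 − α)·ŷ_global. The names `grf_predict`, `bandwidth`, `local_weight`, and `mean_fitter` are hypothetical; `mean_fitter` is a toy stand-in for the random forest a real GRF would train, and the coordinates and severity scores below are made up for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Predictor = Callable[[Sequence[float]], float]
# A "fitter" trains a model on (features, targets) and returns a predictor.
# In GRF this would be a random forest; here it is a hypothetical stand-in.
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def grf_predict(
    coords: List[Point],
    X: List[Sequence[float]],
    y: List[float],
    target_coord: Point,
    target_x: Sequence[float],
    fit: Fitter,
    bandwidth: int,
    local_weight: float,
) -> float:
    """Blend a global model with a local model trained on the nearest samples."""
    if not 0.0 <= local_weight <= 1.0:
        raise ValueError("local_weight must lie in [0, 1]")
    if not 1 <= bandwidth <= len(coords):
        raise ValueError("bandwidth must be between 1 and the number of samples")
    # Global model: trained on every sample (refit on each call for brevity).
    global_model = fit(X, y)
    # Local model: trained on the `bandwidth` nearest samples (adaptive kernel).
    nearest = sorted(
        range(len(coords)), key=lambda i: math.dist(coords[i], target_coord)
    )[:bandwidth]
    local_model = fit([X[i] for i in nearest], [y[i] for i in nearest])
    # local_weight = 0 gives the global model; 1 gives the purely local model.
    return (local_weight * local_model(target_x)
            + (1.0 - local_weight) * global_model(target_x))


def mean_fitter(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Toy fitter that ignores the features and predicts the mean target."""
    mean = sum(y) / len(y)
    return lambda x: mean


# Two clusters of hypothetical crash locations with different severity scores.
coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
X = [[0.0], [0.0], [0.0], [0.0]]
y = [1.0, 1.0, 5.0, 5.0]
print(grf_predict(coords, X, y, (0.0, 0.5), [0.0], mean_fitter, 2, 0.5))
```

Sweeping `local_weight` from 0 to 1 moves the prediction from the global mean (3.0) toward the local one (1.0). This is one way to picture the bias-variance tradeoff the abstract mentions: a larger weight lets the model follow local patterns more closely, but each local model is trained on fewer samples.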


PyGRF: An improved Python Geographical Random Forest model and case studies in public health and natural disasters

arXiv.org Artificial Intelligence

Geographical random forest (GRF) is a recently developed, spatially explicit machine learning model. Because it can provide more accurate predictions and local interpretations, GRF has already been used in many studies. The current GRF model, however, has several limitations: the way it determines the local model weight and bandwidth hyperparameters, a potentially insufficient number of local training samples, and sometimes high local prediction errors. Moreover, because GRF is implemented as an R package, it currently lacks a Python version, which limits its adoption among machine learning practitioners who prefer Python. This work addresses these limitations by introducing theory-informed hyperparameter determination, local training sample expansion, and spatially-weighted local prediction. We also develop a Python-based GRF model and package, PyGRF, to facilitate the use of the model. We evaluate the performance of PyGRF on an example dataset and further demonstrate its use in two case studies in public health and natural disasters.
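The abstract does not specify how spatially-weighted local prediction works. One plausible reading is to combine the predictions of several nearby local models instead of relying on a single one. The sketch below illustrates that idea under our own assumptions, using inverse-distance weights over the k nearest local models. It is not PyGRF's API or its exact weighting scheme; `spatially_weighted_local_prediction`, `k`, and `eps` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Predictor = Callable[[Sequence[float]], float]


def spatially_weighted_local_prediction(
    target_coord: Point,
    target_x: Sequence[float],
    local_models: List[Predictor],
    model_coords: List[Point],
    k: int,
    eps: float = 1e-9,
) -> float:
    """Blend the predictions of the k local models closest to target_coord.

    Each local_models[i] is assumed to have been trained around model_coords[i].
    Inverse-distance weights are an illustrative kernel choice, not PyGRF's.
    """
    if not 1 <= k <= len(local_models):
        raise ValueError("k must be between 1 and the number of local models")
    # (distance, index) pairs for the k nearest local models.
    nearest = sorted(
        (math.dist(c, target_coord), i) for i, c in enumerate(model_coords)
    )[:k]
    # eps avoids division by zero when the target sits on a model location.
    weights = [1.0 / (d + eps) for d, _ in nearest]
    weighted = sum(w * local_models[i](target_x) for w, (_, i) in zip(weights, nearest))
    return weighted / sum(weights)


# Hypothetical local models that each return a constant prediction.
models = [lambda x: 1.0, lambda x: 3.0, lambda x: 100.0]
model_coords = [(0.0, 0.0), (2.0, 0.0), (50.0, 50.0)]
print(spatially_weighted_local_prediction((1.0, 0.0), [0.0], models, model_coords, k=2))
```

The distant third model is excluded when k = 2. The target is equidistant from the other two, so their predictions are averaged equally. Weighting several neighbors in this way smooths out the error of any single local model, which matches the motivation the abstract gives for this step.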


Submodularity in Batch Active Learning and Survey Problems on Gaussian Random Fields

arXiv.org Artificial Intelligence

Many real-world datasets can be represented as a graph whose edge weights designate similarities between instances. A discrete Gaussian random field (GRF) model is a finite-dimensional Gaussian process (GP) whose prior covariance is the inverse of a graph Laplacian. Minimizing the trace of the predictive covariance Σ (V-optimality) on GRFs has proven successful in batch active learning classification problems with budget constraints. However, a worst-case bound for this approach has been missing. We show that V-optimality on GRFs, viewed as a function of the batch query set, is submodular, and hence its greedy selection algorithm guarantees a (1 − 1/e) approximation ratio. Moreover, GRF models satisfy the absence-of-suppressor (AofS) condition. For active survey problems, we propose a similar survey criterion that minimizes 1ᵀΣ1, where 1 is the all-ones vector. In practice, the V-optimality criterion performs better than GPs with mutual-information-gain criteria and allows nonuniform costs for different nodes.
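Greedy batch selection under these criteria can be sketched in plain Python. The sketch makes two assumptions of our own. First, labels are noise-free. Second, the prior precision is a regularized Laplacian Λ = L + δI, since L itself is singular. Under these assumptions, the predictive covariance of the unlabeled nodes U is Σ = (Λ_UU)⁻¹. V-optimality minimizes tr(Σ), and the survey criterion minimizes 1ᵀΣ1. All function names and the default `delta` are hypothetical, and the dense O(n³) inversion is only suitable for tiny graphs.

```python
import math
from typing import List, Set

Matrix = List[List[float]]


def invert(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [rv - f * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def precision(weights: Matrix, delta: float) -> Matrix:
    """Regularized graph Laplacian L + delta*I, used as the GRF prior precision."""
    n = len(weights)
    lam = [[-float(weights[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        lam[i][i] = sum(weights[i][k] for k in range(n) if k != i) + delta
    return lam


def objective(lam: Matrix, labeled: Set[int], criterion: str = "v") -> float:
    """tr(Sigma) ("v") or 1'Sigma1 ("sigma") over the unlabeled nodes."""
    unlabeled = [i for i in range(len(lam)) if i not in labeled]
    if not unlabeled:
        return 0.0  # every node observed: no remaining predictive variance
    sigma = invert([[lam[i][j] for j in unlabeled] for i in unlabeled])
    if criterion == "v":
        return sum(sigma[i][i] for i in range(len(sigma)))
    if criterion == "sigma":
        return sum(sum(row) for row in sigma)
    raise ValueError("criterion must be 'v' or 'sigma'")


def greedy_batch(weights: Matrix, budget: int, delta: float = 0.1,
                 criterion: str = "v") -> List[int]:
    """Greedily add the node that most reduces the chosen criterion."""
    lam = precision(weights, delta)
    if not 0 <= budget <= len(lam):
        raise ValueError("budget must be between 0 and the number of nodes")
    chosen: List[int] = []
    for _ in range(budget):
        best, best_val = -1, math.inf
        for c in range(len(lam)):
            if c in chosen:
                continue
            val = objective(lam, set(chosen) | {c}, criterion)
            if val < best_val:  # strict '<' keeps the lowest index on ties
                best, best_val = c, val
        chosen.append(best)
    return chosen


# Path graph 0 - 1 - 2 with unit edge weights.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
print(greedy_batch(W, 2))
```

On this path graph, both criteria first query the central node, because observing it leaves the two endpoints conditionally independent. Submodularity is what guarantees that this one-node-at-a-time strategy achieves a (1 − 1/e) approximation for the batch.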