
Improving the Computational Efficiency and Explainability of GeoAggregator

Deng, Rui, Li, Ziqi, Wang, Mingshu

arXiv.org Artificial Intelligence

Accurately modeling and explaining geospatial tabular data (GTD) is critical for understanding geospatial phenomena and their underlying processes. Recent work proposed a novel transformer-based deep learning model named GeoAggregator (GA) for this purpose and demonstrated that it outperforms other statistical and machine learning approaches. In this short paper, we further improve GA by 1) developing an optimized pipeline that accelerates data loading and streamlines the forward pass of GA to achieve better computational efficiency; and 2) incorporating a model ensembling strategy and a post-hoc model explanation function based on the GeoShapley framework to enhance model explainability. We validate the functionality and efficiency of the proposed strategies by applying the improved GA model to synthetic datasets. Experimental results show that our implementation improves the prediction accuracy and inference speed of GA over the original implementation. Moreover, explanation experiments indicate that GA effectively captures the inherent spatial effects in the designed synthetic dataset. The complete pipeline has been made publicly available for community use (https://github.com/ruid7181/GA-sklearn).
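The abstract mentions a model ensembling strategy without giving its details. As a generic illustration only (not the GA-sklearn API; the toy models and weights below are invented for the example), ensembling by averaging the predictions of several fitted models can be sketched as:

```python
import numpy as np

def ensemble_predict(models, X):
    """Average the predictions of several fitted models (simple ensembling).

    `models` is a list of callables mapping a feature matrix to predictions.
    """
    preds = np.stack([m(X) for m in models])  # shape: (n_models, n_samples)
    return preds.mean(axis=0)

# Illustration with toy linear "models" that differ only in fitted weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
models = [lambda X, w=w: X @ w for w in (np.array([1.0, 0.5, -0.2]),
                                         np.array([0.9, 0.6, -0.1]),
                                         np.array([1.1, 0.4, -0.3]))]
y_hat = ensemble_predict(models, X)
```

Averaging reduces the variance of the individual predictors; more elaborate schemes (weighted averaging, stacking) follow the same pattern.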


Explainable AI in Spatial Analysis

Li, Ziqi

arXiv.org Artificial Intelligence

A key objective in spatial analysis is to model spatial relationships and infer spatial processes to generate knowledge from spatial data, a task that has largely relied on spatial statistical methods. More recently, machine learning has offered scalable and flexible approaches that complement traditional methods and has been increasingly applied in spatial data science. Despite its advantages, machine learning is often criticized for being a black box, which limits our understanding of model behavior and output. Recognizing this limitation, explainable AI (XAI) has emerged as a pivotal field in AI that provides methods to explain the output of machine learning models to enhance transparency and understanding. These methods are crucial for model diagnosis, bias detection, and ensuring the reliability of results obtained from machine learning models. This chapter introduces key concepts and methods in XAI with a focus on Shapley value-based approaches, arguably the most popular family of XAI methods, and their integration with spatial analysis. An empirical example of county-level voting behaviors in the 2020 Presidential election is presented to demonstrate the use of Shapley values and spatial analysis, with a comparison to multiscale geographically weighted regression. The chapter concludes with a discussion of the challenges and limitations of current XAI techniques and proposes new directions.
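The Shapley value-based approaches the chapter focuses on rest on a simple combinatorial definition: a player's value is its average marginal contribution over all coalitions. A minimal exact implementation (feasible only for a handful of players, since the cost is exponential) can be sketched as follows; the toy game at the end is purely illustrative:

```python
import itertools
import math

def shapley_values(value, players):
    """Exact Shapley values by enumerating all coalitions.

    `value` maps a frozenset of players to a real payoff. Exponential in the
    number of players, so only suitable for small feature sets.
    """
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for r in range(n):
            for S in itertools.combinations(others, r):
                S = frozenset(S)
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                total += w * (value(S | {p}) - value(S))
        phi[p] = total
    return phi

# Toy additive game: a coalition's payoff is the sum of its members' worths,
# so each player's Shapley value equals its own worth.
worth = {"x1": 2.0, "x2": 1.0, "loc": 3.0}
v = lambda S: sum(worth[q] for q in S)
phi = shapley_values(v, list(worth))
```

Practical XAI libraries approximate this sum by sampling or exploit model structure, but the definition being approximated is exactly the one above.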


Can Moran Eigenvectors Improve Machine Learning of Spatial Data? Insights from Synthetic Data Validation

Li, Ziqi, Peng, Zhan

arXiv.org Machine Learning

Moran Eigenvector Spatial Filtering (ESF) approaches have shown promise in accounting for spatial effects in statistical models. Can this extend to machine learning? This paper examines the effectiveness of using Moran Eigenvectors as additional spatial features in machine learning models. We generate synthetic datasets with known processes involving spatially varying and nonlinear effects across two different geometries. Moran Eigenvectors calculated from different spatial weights matrices, with and without a priori eigenvector selection, are tested. We assess the performance of popular machine learning models, including Random Forests, LightGBM, XGBoost, and TabNet, and benchmark their accuracies in terms of cross-validated R² values against models that use only coordinates as features. We also extract coefficients and functions from the models using GeoShapley and compare them with the true processes. Results show that machine learning models using only location coordinates achieve better accuracies than eigenvector-based approaches across various experiments and datasets. While these findings hold for spatial processes that exhibit positive spatial autocorrelation, they do not necessarily apply to network autocorrelation or negative spatial autocorrelation, where Moran Eigenvectors may still be useful.
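The Moran Eigenvectors tested above are the eigenvectors of the doubly centered spatial weights matrix MWM, with M = I - 11ᵀ/n; eigenvectors with large positive eigenvalues describe patterns of positive spatial autocorrelation. A minimal NumPy sketch follows; the 4-node path-graph weights matrix is an illustrative assumption, not one of the paper's geometries:

```python
import numpy as np

def moran_eigenvectors(W, k):
    """Top-k eigenvectors of the doubly centered spatial weights matrix MWM,
    with M = I - 11^T / n, ordered by descending eigenvalue. Eigenvectors with
    positive eigenvalues capture positive spatial autocorrelation patterns."""
    n = W.shape[0]
    W = (W + W.T) / 2.0                      # symmetrize the weights
    M = np.eye(n) - np.ones((n, n)) / n      # centering projector
    vals, vecs = np.linalg.eigh(M @ W @ M)   # eigh returns ascending order
    order = np.argsort(vals)[::-1]
    return vals[order][:k], vecs[:, order][:, :k]

# Binary contiguity weights for a 4-node path graph: 0 - 1 - 2 - 3.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
vals, vecs = moran_eigenvectors(W, 2)
```

In ESF, a subset of these eigenvectors (often pre-selected by their Moran's I) is appended to the feature matrix; the paper's experiments compare exactly this against using raw coordinates.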


An Ensemble Framework for Explainable Geospatial Machine Learning Models

Liu, Lingbo

arXiv.org Artificial Intelligence

The relationships between things can vary significantly across different spatial or geographical contexts, a phenomenon that manifests in various spatial events such as the disparate impacts of pandemics [1], the dynamics of poverty distribution [2], fluctuations in housing prices [3], etc. By optimizing spatial analysis methods, we can enhance the accuracy of predictions, improve the interpretability of models, and make more effective spatial decisions or interventions [4]. Nonetheless, the inherent complexity of spatial data and the potential for nonlinear relationships pose challenges to enhancing interpretability through traditional spatial analysis techniques [5]. Among models for analyzing spatially varying effects, such as spatial filtering models [6-8] and spatial Bayesian models [9], Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR) stand out for their use of local spatial weighting schemes, which are instrumental in capturing spatial features more accurately [10, 11]. These linear regression-based approaches, however, encounter significant hurdles in decoding complex spatial phenomena (Figure 1). Various Geographically Weighted (GW) models have been developed to tackle issues such as multicollinearity [12, 13] and to extend the utility of GW models to classification tasks [14-17]. The evolution of artificial intelligence (AI) methodologies, including Artificial Neural Networks (ANN) [18], Graph Neural Networks (GNN) [19, 20], and Convolutional Neural Networks (CNN) [21], has introduced novel ways to mitigate uncertainties around spatial proximity and weighting kernels in GW models. Despite these advancements in marrying geospatial models with AI, challenges remain in addressing nonlinear correlations and deciphering underlying spatial mechanisms.
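The GWR models discussed above fit one weighted least-squares regression per location, down-weighting distant observations. A minimal sketch, assuming a Gaussian distance kernel and a fixed bandwidth (both illustrative choices; real GWR/MGWR implementations also calibrate the bandwidth and, for MGWR, one bandwidth per covariate):

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Geographically Weighted Regression sketch: at each location, fit a
    weighted least-squares model with Gaussian kernel weights over distances
    to all observations, yielding one coefficient vector per location."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # add an intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel weights
        WX = Xd * w[:, None]
        # Weighted least squares: beta_i = (X' W X)^{-1} X' W y
        betas[i] = np.linalg.solve(Xd.T @ WX, WX.T @ y)
    return betas

# Toy data with a globally constant process y = 1 + 2x: every local fit
# should then recover the same coefficients [1, 2].
coords = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2, 2]], dtype=float)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x
betas = gwr_coefficients(coords, x.reshape(-1, 1), y, bandwidth=1.0)
```

When the underlying process truly varies over space, the rows of `betas` differ by location, which is exactly the spatially varying effect GWR is designed to surface.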


GeoShapley: A Game Theory Approach to Measuring Spatial Effects in Machine Learning Models

Li, Ziqi

arXiv.org Machine Learning

This paper introduces GeoShapley, a game theory approach to measuring spatial effects in machine learning models. GeoShapley extends the Nobel Prize-winning Shapley value framework in game theory by conceptualizing location as a player in a model prediction game, which enables the quantification of the importance of location and the synergies between location and other features in a model. GeoShapley is a model-agnostic approach and can be applied to statistical or black-box machine learning models of various structures. The interpretation of GeoShapley is directly linked with spatially varying coefficient models for explaining spatial effects and additive models for explaining non-spatial effects. Using simulated data, GeoShapley values are validated against known data-generating processes and are used for cross-comparison of seven statistical and machine learning models. An empirical example of house price modeling is used to illustrate GeoShapley's utility and interpretation with real-world data. The method is available as an open-source Python package named geoshapley.
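Conceptually, treating location as a player means bundling all coordinate columns into a single player in the Shapley game. The sketch below illustrates that idea with exact coalition enumeration and background-mean replacement for absent features; it is a simplified conceptual illustration, not the API of the geoshapley package (the function name and replacement scheme are assumptions):

```python
import itertools
import math
import numpy as np

def location_shapley(predict, x, background, loc_idx):
    """Shapley value of the joint 'location' player: all coordinate columns
    (loc_idx) enter or leave coalitions together. Absent features are replaced
    by background means. Exponential in the number of non-location features."""
    feat_idx = [j for j in range(len(x)) if j not in loc_idx]
    base = background.mean(axis=0)

    def value(present):
        # Model output when only the `present` columns take their values
        # from x; all other columns are set to the background mean.
        z = base.copy()
        for j in present:
            z[j] = x[j]
        return predict(z)

    n = len(feat_idx) + 1          # non-location players + one location player
    total = 0.0
    for r in range(len(feat_idx) + 1):
        for S in itertools.combinations(feat_idx, r):
            w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                 / math.factorial(n))
            total += w * (value(list(S) + list(loc_idx)) - value(list(S)))
    return total

# Toy model that is additive in column 0 (a coordinate) and column 2 (a
# feature); the location player's value is then its own additive contribution.
predict = lambda z: 3.0 * z[0] + z[2]
x = np.array([2.0, 5.0, 4.0])
background = np.zeros((3, 3))
phi_loc = location_shapley(predict, x, background, loc_idx=[0, 1])
```

For non-additive models the same enumeration also surfaces location-feature synergies, which is the quantity GeoShapley is built to report.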