 point pattern


GeoBS: Information-Theoretic Quantification of Geographic Bias in AI Models

Wang, Zhangyu, Wu, Nemin, Cao, Qian, Xia, Jiangnan, Liu, Zeping, Xie, Yiqun, Nambi, Akshay, Ganu, Tanuja, Lao, Ni, Liu, Ninghao, Mai, Gengchen

arXiv.org Artificial Intelligence

The widespread adoption of AI models, especially foundation models (FMs), has made a profound impact on numerous domains. However, it also raises significant ethical concerns, including bias issues. Although numerous efforts have been made to quantify and mitigate social bias in AI models, geographic bias (in short, geo-bias) has received much less attention and presents unique challenges. While previous work has explored ways to quantify geo-bias, these measures are model-specific (e.g., mean absolute deviation of LLM ratings) or spatially implicit (e.g., average fairness scores of all spatial partitions). We lack a model-agnostic, universally applicable, and spatially explicit geo-bias evaluation framework that allows researchers to fairly compare the geo-bias of different AI models and to understand what spatial factors contribute to the geo-bias. In this paper, we establish an information-theoretic framework for geo-bias evaluation, called GeoBS (Geo-Bias Scores). We demonstrate the generalizability of the proposed framework by showing how to interpret and analyze existing geo-bias measures under this framework. Then, we propose three novel geo-bias scores that explicitly take intricate spatial factors (multi-scalability, distance decay, and anisotropy) into consideration. Finally, we conduct extensive experiments on 3 tasks, 8 datasets, and 8 models to demonstrate that both task-specific GeoAI models and general-purpose foundation models may suffer from various types of geo-bias. This framework will not only advance the technical understanding of geographic bias but will also establish a foundation for integrating spatial fairness into the design, deployment, and evaluation of AI systems.


The Mean of Multi-Object Trajectories

Nguyen, Tran Thien Dat, Vo, Ba Tuong, Vo, Ba-Ngu, Van Nguyen, Hoa, Shim, Changbeom

arXiv.org Artificial Intelligence

This paper introduces the concept of a mean for trajectories and multi-object trajectories (defined as sets or multi-sets of trajectories) along with algorithms for computing them. Specifically, we use the Fréchet mean, and metrics based on the optimal sub-pattern assignment (OSPA) construct, to extend the notion of average from vectors to trajectories and multi-object trajectories. Further, we develop efficient algorithms to compute these means using greedy search and Gibbs sampling. Using distributed multi-object tracking as an application, we demonstrate that the Fréchet mean approach to multi-object trajectory consensus significantly outperforms state-of-the-art distributed multi-object tracking methods.
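The abstract builds on the optimal sub-pattern assignment (OSPA) construct. As a rough illustration only, the sketch below computes the standard OSPA distance between two small sets of points, not the trajectory or multi-object trajectory metrics used in the paper. The function name `ospa`, the cutoff `c`, and the order `p` are assumptions introduced here, and the brute-force search over assignments is only practical for very small sets.

```python
from itertools import permutations
from math import dist


def ospa(X, Y, c=1.0, p=1.0):
    """OSPA distance between two finite point sets (lists of coordinate tuples).

    c is the cutoff distance and p the order; both are illustrative defaults.
    """
    # Make X the smaller set so every point of X is matched to a distinct point of Y.
    if len(X) > len(Y):
        X, Y = Y, X
    m, n = len(X), len(Y)
    if n == 0:
        return 0.0  # both sets are empty
    # Brute-force minimum over all assignments of X's points to distinct points of Y,
    # with each pairwise distance capped at the cutoff c.
    best = min(
        sum(min(dist(x, Y[j]), c) ** p for x, j in zip(X, perm))
        for perm in permutations(range(n), m)
    )
    # Each of the n - m unmatched points contributes the maximum penalty c.
    return ((best + c ** p * (n - m)) / n) ** (1.0 / p)


if __name__ == "__main__":
    print(ospa([(0.0, 0.0), (1.0, 0.0)], [(1.0, 0.0), (0.0, 0.5)], c=2.0))
```

For larger sets, the brute-force minimum would normally be replaced by a polynomial-time assignment solver.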


Transformer-Based Spatial-Temporal Counterfactual Outcomes Estimation

Li, He, Chi, Haoang, Liu, Mingyu, Huang, Wanrong, Xu, Liyang, Yang, Wenjing

arXiv.org Artificial Intelligence

The real world naturally has dimensions of time and space. Therefore, estimating the counterfactual outcomes with spatial-temporal attributes is a crucial problem. However, previous methods are based on classical statistical models, which still have limitations in performance and generalization. This paper proposes a novel framework for estimating counterfactual outcomes with spatial-temporal attributes using the Transformer, exhibiting stronger estimation ability. Under mild assumptions, the proposed estimator within this framework is consistent and asymptotically normal. To validate the effectiveness of our approach, we conduct simulation experiments and real data experiments. Simulation experiments show that our estimator has a stronger estimation capability than baseline methods. Real data experiments yield a valuable conclusion about the causal effect of conflicts on forest loss in Colombia. The source code is available at https://github.com/lihe-maxsize/DeppSTCI_Release_Version-master.


Differentially private synthesis of Spatial Point Processes

Kim, Dangchan, Lim, Chae Young

arXiv.org Machine Learning

This paper proposes a method to generate synthetic data for spatial point patterns within the differential privacy (DP) framework. Specifically, we define a differentially private Poisson point synthesizer (PPS) and Cox point synthesizer (CPS) to generate synthetic point patterns with the concept of the $\alpha$-neighborhood that relaxes the original definition of DP. We present three example models to construct a differentially private PPS and CPS, providing sufficient conditions on their parameters to ensure the DP given a specified privacy budget. In addition, we demonstrate that the synthesizers can be applied to point patterns on the linear network. Simulation experiments demonstrate that the proposed approaches effectively maintain the privacy and utility of synthetic data.


Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery

Cui, Kangning, Tang, Wei, Zhu, Rongkun, Wang, Manqi, Larsen, Gregory D., Pauca, Victor P., Alqahtani, Sarra, Yang, Fan, Segurado, David, Fine, Paul, Karubian, Jordan, Chan, Raymond H., Plemmons, Robert J., Morel, Jean-Michel, Silman, Miles R.

arXiv.org Artificial Intelligence

Understanding the spatial distribution of palms within tropical forests is essential for effective ecological monitoring, conservation strategies, and the sustainable integration of natural forest products into local and global supply chains. However, the analysis of remotely sensed data in these environments faces significant challenges, such as overlapping palm and tree crowns, uneven shading across the canopy surface, and the heterogeneous nature of the forest landscapes, which often affect the performance of palm detection and segmentation algorithms. To overcome these issues, we introduce PalmDSNet, a deep learning framework for real-time detection, segmentation, and counting of canopy palms. Additionally, we employ a bimodal reproduction algorithm that simulates palm spatial propagation to further enhance the understanding of these point patterns using PalmDSNet's results. We used UAV-captured imagery to create orthomosaics from 21 sites across western Ecuadorian tropical forests, covering a gradient from the everwet Chocó forests near Colombia to the drier forests of southwestern Ecuador. Expert annotations were used to create a comprehensive dataset, including 7,356 bounding boxes on image patches and 7,603 palm centers across five orthomosaics, encompassing a total area of 449 hectares. By combining PalmDSNet with the bimodal reproduction algorithm, which optimizes parameters for both local and global spatial variability, we effectively simulate the spatial distribution of palms in diverse and dense tropical environments, validating its utility for advanced applications in tropical forest monitoring and remote sensing analysis.


Kernel Mean Embedding Based Hypothesis Tests for Comparing Spatial Point Patterns

Rustamov, Raif M., Klosowski, James T.

arXiv.org Machine Learning

This paper introduces an approach for detecting differences in the first-order structures of spatial point patterns. The proposed approach leverages the kernel mean embedding in a novel way by introducing its approximate version tailored to spatial point processes. While the original embedding is infinite-dimensional and implicit, our approximate embedding is finite-dimensional and comes with explicit closed-form formulas. With its help we reduce the pattern comparison problem to the comparison of means in the Euclidean space. Hypothesis testing is based on conducting $t$-tests on each dimension of the embedding and combining the resulting $p$-values using the harmonic mean $p$-value combination technique. The main advantages of the proposed approach are that it can be applied to both single and replicated pattern comparisons, and that neither bootstrap nor permutation procedures are needed to obtain or calibrate the $p$-values. Our experiments show that the resulting tests are powerful and the $p$-values are well-calibrated; two applications to real world data are presented.
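The combination step mentioned in the abstract can be illustrated with the raw (weighted) harmonic mean of a set of p-values. This is a minimal sketch, not the authors' implementation: the function name `harmonic_mean_p` and the equal default weights are assumptions, and the additional calibration needed to turn the raw harmonic mean into a properly calibrated p-value is omitted.

```python
def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean of p-values (no calibration applied)."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    if weights is None:
        # Equal weights summing to one (an illustrative default).
        weights = [1.0 / len(p_values)] * len(p_values)
    if len(weights) != len(p_values):
        raise ValueError("weights and p_values must have the same length")
    # Harmonic mean: total weight divided by the weighted sum of reciprocals.
    return sum(weights) / sum(w / p for w, p in zip(weights, p_values))


if __name__ == "__main__":
    # Hypothetical per-dimension p-values, e.g. from one t-test per embedding dimension.
    print(harmonic_mean_p([0.04, 0.20, 0.50, 0.01]))
```

The harmonic mean is dominated by the smallest p-values, so a few strongly significant dimensions are not washed out by many uninformative ones.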


Gaussian Process Modulated Cox Processes under Linear Inequality Constraints

López-Lopera, Andrés F., John, ST, Durrande, Nicolas

arXiv.org Machine Learning

Point processes are used in a variety of real-world problems for modelling temporal or spatiotemporal point patterns in fields such as astronomy, geography, and ecology (Baddeley et al., 2015; Møller and Waagepetersen, 2004). In reliability analysis, they are used as renewal processes to model the lifetime of items or failure (hazard) rates (Cha and Finkelstein, 2018). Poisson processes are the foundation for modelling point patterns (Kingman, 1992). Their extension to stochastic intensity functions, known as doubly stochastic Poisson processes or Cox processes (Cox, 1955), enables nonparametric inference on the intensity function and allows expressing uncertainties (Møller and Waagepetersen, 2004). Moreover, previous studies have shown that other classes of point processes may also be seen as Cox processes. For example, Yannaros (1988) proved that Gamma renewal processes are Cox processes under non-increasing conditions. A similar analysis was made later for Weibull processes (Yannaros, 1994).
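To make the step from Poisson to Cox processes concrete, the sketch below simulates a temporal Poisson process with a fixed rate, then a very simple doubly stochastic variant in which the rate is itself drawn at random before the points are generated. It is an illustration under stated assumptions, not the Gaussian-process-modulated model of the paper: the function names, the gamma prior on the rate, and all parameter values are hypothetical.

```python
import random


def sample_poisson_times(rate, horizon, rng):
    """Event times of a homogeneous Poisson process with the given rate on [0, horizon]."""
    times = []
    # Inter-arrival gaps of a homogeneous Poisson process are exponential with mean 1/rate.
    t = rng.expovariate(rate)
    while t <= horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


def sample_mixed_poisson_times(shape, scale, horizon, rng):
    """Simplest Cox process: a random constant rate, then a Poisson process given that rate."""
    rate = rng.gammavariate(shape, scale)  # random intensity (assumed gamma prior)
    return rate, sample_poisson_times(rate, horizon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    rate, times = sample_mixed_poisson_times(shape=2.0, scale=1.5, horizon=10.0, rng=rng)
    print(f"drawn rate: {rate:.3f}, number of events: {len(times)}")
```

In a general Cox process the intensity varies over time or space rather than being a single random constant; the constant case is used here only because it keeps the two layers of randomness easy to see.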


Statistical learning of geometric characteristics of wireless networks

Brochard, Antoine, Błaszczyszyn, Bartłomiej, Mallat, Stéphane, Zhang, Sixin

arXiv.org Machine Learning

Motivated by the prediction of cell loads in cellular networks, we formulate the following new, fundamental problem of statistical learning of geometric marks of point processes: An unknown marking function, depending on the geometry of point patterns, produces characteristics (marks) of the points. One aims at learning this function from the examples of marked point patterns in order to predict the marks of new point patterns. To approximate (interpolate) the marking function, in our baseline approach, we build a statistical regression model of the marks with respect to some local point distance representation. In a more advanced approach, we use a global data representation via the scattering moments of random measures, which build an informative and deformation-stable data representation, already proven useful in image analysis and related application domains. In this case, the regression of the scattering moments of the marked point patterns with respect to the non-marked ones is combined with the numerical solution of the inverse problem, where the marks are recovered from the estimated scattering moments. Considering some simple, generic marks, often appearing in the modeling of wireless networks, such as the shot-noise values, nearest neighbour distance, and some characteristics of the Voronoi cells, we show that the scattering moments can capture similar geometry information as the baseline approach, and can reach even better performance, especially for non-local marking functions. Our results motivate further development of statistical learning tools for stochastic geometry and analysis of wireless networks, in particular to predict cell loads in cellular networks from the locations of base stations and traffic demand.
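One of the geometric marks named in the abstract, the nearest neighbour distance, is easy to compute directly from a point pattern. The sketch below shows such a marking function; the name `nearest_neighbour_marks` and the quadratic-time loop are assumptions made for illustration and are not taken from the paper.

```python
from math import dist, inf


def nearest_neighbour_marks(points):
    """Mark each point with the Euclidean distance to its nearest other point.

    A pattern with a single point gets the mark inf, since it has no neighbour.
    """
    marks = []
    for i, p in enumerate(points):
        # Minimum distance from p to every other point in the pattern.
        d = min((dist(p, q) for j, q in enumerate(points) if j != i), default=inf)
        marks.append(d)
    return marks


if __name__ == "__main__":
    # Hypothetical base station locations in the plane.
    pattern = [(0.0, 0.0), (3.0, 4.0), (3.0, 5.0)]
    print(nearest_neighbour_marks(pattern))
```

This mark depends only on a point's immediate surroundings, which makes it a local marking function in the sense of the abstract, in contrast to non-local marks such as shot-noise values.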


Model-Based Multiple Instance Learning

Vo, Ba-Ngu, Phung, Dinh, Tran, Quang N., Vo, Ba-Tuong

arXiv.org Machine Learning

While Multiple Instance (MI) data are point patterns -- sets or multi-sets of unordered points -- appropriate statistical point pattern models have not been used in MI learning. This article proposes a framework for model-based MI learning using point process theory. Likelihood functions for point pattern data derived from point process theory enable principled yet conceptually transparent extensions of learning tasks, such as classification, novelty detection and clustering, to point pattern data. Furthermore, tractable point pattern models as well as solutions for learning and decision making from point pattern data are developed.


Clustering For Point Pattern Data

Tran, Quang N., Vo, Ba-Ngu, Phung, Dinh, Vo, Ba-Tuong

arXiv.org Machine Learning

Clustering is one of the most common unsupervised learning tasks in machine learning and data mining. Clustering algorithms have been used in a plethora of applications across several scientific fields. However, there has been limited research in the clustering of point patterns - sets or multi-sets of unordered elements - that are found in numerous applications and data sources. In this paper, we propose two approaches for clustering point patterns. The first is a non-parametric method based on novel distances for sets. The second is a model-based approach, formulated via random finite set theory, and solved by the Expectation-Maximization algorithm. Numerical experiments show that the proposed methods perform well on both simulated and real data.