Goto

Collaborating Authors

 area estimation


Evaluation of A Spatial Microsimulation Framework for Small-Area Estimation of Population Health Outcomes Using the Behavioral Risk Factor Surveillance System

Von Hoene, Emma, Gupta, Aanya, Kavak, Hamdi, Roess, Amira, Anderson, Taylor

arXiv.org Artificial Intelligence

The field of population health addresses a wide spectrum of challenges, spanning infectious and chronic diseases to mental health and health risk behaviors such as smoking and alcohol consumption (Sharma et al., 2025). A common barrie r to addressing these issues is the lack of ground truth data capturing health outcomes and behaviors at fine geographic scales. This limits both local and national health decision - makers in planning and management efforts, such as identify ing health inequalities or targeting interventions where they are most needed (Rahman, 2017; Wang, 2018) . T o fill this gap, researchers use small area estimation (SAE), a collection of statistical methods that combine survey and geographic data to generate estimates of population - level health outcomes at various spatial scales (RTI International, 2025) . There are numerous methods for generating SAE of health outcomes, which can generally be grouped into two main approaches: direct and indirect model - based estimates (Rahman, 2017) . Direct estimates are calculated using only the survey responses from individuals or households sampled within the specified geographi c areas (counties, states) to estimate disease prevalence or other population characteristics.


Variational Autoencoded Multivariate Spatial Fay-Herriot Models

Wang, Zhenhua, Parker, Paul A., Holan, Scott H.

arXiv.org Machine Learning

Small area estimation models are essential for estimating population characteristics in regions with limited sample sizes, thereby supporting policy decisions, demographic studies, and resource allocation, among other use cases. The spatial Fay-Herriot model is one such approach that incorporates spatial dependence to improve estimation by borrowing strength from neighboring regions. However, this approach often requires substantial computational resources, limiting its scalability for high-dimensional datasets, especially when considering multiple (multivariate) responses. This paper proposes two methods that integrate the multivariate spatial Fay-Herriot model with spatial random effects, learned through variational autoencoders, to efficiently leverage spatial structure. Importantly, after training the variational autoencoder to represent spatial dependence for a given set of geographies, it may be used again in future modeling efforts, without the need for retraining. Additionally, the use of the variational autoencoder to represent spatial dependence results in extreme improvements in computational efficiency, even for massive datasets. We demonstrate the effectiveness of our approach using 5-year period estimates from the American Community Survey over all census tracts in California.


Deep Learning-Based Direct Leaf Area Estimation using Two RGBD Datasets for Model Development

Jayasuriya, Namal, Guo, Yi, Hu, Wen, Ghannoum, Oula

arXiv.org Artificial Intelligence

Estimation of a single leaf area can be a measure of crop growth and a phenotypic trait to breed new varieties. It has also been used to measure leaf area index and total leaf area. Some studies have used hand-held cameras, image processing 3D reconstruction and unsupervised learning-based methods to estimate the leaf area in plant images. Deep learning works well for object detection and segmentation tasks; however, direct area estimation of objects has not been explored. This work investigates deep learning-based leaf area estimation, for RGBD images taken using a mobile camera setup in real-world scenarios. A dataset for attached leaves captured with a top angle view and a dataset for detached single leaves were collected for model development and testing. First, image processing-based area estimation was tested on manually segmented leaves. Then a Mask R-CNN-based model was investigated, and modified to accept RGBD images and to estimate the leaf area. The detached-leaf data set was then mixed with the attached-leaf plant data set to estimate the single leaf area for plant images, and another network design with two backbones was proposed: one for segmentation and the other for area estimation. Instead of trying all possibilities or random values, an agile approach was used in hyperparameter tuning. The final model was cross-validated with 5-folds and tested with two unseen datasets: detached and attached leaves. The F1 score with 90% IoA for segmentation result on unseen detached-leaf data was 1.0, while R-squared of area estimation was 0.81. For unseen plant data segmentation, the F1 score with 90% IoA was 0.59, while the R-squared score was 0.57. The research suggests using attached leaves with ground truth area to improve the results.


SimPhony: A Device-Circuit-Architecture Cross-Layer Modeling and Simulation Framework for Heterogeneous Electronic-Photonic AI System

Yin, Ziang, Zhang, Meng, Begovic, Amir, Huang, Rena, Zhang, Jeff, Gu, Jiaqi

arXiv.org Artificial Intelligence

Electronic-photonic integrated circuits (EPICs) offer transformative potential for next-generation high-performance AI but require interdisciplinary advances across devices, circuits, architecture, and design automation. The complexity of hybrid systems makes it challenging even for domain experts to understand distinct behaviors and interactions across design stack. The lack of a flexible, accurate, fast, and easy-to-use EPIC AI system simulation framework significantly limits the exploration of hardware innovations and system evaluations on common benchmarks. To address this gap, we propose SimPhony, a cross-layer modeling and simulation framework for heterogeneous electronic-photonic AI systems. SimPhony offers a platform that enables (1) generic, extensible hardware topology representation that supports heterogeneous multi-core architectures with diverse photonic tensor core designs; (2) optics-specific dataflow modeling with unique multi-dimensional parallelism and reuse beyond spatial/temporal dimensions; (3) data-aware energy modeling with realistic device responses, layout-aware area estimation, link budget analysis, and bandwidth-adaptive memory modeling; and (4) seamless integration with model training framework for hardware/software co-simulation. By providing a unified, versatile, and high-fidelity simulation platform, SimPhony enables researchers to innovate and evaluate EPIC AI hardware across multiple domains, facilitating the next leap in emerging AI hardware. We open-source our codes at https://github.com/ScopeX-ASU/SimPhony


A step towards the integration of machine learning and small area estimation

Żądło, Tomasz, Chwila, Adam

arXiv.org Machine Learning

The use of machine-learning techniques has grown in numerous research areas. Currently, it is also widely used in statistics, including the official statistics for data collection (e.g. satellite imagery, web scraping and text mining, data cleaning, integration and imputation) but also for data analysis. However, the usage of these methods in survey sampling including small area estimation is still very limited. Therefore, we propose a predictor supported by these algorithms which can be used to predict any population or subpopulation characteristics based on cross-sectional and longitudinal data. Machine learning methods have already been shown to be very powerful in identifying and modelling complex and nonlinear relationships between the variables, which means that they have very good properties in case of strong departures from the classic assumptions. Therefore, we analyse the performance of our proposal under a different set-up, in our opinion of greater importance in real-life surveys. We study only small departures from the assumed model, to show that our proposal is a good alternative in this case as well, even in comparison with optimal methods under the model. What is more, we propose the method of the accuracy estimation of machine learning predictors, giving the possibility of the accuracy comparison with classic methods, where the accuracy is measured as in survey sampling practice. The solution of this problem is indicated in the literature as one of the key issues in integration of these approaches. The simulation studies are based on a real, longitudinal dataset, freely available from the Polish Local Data Bank, where the prediction problem of subpopulation characteristics in the last period, with "borrowing strength" from other subpopulations and time periods, is considered.