Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

arXiv.org Machine Learning

Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising results in the predictive performance of classification problems. While the application of such algorithms has been greatly simplified by their well-documented integration into commonly used statistical programming languages such as R, several practical challenges remain in ecological modeling related to unbiased performance estimation, optimization of algorithms via hyperparameter tuning, and spatial autocorrelation. We address these issues by comparing several widely used machine-learning algorithms, namely Boosted Regression Trees (BRT), Weighted k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), to the traditional parametric logistic regression (GLM) and the semi-parametric generalized additive model (GAM). Different nested cross-validation methods, including hyperparameter tuning, are used to evaluate model performances with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country (Spain) is investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. We recommend that spatial partitioning also be used for cross-validation during hyperparameter tuning with spatial data.
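
The gap between the spatial and non-spatial estimates above can be reproduced in miniature. Below is a minimal sketch in Python, rather than the R setup the abstract implies, comparing random and spatially blocked cross-validation for a binary classifier; the toy data, the k-means blocking, and all variable names are assumptions made for the illustration, not the paper's actual pipeline.

```python
# Minimal sketch: non-spatial vs. spatial cross-validation for a binary
# classifier. X (predictors), y (presence/absence) and coords (sample
# locations) are hypothetical toy data, not the Diplodia sapinea data set.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n = 500
coords = rng.uniform(0, 100, size=(n, 2))          # sample locations
X = np.hstack([coords, rng.normal(size=(n, 3))])   # environmental predictors (toy)
y = (coords[:, 0] + rng.normal(scale=20, size=n) > 50).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Non-spatial CV: random folds ignore autocorrelation, so nearby (similar)
# points end up in both training and test folds.
auc_random = cross_val_score(model, X, y,
                             cv=KFold(5, shuffle=True, random_state=0),
                             scoring="roc_auc")

# Spatial CV: k-means clusters of the coordinates define the folds, so the
# test points are spatially separated from the training points.
blocks = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(coords)
auc_spatial = cross_val_score(model, X, y, cv=GroupKFold(5),
                              groups=blocks, scoring="roc_auc")

print(f"non-spatial AUROC: {auc_random.mean():.3f}")
print(f"spatial AUROC:     {auc_spatial.mean():.3f}")
```

With real, spatially autocorrelated data, the spatially blocked estimate is typically the lower and less optimistic of the two, which is exactly the difference the abstract quantifies.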


Input Parameter Calibration in Forest Fire Spread Prediction: Taking the Intelligent Way

AAAI Conferences

Imprecision and uncertainty in the large number of input parameters are serious problems in forest fire behaviour modelling. To obtain more reliable forecasts, fast and efficient computational mechanisms for input parameter estimation and calibration should be integrated, and these mechanisms have to respect the hard real-time constraints of the simulation if the forecasts are to help prevent tragedy. We propose an Evolutionary Intelligent System (EIS) for parameter calibration in which the hybridisation of an evolutionary algorithm (EA) with an intelligent paradigm (IP) can be configured depending on disaster size, required parameter precision, and available computing resources. Experiments show that EIS produces estimates comparable to those of standard evolutionary calibration approaches while clearly outperforming them in runtime.
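
For readers unfamiliar with evolutionary calibration, the sketch below shows the basic loop under a fixed iteration budget. The simulator stand-in, the parameter bounds, and the population settings are all hypothetical, and the EIS's intelligent-paradigm component is deliberately omitted, so this corresponds only to the plain evolutionary baseline the abstract compares against.

```python
# Minimal sketch of evolutionary input-parameter calibration for a fire
# spread simulator. `simulate_fire`, the parameters, and their bounds are
# hypothetical stand-ins; the intelligent-paradigm (IP) component of the
# EIS is omitted, so this is a plain evolutionary baseline.
import random

BOUNDS = {"wind_speed": (0.0, 30.0), "fuel_moisture": (0.02, 0.4)}

def simulate_fire(params):
    """Hypothetical simulator: error between predicted and observed fire front."""
    target = {"wind_speed": 12.0, "fuel_moisture": 0.15}
    return sum((params[k] - target[k]) ** 2 for k in BOUNDS)

def random_individual():
    return {k: random.uniform(*b) for k, b in BOUNDS.items()}

def mutate(ind, rate=0.2):
    child = dict(ind)
    for k, (lo, hi) in BOUNDS.items():
        if random.random() < rate:
            child[k] = min(hi, max(lo, child[k] + random.gauss(0, (hi - lo) * 0.1)))
    return child

population = [random_individual() for _ in range(20)]
for generation in range(30):                 # fixed budget: real-time constraint
    population.sort(key=simulate_fire)       # rank by calibration error
    elite = population[:5]                   # keep the best candidates
    population = elite + [mutate(random.choice(elite)) for _ in range(15)]

best = min(population, key=simulate_fire)
print(best, simulate_fire(best))
```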


Recent Applications of Artificial Neural Networks in Forest Resource Management: An Overview

AAAI Conferences

Making good decisions for adaptive forest management has become increasingly difficult. New artificial intelligence (AI) technology allows knowledge processing to be included in decision-support tools. The application of Artificial Neural Networks (ANN), also known as Parallel Distributed Processing (PDP), to predict the behaviour of nonlinear systems has become an attractive alternative to traditional statistical methods. This paper provides an up-to-date synthesis of the use of ANNs in forest resource management. Current ANN applications include: (1) forest land mapping and classification, (2) forest growth and dynamics modeling, (3) spatial data analysis and modeling, (4) plant disease dynamics modeling, and (5) climate change research. The advantages and disadvantages of using ANNs are discussed. Although these applications are at an early stage, they have demonstrated the potential of ANNs as a useful tool for forest resource management.


Fast Optimization of Wildfire Suppression Policies with SMAC

arXiv.org Machine Learning

Managers of US National Forests must decide what policy to apply for dealing with lightning-caused wildfires. Conflicts among stakeholders (e.g., timber companies, home owners, and wildlife biologists) have often led to spirited political debates and even violent eco-terrorism. One way to transform these conflicts into multi-stakeholder negotiations is to provide a high-fidelity simulation environment in which stakeholders can explore the space of alternative policies and understand the tradeoffs therein. Such an environment needs to support fast optimization of MDP policies so that users can adjust reward functions and analyze the resulting optimal policies. This paper assesses the suitability of SMAC, a black-box empirical function optimization algorithm, for rapid optimization of MDP policies. The paper describes five reward function components and four stakeholder constituencies. It then introduces a parameterized class of policies that can be easily understood by the stakeholders. SMAC is applied to find the optimal policy in this class for the reward function of each stakeholder constituency. The results confirm that SMAC is able to rapidly find good policies that make sense from the domain perspective. Because the full-fidelity forest fire simulator is far too expensive to support interactive optimization, SMAC is applied to a surrogate model constructed from a modest number of runs of the full-fidelity simulator. To check the quality of the SMAC-optimized policies, they are evaluated on the full-fidelity simulator. The results confirm that the surrogate's value estimates are valid. This is the first successful optimization of wildfire management policies using a full-fidelity simulation, and the same methodology should be applicable to other contentious natural resource management problems where high-fidelity simulation is extremely expensive.
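
The surrogate workflow described above is simple to sketch. The Python example below substitutes a random-forest surrogate and plain random search for SMAC itself, and the two-parameter policy and the simulator are hypothetical stand-ins; it shows only the structure of the approach: a modest number of expensive runs, cheap optimization over the surrogate, and a final full-fidelity check.

```python
# Sketch of the surrogate workflow: fit a cheap model to a modest number
# of expensive simulator runs, optimize over the surrogate, then re-check
# the winner on the expensive simulator. `expensive_simulator` and the
# policy parameterization are hypothetical; random search stands in for
# SMAC's model-based search.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def expensive_simulator(policy):
    """Hypothetical full-fidelity run: returns total discounted reward."""
    suppress_threshold, days_remaining = policy
    return -(suppress_threshold - 60) ** 2 - 0.5 * (days_remaining - 90) ** 2

# 1. A modest design of expensive runs (the only full-fidelity calls).
designs = rng.uniform([0, 0], [100, 180], size=(40, 2))
rewards = np.array([expensive_simulator(p) for p in designs])

# 2. Fit the surrogate to those runs.
surrogate = RandomForestRegressor(n_estimators=300, random_state=0)
surrogate.fit(designs, rewards)

# 3. Optimize cheaply over the surrogate (random-search stand-in).
candidates = rng.uniform([0, 0], [100, 180], size=(10000, 2))
best = candidates[np.argmax(surrogate.predict(candidates))]

# 4. Validate the incumbent policy on the full-fidelity simulator.
print("surrogate optimum:", best, "true value:", expensive_simulator(best))
```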


Importance of spatial predictor variable selection in machine learning applications -- Moving from data reproduction to spatial prediction

arXiv.org Machine Learning

Machine learning algorithms find frequent application in the spatial prediction of biotic and abiotic environmental variables. However, the characteristics of spatial data, especially spatial autocorrelation, are widely ignored. We hypothesize that this is problematic and results in models that can reproduce training data but are unable to make spatial predictions beyond the locations of the training samples. We assume that not only spatial validation strategies but also spatial variable selection is essential for reliable spatial predictions. We introduce two case studies that use remote sensing to predict land cover and the leaf area index for the "Marburg Open Forest", an open research and education site of Marburg University, Germany. We use the machine learning algorithm Random Forests to train models using non-spatial and spatial cross-validation strategies to understand how spatial variable selection affects the predictions. Our findings confirm that spatial cross-validation is essential in preventing overoptimistic performance estimates. We further show that highly autocorrelated predictors (such as geolocation variables, e.g. latitude, longitude) can lead to considerable overfitting and result in models that can reproduce the training data but fail in making spatial predictions. The problem becomes apparent in the visual assessment of the spatial predictions, which show clear artefacts that can be traced back to the algorithm's misinterpretation of the spatially autocorrelated predictors. Spatial variable selection could automatically detect and remove such variables, resulting in reliable spatial prediction patterns and improved statistical spatial model performance. We conclude that, in addition to spatial validation, spatial variable selection must be considered in spatial predictions of ecological data to produce reliable predictions.
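
A minimal sketch of the idea, assuming toy data: forward feature selection in which every candidate predictor set is scored by spatially blocked cross-validation, so a predictor that only reproduces the training data (such as the geolocation proxy below) fails to improve the spatial score and is never selected. The data, feature names, and blocking are hypothetical illustrations, not the paper's implementation.

```python
# Sketch of forward feature selection scored by spatial cross-validation.
# All data here are toy stand-ins: `blocks` plays the role of spatial
# folds, and "latitude" is a geolocation proxy that carries no real signal.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
n = 400
blocks = rng.integers(0, 5, size=n)               # spatial block per sample
features = {
    "ndvi":      rng.normal(size=n),
    "elevation": rng.normal(size=n),
    "latitude":  blocks + rng.normal(scale=0.1, size=n),  # geolocation proxy
}
y = features["ndvi"] * 2 + rng.normal(scale=0.5, size=n)  # toy target

def spatial_score(selected):
    """Mean R^2 under spatially blocked CV for the given predictor set."""
    X = np.column_stack([features[f] for f in selected])
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    return cross_val_score(model, X, y, cv=GroupKFold(5),
                           groups=blocks, scoring="r2").mean()

selected, remaining, best_score = [], set(features), -np.inf
while remaining:
    cand = max(remaining, key=lambda f: spatial_score(selected + [f]))
    cand_score = spatial_score(selected + [cand])
    if cand_score <= best_score:   # stop when spatial CV no longer improves
        break
    selected.append(cand)
    remaining.remove(cand)
    best_score = cand_score

print("selected predictors:", selected)
```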