Performance evaluation and hyperparameter tuning of statistical and machine-learning models using spatial data

arXiv.org Machine Learning

Machine-learning algorithms have gained popularity in recent years in the field of ecological modeling due to their promising predictive performance on classification problems. While applying such algorithms has become much simpler in recent years thanks to their well-documented integration in commonly used statistical programming languages such as R, several practical challenges remain in ecological modeling: unbiased performance estimation, optimization of algorithms using hyperparameter tuning, and spatial autocorrelation. We address these issues by comparing several widely used machine-learning algorithms, namely Boosted Regression Trees (BRT), k-Nearest Neighbor (WKNN), Random Forest (RF) and Support Vector Machine (SVM), with traditional parametric algorithms such as logistic regression (GLM) and semi-parametric ones like generalized additive models (GAM). Different nested cross-validation methods, including hyperparameter tuning methods, are used to evaluate model performance with the aim of obtaining bias-reduced performance estimates. As a case study, the spatial distribution of the forest disease Diplodia sapinea in the Basque Country in Spain is investigated using common environmental variables such as temperature, precipitation, soil and lithology as predictors. Results show that GAM and RF (mean AUROC estimates 0.708 and 0.699) outperform all other methods in predictive accuracy. The effect of hyperparameter tuning saturates at around 50 iterations for this data set. The AUROC differences between the bias-reduced (spatial cross-validation) and overoptimistic (non-spatial cross-validation) performance estimates of the GAM and RF are 0.167 (24%) and 0.213 (30%), respectively. It is recommended to also use spatial partitioning for cross-validation hyperparameter tuning of spatial data.
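
The idea of spatial partitioning for cross-validation can be illustrated with a minimal sketch. The abstract does not say how the partitions are built, so the grid-block scheme below is an assumption chosen for simplicity, and the names `spatial_block_folds`, `train_test_splits` and `block_size` are hypothetical. All points falling in the same grid block are assigned to the same fold, so nearby observations never end up on both sides of a train/test split.

```python
import random
from collections import defaultdict


def spatial_block_folds(coords, n_folds, block_size, seed=0):
    """Assign point indices to folds by the grid block each point falls in."""
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        # Points in the same block_size x block_size square share a key.
        blocks[(int(x // block_size), int(y // block_size))].append(i)

    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)  # random but reproducible order

    folds = [[] for _ in range(n_folds)]
    for k, block_id in enumerate(block_ids):
        # Deal whole blocks out to folds in round-robin fashion.
        folds[k % n_folds].extend(blocks[block_id])
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices), holding out one fold at a time."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


# Made-up coordinates on a 10 x 10 grid, split into 2 x 2 blocks.
coords = [(x + 0.5, y + 0.5) for x in range(10) for y in range(10)]
folds = spatial_block_folds(coords, n_folds=5, block_size=2.0)
```

For a nested setup, the same function could be called again on the coordinates of each training set to build the inner folds used for hyperparameter tuning, which is one way to follow the recommendation to partition spatially at the tuning level as well.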


Uncertainty Aware Wildfire Management

arXiv.org Artificial Intelligence

Recent wildfires in the United States have caused loss of life and billions of dollars in damage, destroying countless structures and forests. Fighting wildfires is extremely complex. It is difficult to observe the true state of fires due to smoke and the risk associated with ground surveillance. Limited resources must be deployed over a massive area, and the spread of the fire is challenging to predict. This paper proposes a decision-theoretic approach to combat wildfires. We model the resource allocation problem as a partially-observable Markov decision process. We also present a data-driven model that lets us simulate how fires spread as a function of relevant covariates. A major problem in using data-driven models to combat wildfires is the lack of comprehensive data sources that relate fires with relevant covariates. We present an algorithmic approach based on large-scale raster and vector analysis that can be used to create such a dataset. Our dataset, with over 2 million data points, is the first open-source dataset that combines existing fire databases with covariates extracted from satellite imagery. Through experiments using real-world wildfire data, we demonstrate that our forecasting model can accurately model the spread of wildfires. Finally, we use simulations to demonstrate that our response strategy can significantly reduce response times compared to baseline methods.
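
Under partial observability, a decision maker typically maintains a belief over the hidden state rather than the state itself. The sketch below shows a generic Bayes belief update for a single map cell with a noisy binary fire sensor. It is not the paper's model: the function names and all four probabilities (`p_ignite`, `p_extinguish`, `p_detect`, `p_false_alarm`) are hypothetical, chosen only to illustrate the mechanism.

```python
def predict(belief, p_ignite, p_extinguish):
    """Prior probability that the cell burns at the next step.

    A burning cell keeps burning with probability 1 - p_extinguish;
    a clear cell catches fire with probability p_ignite.
    """
    return belief * (1.0 - p_extinguish) + (1.0 - belief) * p_ignite


def update(prior, saw_fire, p_detect, p_false_alarm):
    """Posterior probability of fire after one noisy sensor reading (Bayes rule)."""
    like_fire = p_detect if saw_fire else 1.0 - p_detect
    like_clear = p_false_alarm if saw_fire else 1.0 - p_false_alarm
    evidence = like_fire * prior + like_clear * (1.0 - prior)
    return like_fire * prior / evidence


# One filtering step with made-up numbers.
belief = 0.2
belief = predict(belief, p_ignite=0.1, p_extinguish=0.05)
belief = update(belief, saw_fire=True, p_detect=0.8, p_false_alarm=0.1)
```

A smoke-obscured cell could be represented by a low `p_detect`, in which case a "no fire" reading moves the belief only slightly.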


Input Parameter Calibration in Forest Fire Spread Prediction: Taking the Intelligent Way

AAAI Conferences

Imprecision and uncertainty in the large number of input parameters are serious problems in forest fire behaviour modelling. To obtain more reliable forecasts, fast and efficient computational input parameter estimation and calibration mechanisms should be integrated. These have to respect hard real-time constraints of simulations to prevent tragedy. We propose an Evolutionary Intelligent System (EIS) for parameter calibration. Depending on disaster size, required parameter precision, and available computing resources, the hybridisation of an evolutionary algorithm (EA) with an intelligent paradigm (IP) can be configured. Experiments show that EIS generates comparable estimations to standard evolutionary calibration approaches, clearly outperforming the latter in runtime.
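
For orientation, a plain evolutionary calibration loop (the kind of baseline the abstract compares against, not the proposed EIS hybrid) might look like the sketch below. Everything in it is an assumption made for illustration: `toy_spread` is a made-up stand-in for a fire simulator, and the two parameters, their bounds and the population settings are hypothetical.

```python
import random


def toy_spread(params, hours):
    """Hypothetical stand-in for a simulator: fire-front distance per hour."""
    wind, moisture = params
    rate = 10.0 + 4.0 * wind - 8.0 * moisture  # made-up spread rate
    return [rate * h for h in hours]


def error(params, hours, observed):
    """Sum of squared differences between simulated and observed fronts."""
    predicted = toy_spread(params, hours)
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def calibrate(hours, observed, bounds, pop_size=30, generations=40, seed=0):
    """Search for parameters whose simulated output matches the observations."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: error(p, hours, observed))
        elite = pop[: pop_size // 3]  # best third survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = []
            for (lo, hi), gene_a, gene_b in zip(bounds, a, b):
                # Uniform crossover followed by Gaussian mutation, clipped to bounds.
                gene = rng.choice((gene_a, gene_b)) + rng.gauss(0.0, 0.05 * (hi - lo))
                child.append(min(hi, max(lo, gene)))
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda p: error(p, hours, observed))


hours = [1, 2, 3, 4]
observed = toy_spread((5.0, 0.3), hours)  # synthetic "observations"
bounds = [(0.0, 10.0), (0.0, 1.0)]        # wind, moisture
best = calibrate(hours, observed, bounds)
```

In this toy model several parameter pairs produce the same spread rate, so the loop should be judged by the remaining error rather than by whether it recovers the exact input values.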


Fast Optimization of Wildfire Suppression Policies with SMAC

arXiv.org Machine Learning

Managers of US National Forests must decide what policy to apply for dealing with lightning-caused wildfires. Conflicts among stakeholders (e.g., timber companies, homeowners, and wildlife biologists) have often led to spirited political debates and even violent eco-terrorism. One way to transform these conflicts into multi-stakeholder negotiations is to provide a high-fidelity simulation environment in which stakeholders can explore the space of alternative policies and understand the tradeoffs therein. Such an environment needs to support fast optimization of MDP policies so that users can adjust reward functions and analyze the resulting optimal policies. This paper assesses the suitability of SMAC, a black-box empirical function optimization algorithm, for rapid optimization of MDP policies. The paper describes five reward function components and four stakeholder constituencies. It then introduces a parameterized class of policies that can be easily understood by the stakeholders. SMAC is applied to find the optimal policy in this class for the reward functions of each of the stakeholder constituencies. The results confirm that SMAC is able to rapidly find good policies that make sense from the domain perspective. Because the full-fidelity forest fire simulator is far too expensive to support interactive optimization, SMAC is applied to a surrogate model constructed from a modest number of runs of the full-fidelity simulator. To check the quality of the SMAC-optimized policies, the policies are evaluated on the full-fidelity simulator. The results confirm that the surrogate value estimates are valid. This is the first successful optimization of wildfire management policies using a full-fidelity simulation. The same methodology should be applicable to other contentious natural resource management problems where high-fidelity simulation is extremely expensive.


Modeling Dengue Vector Population Using Remotely Sensed Data and Machine Learning

arXiv.org Machine Learning

Mosquitoes are vectors of many human diseases. In particular, Aedes aegypti (Linnaeus) is the main vector for Chikungunya, Dengue, and Zika viruses in Latin America and it represents a global threat. Public health policies that aim at combating this vector require dependable and timely information, which is usually expensive to obtain with field campaigns. For this reason, several efforts have been made to use remote sensing due to its reduced cost. The present work includes the temporal modeling of the oviposition activity (measured weekly on 50 ovitraps in a north Argentinean city) of Aedes aegypti (Linnaeus), based on time series of data extracted from operational earth observation satellite images. We use NDVI, NDWI, LST night, LST day and TRMM-GPM rain from 2012 to 2016 as predictive variables. In contrast to previous works which use linear models, we employ Machine Learning techniques using completely accessible open source toolkits. These models have the advantages of being non-parametric and capable of describing nonlinear relationships between variables. Specifically, in addition to two linear approaches, we assess a Support Vector Machine, an Artificial Neural Network, a K-nearest neighbors regressor and a Decision Tree Regressor. Considerations are made on parameter tuning and the validation and training approach. The results are compared to linear models used in previous works with similar data sets for generating temporal predictive models. These new tools perform better than linear approaches; in particular, Nearest Neighbor Regression (KNNR) performs best. These results provide better alternatives to be implemented operationally on the Argentine geospatial risk system that has been running since 2012.
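
Nearest-neighbor regression, the best performer reported above, predicts a value as the average target of the most similar training rows. The following is a minimal standard-library sketch of that idea, not the toolkit or configuration used in the paper. The function name `knn_predict`, the choice of `k`, the Euclidean distance and the tiny weekly data set are all assumptions made for illustration.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict the mean target of the k training rows closest to query."""
    by_distance = sorted(
        (math.dist(row, query), y) for row, y in zip(train_X, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up weekly rows: (vegetation index, night temperature in deg C, rain in mm),
# with a made-up egg count per week as the target.
weekly_X = [
    (0.30, 12.0, 0.0),
    (0.35, 15.0, 5.0),
    (0.50, 19.0, 20.0),
    (0.55, 21.0, 35.0),
    (0.60, 22.0, 40.0),
]
weekly_eggs = [3.0, 8.0, 40.0, 65.0, 70.0]

# Fit on the earlier weeks, predict the last one from its covariates.
estimate = knn_predict(weekly_X[:4], weekly_eggs[:4], weekly_X[4], k=2)
```

Because the covariates here have very different ranges, distances are dominated by the rain column; rescaling each feature before computing distances is a common remedy.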