Goto

Collaborating Authors

 Bonnet, Pierre


Applying the maximum entropy principle to neural networks enhances multi-species distribution models

arXiv.org Machine Learning

The rapid expansion of citizen science initiatives has led to a significant growth of biodiversity databases, and particularly presence-only (PO) observations. PO data are invaluable for understanding species distributions and their dynamics, but their use in a Species Distribution Model (SDM) is curtailed by sampling biases and the lack of information on absences. Poisson point processes are widely used for SDMs, with Maxent being one of the most popular methods. Maxent maximises the entropy of a probability distribution across sites as a function of predefined transformations of variables, called features. In contrast, neural networks and deep learning have emerged as a promising technique for automatic feature extraction from complex input variables. Arbitrarily complex transformations of input variables can be learned from the data efficiently through backpropagation and stochastic gradient descent (SGD). In this paper, we propose DeepMaxent, which harnesses neural networks to automatically learn shared features among species, using the maximum entropy principle. To do so, it employs a normalised Poisson loss where for each species, presence probabilities across sites are modelled by a neural network. We evaluate DeepMaxent on a benchmark dataset known for its spatial sampling biases, using PO data for calibration and presence-absence (PA) data for validation across six regions with different biological groups and covariates. Our results indicate that DeepMaxent performs better than Maxent and other leading SDMs across all regions and taxonomic groups. The method performs particularly well in regions of uneven sampling, demonstrating substantial potential to increase SDM performances. In particular, our approach yields more accurate predictions than traditional single-species models, which opens up new possibilities for methodological enhancement.


Fully automatic extraction of morphological traits from the Web: utopia or reality?

arXiv.org Artificial Intelligence

Plant morphological traits, their observable characteristics, are fundamental to understand the role played by each species within their ecosystem. However, compiling trait information for even a moderate number of species is a demanding task that may take experts years to accomplish. At the same time, massive amounts of information about species descriptions is available online in the form of text, although the lack of structure makes this source of data impossible to use at scale. To overcome this, we propose to leverage recent advances in large language models (LLMs) and devise a mechanism for gathering and processing information on plant traits in the form of unstructured textual descriptions, without manual curation. We evaluate our approach by automatically replicating three manually created species-trait matrices. Our method managed to find values for over half of all species-trait pairs, with an F1-score of over 75%. Our results suggest that large-scale creation of structured trait databases from unstructured online text is currently feasible thanks to the information extraction capabilities of LLMs, being limited by the availability of textual descriptions covering all the traits of interest.


Cooperative learning of Pl@ntNet's Artificial Intelligence algorithm: how does it work and how can we improve it?

arXiv.org Artificial Intelligence

Deep learning models for plant species identification rely on large annotated datasets. The PlantNet system enables global data collection by allowing users to upload and annotate plant observations, leading to noisy labels due to diverse user skills. Achieving consensus is crucial for training, but the vast scale of collected data makes traditional label aggregation strategies challenging. Existing methods either retain all observations, resulting in noisy training data or selectively keep those with sufficient votes, discarding valuable information. Additionally, as many species are rarely observed, user expertise can not be evaluated as an inter-user agreement: otherwise, botanical experts would have a lower weight in the AI training step than the average user. Our proposed label aggregation strategy aims to cooperatively train plant identification AI models. This strategy estimates user expertise as a trust score per user based on their ability to identify plant species from crowdsourced data. The trust score is recursively estimated from correctly identified species given the current estimated labels. This interpretable score exploits botanical experts' knowledge and the heterogeneity of users. Subsequently, our strategy removes unreliable observations but retains those with limited trusted annotations, unlike other approaches. We evaluate PlantNet's strategy on a released large subset of the PlantNet database focused on European flora, comprising over 6M observations and 800K users. We demonstrate that estimating users' skills based on the diversity of their expertise enhances labeling performance. Our findings emphasize the synergy of human annotation and data filtering in improving AI performance for a refined dataset. We explore incorporating AI-based votes alongside human input. This can further enhance human-AI interactions to detect unreliable observations.


Modelling Species Distributions with Deep Learning to Predict Plant Extinction Risk and Assess Climate Change Impacts

arXiv.org Artificial Intelligence

The post-2020 global biodiversity framework needs ambitious, research-based targets. Estimating the accelerated extinction risk due to climate change is critical. The International Union for Conservation of Nature (IUCN) measures the extinction risk of species. Automatic methods have been developed to provide information on the IUCN status of under-assessed taxa. However, these compensatory methods are based on current species characteristics, mainly geographical, which precludes their use in future projections. Here, we evaluate a novel method for classifying the IUCN status of species benefiting from the generalisation power of species distribution models based on deep learning. Our method matches state-of-the-art classification performance while relying on flexible SDM-based features that capture species' environmental preferences. Cross-validation yields average accuracies of 0.61 for status classification and 0.78 for binary classification. Climate change will reshape future species distributions. Under the species-environment equilibrium hypothesis, SDM projections approximate plausible future outcomes. Two extremes of species dispersal capacity are considered: unlimited or null. The projected species distributions are translated into features feeding our IUCN classification method. Finally, trends in threatened species are analysed over time and i) by continent and as a function of average ii) latitude or iii) altitude. The proportion of threatened species is increasing globally, with critical rates in Africa, Asia and South America. Furthermore, the proportion of threatened species is predicted to peak around the two Tropics, at the Equator, in the lowlands and at altitudes of 800-1,500 m.


AI-based Mapping of the Conservation Status of Orchid Assemblages at Global Scale

arXiv.org Artificial Intelligence

Although increasing threats on biodiversity are now widely recognised, there are no accurate global maps showing whether and where species assemblages are at risk. We hereby assess and map at kilometre resolution the conservation status of the iconic orchid family, and discuss the insights conveyed at multiple scales. We introduce a new Deep Species Distribution Model trained on 1M occurrences of 14K orchid species to predict their assemblages at global scale and at kilometre resolution. We propose two main indicators of the conservation status of the assemblages: (i) the proportion of threatened species, and (ii) the status of the most threatened species in the assemblage. We show and analyze the variation of these indicators at World scale and in relation to currently protected areas in Sumatra island. Global and interactive maps available online show the indicators of conservation status of orchid assemblages, with sharp spatial variations at all scales. The highest level of threat is found at Madagascar and the neighbouring islands. In Sumatra, we found good correspondence of protected areas with our indicators, but supplementing current IUCN assessments with status predictions results in alarming levels of species threat across the island. Recent advances in deep learning enable reliable mapping of the conservation status of species assemblages on a global scale. As an umbrella taxon, orchid family provides a reference for identifying vulnerable ecosystems worldwide, and prioritising conservation actions both at international and local levels.