
Collaborating Authors: Cantabria


Metric Privacy in Federated Learning for Medical Imaging: Improving Convergence and Preventing Client Inference Attacks

arXiv.org Artificial Intelligence

Federated learning is a distributed learning technique that allows training a global model with the participation of different data owners without the need to share raw data. This architecture is orchestrated by a central server that aggregates the local models from the clients. While this server may be trusted, not all nodes in the network are, so differential privacy (DP) can be used to privatize the global model by adding noise. However, depending on the aggregation strategy employed, this may hinder convergence across the rounds of the federated architecture. In this work, we introduce the notion of metric privacy to mitigate the impact of classical server-side global DP on the convergence of the aggregated model. Metric privacy is a relaxation of DP suitable for domains equipped with a notion of distance. We apply it from the server side by computing a distance over the differences between the local models. We compare our approach with standard DP by analyzing the impact on six classical aggregation strategies. The proposed methodology is applied to a medical imaging example, and different scenarios are simulated across homogeneous and non-i.i.d. clients. Finally, we introduce a novel client inference attack, in which a semi-honest client tries to determine whether another client participated in the training, and we study how it can be mitigated using DP and metric privacy. Our evaluation shows that metric privacy can increase the performance of the model compared to standard DP, while offering similar protection against client inference attacks.
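
The following is a minimal, illustrative sketch (not the authors' exact mechanism) of the contrast between classical server-side global DP, where the noise scale depends on a fixed worst-case sensitivity, and a metric-privacy-style calibration, where the noise scale is tied to the distance between the local models. All parameter choices (epsilon, delta, sensitivity) are assumptions made only for this example.

```python
"""Illustrative sketch: server-side noise addition to an aggregated model under
(a) standard global DP and (b) a metric-privacy-style calibration based on the
distance between local models. Parameter choices are assumptions."""
import numpy as np

rng = np.random.default_rng(0)

def aggregate(local_models):
    """Plain FedAvg-style mean of the flattened client weight vectors."""
    return np.mean(local_models, axis=0)

def gaussian_dp_noise(weights, sensitivity, epsilon, delta):
    """Classical Gaussian mechanism: noise scale depends on a fixed
    worst-case sensitivity, independent of the actual client updates."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return weights + rng.normal(0.0, sigma, size=weights.shape)

def metric_privacy_noise(weights, local_models, epsilon):
    """Metric-privacy-style mechanism (illustrative): noise scale is
    calibrated to the observed pairwise distances between the local
    models instead of a fixed global sensitivity."""
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(local_models)
             for b in local_models[i + 1:]]
    scale = max(dists) / epsilon  # the distance replaces the adjacency notion
    return weights + rng.laplace(0.0, scale, size=weights.shape)

# Toy example: three clients, ten-dimensional flattened weight vectors.
clients = [rng.normal(0, 1, 10) for _ in range(3)]
w = aggregate(clients)
w_dp = gaussian_dp_noise(w, sensitivity=1.0, epsilon=1.0, delta=1e-5)
w_mp = metric_privacy_noise(w, clients, epsilon=1.0)
```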


Enhancing the Convergence of Federated Learning Aggregation Strategies with Limited Data

arXiv.org Artificial Intelligence

Deep learning is increasingly applied to medical data, particularly image-based diagnosis. This type of data is subject to privacy and legal restrictions that in many cases prevent it from being processed on central servers. At the same time, collaboration between research centers, in order to train models that are as robust as possible on the largest quantity and diversity of data available, is critical. In this context, privacy-aware distributed architectures such as federated learning arise. In this type of architecture, the server aggregates the local models trained on the data of each data owner to build a global model. This aggregation step is critical, and it is therefore fundamental to analyze different aggregation strategies according to the use case, taking into account the distribution of the clients, the characteristics of the model, etc. In this paper we propose a novel aggregation strategy and apply it to a use case of cerebral magnetic resonance image classification. In this use case, the proposed aggregation function improves the convergence obtained over the rounds of the federated learning process compared to classically implemented aggregation strategies.
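
As a point of reference, the sketch below shows the classical FedAvg baseline that aggregation strategies are typically compared against: a sample-size-weighted average of the client models. The paper's proposed aggregation function itself is not reproduced here.

```python
"""Minimal sketch of the classical FedAvg baseline: a sample-size-weighted
average of client model weights. The paper's novel strategy is not shown."""
import numpy as np

def fedavg(client_weights, client_sizes):
    """Weighted average of client models, with weights proportional to the
    number of local training samples each client holds."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Toy example: three clients with different local dataset sizes.
rng = np.random.default_rng(1)
clients = [rng.normal(0, 1, 5) for _ in range(3)]
global_model = fedavg(clients, client_sizes=[100, 250, 50])
```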


Are Deep Learning Methods Suitable for Downscaling Global Climate Projections? Review and Intercomparison of Existing Models

arXiv.org Machine Learning

Deep Learning (DL) has shown promise for downscaling global climate change projections under different approaches, including Perfect Prognosis (PP) and Regional Climate Model (RCM) emulation. Unlike emulators, PP downscaling models are trained on observational data, so it remains an open question whether they can plausibly extrapolate unseen conditions and changes in future emissions scenarios. Here we focus on this problem as the main drawback for the operationalization of these methods and present the results of 1) a literature review to identify state-of-the-art DL models for PP downscaling and 2) an intercomparison experiment to evaluate the performance of these models and to assess their extrapolation capability using a common experimental framework, taking into account the sensitivity of results to different training replicas. We focus on minimum and maximum temperatures and precipitation over Spain, a region with a wide range of climatic conditions shaped by different influential regional processes. We conclude with a discussion of the findings, limitations of existing methods, and prospects for future development.


Transformer based super-resolution downscaling for regional reanalysis: Full domain vs tiling approaches

arXiv.org Artificial Intelligence

Reanalysis datasets constitute the main source of spatially homogeneous information for climate analysis since they provide long records (spanning several decades) of physically consistent hourly/daily gridded data for many variables, produced globally with a particular atmospheric general circulation model (AGCM) assimilating the available observations (see https://reanalyses.org for an overview of the current reanalyses). Besides the historical records, in some cases reanalyses provide near real-time information that allows monitoring the state of the climate. For instance, ERA5 [Hersbach et al., 2020] is the latest ECMWF climate reanalysis, providing hourly data on many atmospheric and land-surface parameters at 0.25° resolution, from 1940 to near real-time. However, much of this data is generated at coarse spatial resolutions, typically on the order of tens of kilometres, hampering its application for local and regional climate analysis, including extreme weather events, which often occur on smaller spatial scales. Enhancing the spatial resolution of reanalysis datasets is therefore critical for improving their utility for local-scale climate analysis and decision-making. A number of downscaling methods have been developed over the last decades for improving the spatial resolution of AGCM outputs, based on two main approaches [Maraun and Widmann, 2017]: dynamical and statistical downscaling. Dynamical downscaling employs regional atmospheric models (Limited Area Models, LAMs) over limited areas of interest, driven at the boundaries by the AGCM outputs, to refine their coarse resolution. This approach allows resolving regional/local processes and provides physically consistent results, but is limited by its high computational demands. It has recently been applied to generate regional reanalyses over continent-wide areas, such as the CERRA reanalysis over Europe using the HARMONIE-ALADIN regional model (driven by ERA5) at a 5.5 km resolution.


A Likelihood-Based Generative Approach for Spatially Consistent Precipitation Downscaling

arXiv.org Machine Learning

Deep learning has emerged as a promising tool for precipitation downscaling. However, current models rely on likelihood-based loss functions to properly model the precipitation distribution, leading to spatially inconsistent projections when sampling. This work explores a novel approach by fusing the strengths of likelihood-based and adversarial losses used in generative models. As a result, we propose a likelihood-based generative approach for precipitation downscaling, leveraging the benefits of both methods.
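
The sketch below illustrates, in simplified form, the kind of loss fusion described above: a per-grid-point likelihood term combined with an adversarial term. The Bernoulli-Gamma likelihood is one common choice in precipitation downscaling; the exact losses and weighting used by the authors may differ, and the adversarial term here is a placeholder that would normally come from a discriminator.

```python
"""Illustrative sketch of fusing a likelihood-based precipitation loss with an
adversarial term. The Bernoulli-Gamma likelihood is a common (assumed) choice;
the adversarial term is a placeholder scalar."""
import numpy as np
from scipy.special import gammaln

def bernoulli_gamma_nll(y, p_rain, shape, scale, eps=1e-6):
    """Negative log-likelihood of precipitation y under a Bernoulli-Gamma
    model: p_rain is the rain probability, (shape, scale) the Gamma params."""
    wet = y > 0
    nll_dry = -np.log(1.0 - p_rain + eps)
    nll_wet = -(np.log(p_rain + eps)
                + (shape - 1.0) * np.log(y + eps)
                - y / scale
                - shape * np.log(scale)
                - gammaln(shape))
    return np.mean(np.where(wet, nll_wet, nll_dry))

def combined_loss(nll, adversarial_term, lambda_adv=0.1):
    """Fuse the likelihood (per-point) loss with an adversarial (spatial) loss."""
    return nll + lambda_adv * adversarial_term

# Toy example with made-up predictions on five grid points.
y = np.array([0.0, 1.2, 0.0, 5.4, 0.3])
nll = bernoulli_gamma_nll(y, p_rain=np.full(5, 0.5),
                          shape=np.full(5, 1.5), scale=np.full(5, 2.0))
loss = combined_loss(nll, adversarial_term=0.7)
```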


Architecture of a Cortex Inspired Hierarchical Event Recaller

arXiv.org Artificial Intelligence

This paper proposes a new approach to Machine Learning (ML) that focuses on unsupervised, continuous, context-dependent learning of complex patterns. Although the proposal is partly inspired by some of the current knowledge about the structural and functional properties of the mammalian brain, we do not claim that biological systems work in an analogous way (nor the opposite). Based on some properties of the cerebellar cortex and adjacent structures, a proposal suitable for practical problems is presented. A synthetic structure capable of identifying and predicting complex time series is defined and experimentally tested. The system relies heavily on prediction to help identify and learn patterns based on previously acquired contextual knowledge. As a proof of concept, the proposed system is shown to be able to learn, identify and predict a remarkably complex time series such as human speech, with no prior knowledge. From raw data, without any adaptation of the core algorithm, the system is able to identify certain speech structures from a set of Spanish sentences. Unlike conventional ML, the proposal can learn with a reduced training set. Although the idea can be applied to a constrained problem, such as the detection of unknown vocabulary in speech, it could be used in other applications, such as vision, or (by incorporating the missing biological periphery) fit into other ML techniques. Given the trivial computational primitives used, a potential hardware implementation would be remarkably frugal. Coincidentally, the proposed model not only conforms to a plausible functional framework for biological systems but may also explain many elusive cognitive phenomena.


Comparison of machine learning models applied on anonymized data with different techniques

arXiv.org Artificial Intelligence

Anonymization techniques based on obfuscating the quasi-identifiers by means of value generalization hierarchies are widely used to achieve preset levels of privacy. To prevent different types of attacks against database privacy, it is necessary to apply several anonymization techniques beyond the classical k-anonymity or $\ell$-diversity. However, the application of these methods is directly connected to a reduction of the data's utility in prediction and decision-making tasks. In this work we study four classical machine learning methods currently used for classification purposes in order to analyze the results as a function of the anonymization techniques applied and the parameters selected for each of them. The performance of these models is studied when varying the value of k for k-anonymity, and additional tools such as $\ell$-diversity, t-closeness and $\delta$-disclosure privacy are also deployed on the well-known Adult dataset.
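
For illustration, the sketch below checks the k-anonymity property on a generalized table: every combination of quasi-identifier values must appear at least k times. The column names, generalization levels and value of k are assumptions for this toy example; the paper's actual anonymization pipeline may differ.

```python
"""Minimal sketch of a k-anonymity check over generalized quasi-identifiers.
Column names and the k value are assumptions for this toy example."""
import pandas as pd

def is_k_anonymous(df, quasi_identifiers, k):
    """A table is k-anonymous w.r.t. the quasi-identifiers if every
    combination of their (generalized) values appears at least k times."""
    group_sizes = df.groupby(quasi_identifiers).size()
    return bool((group_sizes >= k).all())

# Toy example with generalized age ranges and coarse zip codes.
df = pd.DataFrame({
    "age":    ["30-40", "30-40", "30-40", "40-50", "40-50"],
    "zip":    ["13***", "13***", "13***", "48***", "48***"],
    "income": [">50K", "<=50K", "<=50K", ">50K", "<=50K"],  # sensitive attribute
})
print(is_k_anonymous(df, quasi_identifiers=["age", "zip"], k=2))  # True
```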


Deep Ensembles to Improve Uncertainty Quantification of Statistical Downscaling Models under Climate Change Conditions

arXiv.org Artificial Intelligence

Recently, deep learning has emerged as a promising tool for statistical downscaling, the set of methods for generating high-resolution climate fields from coarse low-resolution variables. Nevertheless, the ability of these models to generalize to climate change conditions remains questionable, mainly due to the stationarity assumption. We propose deep ensembles as a simple method to improve the uncertainty quantification of statistical downscaling models. By better capturing uncertainty, statistical downscaling models allow for superior planning against extreme weather events, a source of various negative social and economic impacts. Since no observational future data exists, we rely on a pseudo-reality experiment to assess the suitability of deep ensembles for quantifying the uncertainty of climate change projections. Deep ensembles allow for a better risk assessment, highly demanded by sectoral applications to tackle climate change.
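
The sketch below shows the basic deep-ensemble recipe: train the same network several times with different random seeds and use the spread of the predictions as an uncertainty estimate. A small scikit-learn MLP and synthetic data stand in here for the downscaling networks and climate fields used in the paper.

```python
"""Minimal deep-ensemble sketch for uncertainty quantification, using a small
sklearn MLP and synthetic data as stand-ins for the downscaling setup."""
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))                       # stand-in coarse predictors
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(0, 0.1, 200)

members = []
for seed in range(5):                               # five ensemble members
    model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=500,
                         random_state=seed)
    members.append(model.fit(X, y))

preds = np.stack([m.predict(X) for m in members])   # (members, samples)
mean_pred = preds.mean(axis=0)                      # ensemble prediction
uncertainty = preds.std(axis=0)                     # per-sample spread
```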


On the use of Deep Generative Models for Perfect Prognosis Climate Downscaling

arXiv.org Artificial Intelligence

Deep Learning has recently emerged as a perfect prognosis downscaling technique to compute high-resolution fields from large-scale coarse atmospheric data. Despite their promising results in reproducing the observed local variability, these models are based on the estimation of independent distributions at each location, which leads to deficient spatial structures, especially when downscaling precipitation. This study proposes the use of generative models to improve the spatial consistency of the high-resolution fields, highly demanded by some sectoral applications (e.g., hydrology) to tackle climate change.


Using Explainability to Inform Statistical Downscaling Based on Deep Learning Beyond Standard Validation Approaches

arXiv.org Artificial Intelligence

Due to limitations in the available computational resources, General Circulation Models (GCMs) simulate the climate system over coarse-resolution grids. This hampers the applicability of GCM products at the regional-to-local scale, which is highly demanded by different socio-economic sectors. Statistical downscaling aims to solve this problem by generating high-resolution climate fields. Recently, machine learning techniques (particularly deep learning models) have shown promising results in this task. These models are first trained on observational datasets over a historical period and then applied to the GCM outputs of plausible far-future scenarios, thus generating high-resolution climate change products. To assess the plausibility of the derived downscaled fields, several validation frameworks (e.g., measuring the skill to reproduce the present climate) are applied, which aim to assess the generalization of the models. Here, we present a novel evaluation protocol building on eXplainable Artificial Intelligence (XAI) to examine the suitability of certain deep learning models for climate downscaling.
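
As a simple illustration of the kind of attribution analysis XAI provides, the sketch below computes a perturbation-based sensitivity map: each input predictor is perturbed and the change in the model output is recorded. The paper's actual evaluation protocol is likely based on gradient attribution methods for deep networks; this occlusion-style variant and the toy model are assumptions chosen only to keep the example framework-free.

```python
"""Illustrative perturbation-based sensitivity analysis (occlusion-style XAI).
The toy linear model stands in for a deep downscaling network."""
import numpy as np

def sensitivity_map(model_fn, x, delta=0.1):
    """Change in model output when each input feature is perturbed by delta."""
    baseline = model_fn(x)
    scores = np.zeros_like(x)
    for i in range(x.size):
        perturbed = x.copy()
        perturbed.flat[i] += delta
        scores.flat[i] = np.abs(model_fn(perturbed) - baseline) / delta
    return scores

# Toy "downscaling" model: scalar output from a weighted sum of predictors.
weights = np.array([0.5, -1.0, 0.0, 2.0])
model_fn = lambda x: float(x @ weights)
x0 = np.array([1.0, 0.5, -0.2, 0.3])
print(sensitivity_map(model_fn, x0))  # larger scores = more influential inputs
```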