Collaborating Authors: Azzimonti, Laura


Flotta: a Secure and Flexible Spark-inspired Federated Learning Framework

arXiv.org Artificial Intelligence

We present Flotta, a Federated Learning framework designed to train machine learning models on sensitive data distributed across a multi-party consortium conducting research in contexts requiring high levels of security, such as the biomedical field. Flotta is a Python package, inspired in several aspects by Apache Spark, that provides both flexibility and security and allows research to be conducted using only machines internal to the consortium. In this paper, we describe the main components of the framework together with a practical use case to illustrate the framework's capabilities and highlight its security, flexibility and user-friendliness.


Global Outlier Detection in a Federated Learning Setting with Isolation Forest

arXiv.org Artificial Intelligence

Federated learning (FL) is a machine learning paradigm where multiple parties collaborate to train a shared machine learning model without centralizing data at a single location [1]. During model training, data holders refrain from directly exchanging raw data; instead, they share model parameters such as gradients, weights, or other forms of processed information. This distributed learning paradigm is typically facilitated by a coordinating server, often referred to as the aggregator, which collects local contributions from data holders, commonly known as clients, and aggregates them to create a global model. Across several domains, it is common to find examples of data points that are local outliers but not global outliers. For example, in the medical field, a given medical condition may be common in one region and rare in another [8]. Therefore, in a study conducted at a center located in a low-prevalence region, individuals suffering from that condition may appear as local outliers. However, if the center participates in an FL multicenter study including centers in areas where the condition is more common, those individuals would not appear as global outliers. In most cases, for the training of FL models, a consortium would be interested in discarding global outliers and retaining local ones.
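To make the contrast between local and global outliers concrete, the sketch below fits scikit-learn's IsolationForest once on a single client's data and once on the pooled data of two clients. This is not the paper's method: the synthetic clusters, client names and sample sizes are illustrative assumptions, and the raw data are pooled here purely for illustration, which is exactly what an actual federated deployment would avoid.

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Client A: a "low-prevalence region" -- hardly any samples with the condition.
client_a = rng.normal(loc=0.0, scale=1.0, size=(500, 2))
# Client B: a "high-prevalence region" -- a dense cluster of such samples.
client_b = rng.normal(loc=5.0, scale=1.0, size=(500, 2))
# A few individuals at client A who resemble client B's population.
rare_at_a = rng.normal(loc=5.0, scale=1.0, size=(5, 2))

# Local view: an Isolation Forest trained only on client A's data.
local_forest = IsolationForest(random_state=0).fit(client_a)
# Global view: a forest trained on the pooled data, as a centralized study would see it.
global_forest = IsolationForest(random_state=0).fit(np.vstack([client_a, client_b]))

print("local labels: ", local_forest.predict(rare_at_a))   # mostly -1: local outliers
print("global labels:", global_forest.predict(rare_at_a))  # mostly +1: not global outliers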


An exact kernel framework for spatio-temporal dynamics

arXiv.org Machine Learning

A kernel-based framework for spatio-temporal data analysis is introduced that applies in situations where the underlying system dynamics are governed by a dynamic equation. The key ingredient is a representer theorem that involves time-dependent kernels. Such kernels occur commonly in the expansion of solutions of partial differential equations. The representer theorem is applied to find, among all solutions of a dynamic equation, the one that minimizes the error with respect to given spatio-temporal samples. This is motivated by the fact that very often a differential equation is given a priori (e.g., by the laws of physics) and a practitioner seeks the best solution that is compatible with her noisy measurements. Our guiding example is the Fokker-Planck equation, which describes the evolution of density in stochastic diffusion processes. A regression and density estimation framework is introduced for spatio-temporal modeling under Fokker-Planck dynamics with initial and boundary conditions.
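As a rough, generic illustration of what a representer theorem of this kind delivers (this is not the paper's statement, and the symbols λ, c_i, k and the samples (x_i, t_i, y_i) are placeholders rather than the paper's notation): the minimizer of a regularized least-squares fit over the function class generated by a kernel is a finite linear combination of kernel sections centered at the spatio-temporal sample points,

\[
  \hat{f} \;=\; \operatorname*{arg\,min}_{f}\; \sum_{i=1}^{n} \bigl(y_i - f(x_i, t_i)\bigr)^2 \;+\; \lambda\,\lVert f\rVert^2
  \qquad\Longrightarrow\qquad
  \hat{f}(x, t) \;=\; \sum_{i=1}^{n} c_i\, k\bigl((x, t), (x_i, t_i)\bigr).
\]

In the setting of the paper, the minimization is additionally restricted to solutions of the given dynamic equation (the Fokker-Planck equation in the guiding example), and the time dependence of the kernels comes from the expansion of those solutions.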


Structure Learning from Related Data Sets with a Hierarchical Bayesian Score

arXiv.org Machine Learning

Score functions for learning the structure of Bayesian networks in the literature assume that data are a homogeneous set of observations, whereas they often comprise different related, but not homogeneous, data sets collected in different ways. In this paper we propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD). The proposed score is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. We derive a closed-form expression for BHD using a variational approximation of the marginal likelihood and we study its performance using simulated data. We find that, when data comprise multiple related data sets, BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance, and that it is as accurate as BDeu when data are homogeneous. Moreover, the estimated networks are sparser and therefore more interpretable than those obtained with BDeu, thanks to a lower number of false positive arcs.
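For reference, a minimal sketch of the Structural Hamming Distance used as the reconstruction-accuracy metric above is given below. It assumes the common convention in which a missing arc, an extra arc and a reversed arc each cost one unit (some implementations count a reversal as two), and the two adjacency matrices in the toy example are made up.

import numpy as np

def shd(a_true, a_est):
    """Structural Hamming Distance between two DAGs given as adjacency matrices
    (a[i, j] == 1 means an arc i -> j)."""
    a_true = np.asarray(a_true, dtype=bool)
    a_est = np.asarray(a_est, dtype=bool)
    # Arcs whose orientation is flipped between the two graphs.
    reversed_arcs = a_true & a_est.T & ~a_est & ~a_true.T
    # Undirected skeletons; edges present in one skeleton but not the other
    # correspond to missing or extra arcs.
    skeleton_true = a_true | a_true.T
    skeleton_est = a_est | a_est.T
    missing_or_extra = np.triu(skeleton_true ^ skeleton_est).sum()
    return int(reversed_arcs.sum() + missing_or_extra)

# Toy example: true graph 0 -> 1 -> 2; the estimate reverses one arc and adds one.
a_true = np.array([[0, 1, 0],
                   [0, 0, 1],
                   [0, 0, 0]])
a_est = np.array([[0, 0, 1],   # extra arc 0 -> 2
                  [1, 0, 1],   # arc 0 -> 1 reversed to 1 -> 0
                  [0, 0, 0]])
print(shd(a_true, a_est))  # 2: one reversal plus one extra arc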