Vertical Federated Learning: Concepts, Advances and Challenges

Yang Liu, Yan Kang, Tianyuan Zou, Yanhong Pu, Yuanqin He, Xiaozhou Ye, Ye Ouyang, Ya-Qin Zhang, Qiang Yang

arXiv.org Artificial Intelligence 

Federated Learning (FL) [1] is a machine learning paradigm in which multiple parties collaboratively build machine learning models without centralizing their data. The concept of FL was first proposed by Google in 2016 [2] to describe a cross-device scenario in which millions of mobile devices are coordinated by a central server while local data never leave the devices. The concept was soon extended to a cross-silo collaboration scenario among organizations [3], where a small number of reliable organizations join a federation to train a machine learning model. In [3], FL was for the first time divided into three categories based on how data is partitioned in the sample and feature space: Horizontal Federated Learning (HFL), Vertical Federated Learning (VFL), and Federated Transfer Learning (FTL) (see Figure 1). HFL refers to the FL setting where participants share the same feature space while holding different samples. For example, Google uses HFL to allow mobile phone users to collaboratively train a next-word prediction model on their local data [2]. VFL refers to the FL setting where participants share the same samples/users while holding different features. For example, WeBank uses VFL to collaborate with an invoice agency to build financial risk models for their enterprise customers [4].
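To make the HFL/VFL distinction concrete, the following minimal sketch (not from the paper; all names and data are hypothetical) shows the two ways a centralized dataset can be partitioned across two parties: horizontally (same features, disjoint users) and vertically (same users, disjoint features):

```python
# Hypothetical centralized dataset: user id -> features and a label.
full_data = {
    "u1": {"age": 30, "income": 50, "invoices": 12, "label": 1},
    "u2": {"age": 45, "income": 80, "invoices": 3,  "label": 0},
    "u3": {"age": 22, "income": 20, "invoices": 7,  "label": 1},
    "u4": {"age": 51, "income": 95, "invoices": 1,  "label": 0},
}

def horizontal_split(data, ids_a, ids_b):
    """HFL: each party holds the SAME feature space for DIFFERENT users."""
    party_a = {u: data[u] for u in ids_a}
    party_b = {u: data[u] for u in ids_b}
    return party_a, party_b

def vertical_split(data, feats_a, feats_b):
    """VFL: each party holds DIFFERENT features for the SAME users."""
    party_a = {u: {f: row[f] for f in feats_a} for u, row in data.items()}
    party_b = {u: {f: row[f] for f in feats_b} for u, row in data.items()}
    return party_a, party_b

# HFL-style partition: two devices, identical feature space, disjoint users.
hfl_a, hfl_b = horizontal_split(full_data, ["u1", "u2"], ["u3", "u4"])

# VFL-style partition: a bank holds demographics and the label,
# an invoice agency holds invoice counts, for the same aligned users.
vfl_bank, vfl_agency = vertical_split(
    full_data, ["age", "income", "label"], ["invoices"]
)
```

In the HFL split the two parties' user sets are disjoint but their feature sets coincide; in the VFL split the user sets coincide but the feature sets are disjoint, which is exactly the partitioning Figure 1 depicts.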