On the Gini-impurity Preservation For Privacy Random Forests
Neural Information Processing Systems
Random forests are among the most successful ensemble algorithms in machine learning. Various techniques have been used to preserve the privacy of random forests, such as anonymization, differential privacy, and homomorphic encryption. This work takes a step towards data encryption by incorporating some crucial ingredients of the learning algorithm. Specifically, we develop a new encryption scheme that preserves the Gini impurity of the data, which plays an important role in the construction of random forests. The basic idea is to modify the structure of the binary search tree so that each node stores several examples, and to encrypt the data features by incorporating label and order information. Theoretically, our scheme is proven to preserve the minimum Gini impurity in ciphertexts without decryption, and we also present a security guarantee for the encryption. For random forests, we encrypt data features with our Gini-impurity-preserving scheme, and adopt the homomorphic encryption scheme CKKS to encrypt data labels, because labels are both important and privacy-sensitive. Finally, we present extensive empirical studies to validate the effectiveness, efficiency, and security of the proposed method.
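The central claim above is that the minimum Gini impurity over candidate splits can be computed directly on encrypted features. The following plain-Python sketch is not the authors' scheme; it only illustrates why order information matters. For one numeric feature, the weighted Gini impurity of a threshold split (with $G(S) = 1 - \sum_k p_k^2$) depends only on how the feature values are ordered and on the labels, so any strictly increasing encoding leaves the minimum unchanged. The names `gini`, `min_split_gini`, and `toy_monotone_encode` are hypothetical. The affine "encoding" provides no security at all, and the paper's label-aware binary-search-tree construction is not reproduced here.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def min_split_gini(xs: Sequence[float], ys: Sequence[int]) -> float:
    """Minimum weighted Gini impurity over all threshold splits on one feature.

    Candidate thresholds lie between consecutive distinct sorted values, so
    only the ordering of xs (together with the labels) affects the result.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = gini(ys)  # start from the unsplit node's impurity
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal feature values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = min(best, weighted)
    return best


def toy_monotone_encode(x: float, a: float = 3.0, b: float = 7.0) -> float:
    # HYPOTHETICAL stand-in for an order-preserving encoding: any strictly
    # increasing map keeps the order. This is NOT secure and NOT the paper's scheme.
    return a * x + b


xs = [2.0, 5.0, 1.0, 8.0, 3.0, 9.0]
ys = [0, 1, 0, 1, 0, 1]
plain = min_split_gini(xs, ys)
encoded = min_split_gini([toy_monotone_encode(x) for x in xs], ys)
print(plain, encoded)  # prints "0.0 0.0"
```

Because only the induced partitions matter, `plain` and `encoded` agree for any strictly increasing encoding. A pure order-preserving map, however, leaks the full ordering of the data. That leakage is why a practical scheme needs additional structure, such as the label-aware tree described above.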
Feb-11-2025, 03:55:32 GMT