 ipss


Efficient Data Valuation Approximation in Federated Learning: A Sampling-based Approach

Wei, Shuyue, Tong, Yongxin, Zhou, Zimu, He, Tianran, Xu, Yi

arXiv.org Artificial Intelligence

Federated learning (FL) is a paradigm for utilizing datasets across multiple data providers. In FL, cross-silo data providers often hesitate to share their high-quality datasets unless their data value can be fairly assessed. The Shapley value (SV) has been advocated as the standard metric for data valuation in FL due to its desirable properties. However, the computational overhead of SV is prohibitive in practice, as it inherently requires training and evaluating an FL model across an exponential number of dataset combinations. Furthermore, existing solutions fail to achieve both high accuracy and high efficiency, leaving practical use of SV out of reach, because they neglect to choose a suitable computation scheme for the approximation framework and overlook the properties of the utility function in FL. We first propose a unified stratified-sampling framework for two widely used schemes. Then, we analyze and choose the more promising scheme under the FL linear regression assumption. After that, we identify a phenomenon termed key combinations, where only a limited number of dataset combinations have a high impact on the final data value. Building on these insights, we propose a practical approximation algorithm, IPSS, which strategically selects high-impact dataset combinations rather than evaluating all possible combinations, thus substantially reducing time cost with minor approximation error. Furthermore, we conduct extensive evaluations on FL benchmark datasets to demonstrate that our proposed algorithm outperforms a series of representative baselines in terms of efficiency and effectiveness.
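To illustrate why exact Shapley values are expensive, the following is a minimal sketch of the textbook exact computation, not the IPSS algorithm from the paper. The provider names, dataset sizes, and `toy_utility` function are hypothetical stand-ins; in FL, each utility call would mean training and evaluating a model on that combination of datasets.

```python
from itertools import combinations
from math import factorial


def exact_shapley(providers, utility):
    """Exact Shapley value of each provider under a coalition utility.

    The loops visit every subset of the other providers, so the number of
    utility calls grows exponentially with the number of providers.
    """
    n = len(providers)
    values = {p: 0.0 for p in providers}
    for p in providers:
        others = [q for q in providers if q != p]
        for k in range(n):
            # Weight of coalitions of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when joining coalition s.
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


def toy_utility(coalition):
    # Hypothetical utility: a saturating function of total data size,
    # standing in for "train an FL model and measure test accuracy".
    sizes = {"A": 100, "B": 300, "C": 600}
    total = sum(sizes[p] for p in coalition)
    return total / (total + 500)


sv = exact_shapley(["A", "B", "C"], toy_utility)
print(sv)
```

With three providers this is cheap, but each additional provider doubles the number of coalitions evaluated per provider, which is the cost that sampling-based approximations aim to avoid.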


Fast nonparametric feature selection with error control using integrated path stability selection

Melikechi, Omar, Dunson, David B., Miller, Jeffrey W.

arXiv.org Machine Learning

Feature selection can greatly improve performance and interpretability in machine learning problems. However, existing nonparametric feature selection methods either lack theoretical error control or fail to accurately control errors in practice. Many methods are also slow, especially in high dimensions. In this paper, we introduce a general feature selection method that applies integrated path stability selection to thresholding to control false positives and the false discovery rate. The method also estimates q-values, which are better suited to high-dimensional data than p-values. We focus on two special cases of the general method based on gradient boosting (IPSSGB) and random forests (IPSSRF). Extensive simulations with RNA sequencing data show that IPSSGB and IPSSRF have better error control, detect more true positives, and are faster than existing methods. We also use both methods to detect microRNAs and genes related to ovarian cancer, finding that they make better predictions with fewer features than other methods.


Integrated path stability selection

Melikechi, Omar, Miller, Jeffrey W.

arXiv.org Machine Learning

Stability selection is a widely used method for improving the performance of feature selection algorithms. However, stability selection has been found to be highly conservative, resulting in low sensitivity. Further, the theoretical bound on the expected number of false positives, E(FP), is relatively loose, making it difficult to know how many false positives to expect in practice. In this paper, we introduce a novel method for stability selection based on integrating the stability paths rather than maximizing over them. This yields a tighter bound on E(FP), resulting in a feature selection criterion that has higher sensitivity in practice and is better calibrated in terms of matching the target E(FP). Our proposed method requires the same amount of computation as the original stability selection algorithm, and only requires the user to specify one input parameter, a target value for E(FP). We provide theoretical bounds on performance, and demonstrate the method on simulations and real data from cancer gene expression studies.
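The contrast between maximizing over stability paths and integrating them can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' criterion: the base selector (thresholding absolute marginal correlations), the synthetic data, the threshold grid, and the plain average over the grid used as a stand-in for the integral are all choices made here for brevity.

```python
import numpy as np


def selection_probabilities(X, y, lambdas, n_subsamples=50, seed=0):
    """Estimate how often each feature is selected at each threshold.

    Returns an array of shape (len(lambdas), n_features): one stability
    path per feature, estimated from random half-samples of the data.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lambdas = np.asarray(lambdas)
    counts = np.zeros((len(lambdas), p))
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=n // 2, replace=False)
        Xc = X[idx] - X[idx].mean(axis=0)
        yc = y[idx] - y[idx].mean()
        # Stand-in base selector: absolute marginal correlation with y.
        corr = np.abs(Xc.T @ yc) / (
            np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
        )
        # A feature is "selected" at threshold lam if corr exceeds lam.
        counts += corr[None, :] > lambdas[:, None]
    return counts / n_subsamples


# Synthetic data: only features 0 and 1 influence the response.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=200)

lambdas = np.linspace(0.1, 0.6, 20)
probs = selection_probabilities(X, y, lambdas)

# Classic stability selection scores a feature by the maximum over its path.
max_score = probs.max(axis=0)
# Integrated variant (simplified): average the path over the threshold grid.
integrated_score = probs.mean(axis=0)

print(np.round(max_score, 2))
print(np.round(integrated_score, 2))
```

The maximum reacts to a single threshold at which a feature happens to be selected often, whereas the averaged score rewards features that are selected consistently along the whole path.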


Knowledge Transfer between Automated Planners

AI Magazine

More specifically, we demonstrate how to transfer the domain-dependent heuristics acquired by one planner into a second planner. Our motivation is to improve the efficiency and the efficacy of the second planner by allowing it to use the transferred heuristics to capture domain regularities that it would not otherwise recognize. Our experimental results show that the transferred knowledge does improve the second planner's performance on novel tasks over a set of seven benchmark planning domains. Recently, the artificial intelligence community has attempted to model this transfer in an effort to improve learning on new tasks by using knowledge from related tasks. For example, classification and inference algorithms have been extended to support transfer of conceptual knowledge (for a survey see Torrey and Shavlik [2009]).


Knowledge Transfer between Automated Planners

Fernandez, Susana (Universidad Carlos III de Madrid) | Aler, Ricardo (Universidad Carlos III de Madrid) | Borrajo, Daniel (Universidad Carlos III de Madrid)

AI Magazine

In this article, we discuss the problem of transferring search heuristics from one planner to another. More specifically, we demonstrate how to transfer the domain-dependent heuristics acquired by one planner into a second planner. Our motivation is to improve the efficiency and the efficacy of the second planner by allowing it to use the transferred heuristics to capture domain regularities that it would not otherwise recognize. Our experimental results show that the transferred knowledge does improve the second planner's performance on novel tasks over a set of seven benchmark planning domains.