Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling

Jan-15-2025, 16:35:31 GMT–Neural Information Processing Systems

As deep learning blooms with growing demand for computation and data resources, outsourcing model training to a powerful cloud server becomes an attractive alternative to training at a low-power and cost-effective end device. Traditional outsourcing requires uploading device data to the cloud server, which can be infeasible in many real-world applications due to the often sensitive nature of the collected data and the limited communication bandwidth. To tackle these challenges, we propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources (e.g., Internet images). We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training, in lieu of client data. ECOS probes open-source data on the cloud server to sense the distribution of client data via a communication- and computation-efficient sampling process, which only communicates a few compressed public features and client scalar responses.

efficient collaborative open-source sampling, open-source data, outsourcing training, (4 more...)

Neural Information Processing Systems

Jan-15-2025, 16:35:31 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology
  - Software (1.00)
  - Cloud Computing (1.00)
  - Artificial Intelligence > Machine Learning (0.43)