Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport
Candelieri, Antonio, Quadrio, Alessandro
This thesis examines self-attention training through the lens of Optimal Transport (OT) and develops an OT-based alternative for tabular classification. The study tracks intermediate projections of the self-attention layer during training and evaluates their evolution using discrete OT metrics, including Wasserstein distance, Monge gap, optimality, and efficiency. Experiments are conducted on classification tasks with two and three classes, as well as on a biomedical dataset. Results indicate that the final self-attention mapping often approximates the OT optimal coupling, yet the training trajectory remains inefficient. Pretraining the MLP section on synthetic data partially improves convergence but is sensitive to its initialization. To address these limitations, an OT-based algorithm is introduced: it generates class-specific dummy Gaussian distributions, computes an OT alignment with the data, and trains an MLP to generalize this mapping. The method achieves accuracy comparable to Transformers while reducing computational cost and scaling more efficiently under standardized inputs, though its performance depends on careful dummy-geometry design. All experiments and implementations are conducted in R.
Differentially Private Learned Indexes
Du, Jianzhang, Mudgal, Tilak, Gadre, Rutvi Rahul, Luo, Yukui, Wang, Chenghong
In this paper, we address the problem of efficiently answering predicate queries on encrypted databases, specifically those secured by Trusted Execution Environments (TEEs), which enable untrusted providers to process encrypted user data without revealing its contents. A common strategy in modern databases to accelerate predicate queries is the use of indexes, which map attribute values (keys) to their corresponding positions in a sorted data array. This allows for fast lookup and retrieval of data subsets that satisfy specific predicates. Unfortunately, indexes cannot be directly applied to encrypted databases due to strong data-dependent leakage. Recent approaches apply differential privacy (DP) to construct noisy indexes that enable faster access to encrypted data while maintaining provable privacy guarantees. However, these methods often suffer from large storage costs, with index sizes typically scaling linearly with the key space. To address this challenge, we propose leveraging learned indexes, a trending technique that repurposes machine learning models as indexing structures, to build more compact DP indexes.
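The core idea sketched in the abstract, i.e. a learned index that replaces a lookup table with a model mapping keys to positions, plus noise for differential privacy, can be illustrated with a toy NumPy example. This is a minimal sketch under assumed simplifications (a linear model, Laplace noise with scale 1/epsilon), not the paper's actual construction:

```python
import numpy as np

rng = np.random.default_rng(0)

# A sorted key column standing in for an indexed attribute.
keys = np.sort(rng.integers(0, 10_000, size=1_000))
positions = np.arange(len(keys))

# A learned index in its simplest form: a linear model mapping key -> position,
# replacing a per-key lookup table (compact: two parameters instead of O(keyspace)).
slope, intercept = np.polyfit(keys, positions, deg=1)

def predict_position(key, epsilon=1.0):
    """Predict a position, perturbed with Laplace noise (scale 1/epsilon)."""
    est = slope * key + intercept
    noisy = est + rng.laplace(scale=1.0 / epsilon)
    return int(np.clip(round(noisy), 0, len(keys) - 1))

pos = predict_position(keys[500])
```

A real DP learned index would bound the model's prediction error and calibrate the noise to a formal sensitivity analysis; the sketch only shows why the representation is compact.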
Federated Learning under Attack: Improving Gradient Inversion for Batch of Images
Leite, Luiz, Santo, Yuri, Dalmazo, Bruno L., Riker, André
Federated Learning (FL) has emerged as a machine learning approach able to preserve the privacy of users' data. In FL, clients train machine learning models on local datasets, and a central server aggregates the learned parameters coming from the clients, training a global machine learning model without sharing users' data. However, the state of the art shows several approaches for attacking FL systems. For instance, gradient inversion or leakage attacks can recover, with high precision, the local dataset used during the FL training phase. This paper presents an approach, called Deep Leakage from Gradients with Feedback Blending (DLG-FB), which improves gradient inversion attacks by exploiting the spatial correlation that typically exists in batches of images. The evaluation shows improvements of 19.18% and 48.82% in terms of attack success rate and the number of iterations per attacked image, respectively.
Gradient Inversion of Federated Diffusion Models
Huang, Jiyue, Hong, Chi, Chen, Lydia Y., Roos, Stefanie
Diffusion models are becoming the de facto generative models, generating exceptionally high-resolution image data. Training effective diffusion models requires massive amounts of real data, which is privately owned by distributed parties. The data parties can collaboratively train diffusion models in a federated learning manner by sharing gradients instead of the raw data. In this paper, we study the privacy leakage risk of gradient inversion attacks. First, we design a two-phase fusion optimization, GIDM, which leverages the well-trained generative model itself as prior knowledge to constrain the inversion search (latent) space, followed by pixel-wise fine-tuning. GIDM is shown to reconstruct images almost identical to the original ones. Considering a more privacy-preserving training scenario, we then argue that locally initialized private training noise $\epsilon$ and sampling step $t$ may raise additional challenges for the inversion attack. To solve this, we propose a triple optimization, GIDM+, that coordinates the optimization of the unknown data, $\epsilon$, and $t$. Our extensive evaluation results demonstrate the vulnerability of sharing gradients for data protection of diffusion models: even high-resolution images can be reconstructed with high quality.
On the Efficiency of Privacy Attacks in Federated Learning
Tabassum, Nawrin, Chow, Ka-Ho, Wang, Xuyu, Zhang, Wenbin, Wu, Yanzhao
Recent studies have revealed severe privacy risks in federated learning, represented by Gradient Leakage Attacks. However, existing studies mainly aim at increasing the privacy attack success rate and overlook the high computation costs for recovering private data, making the privacy attack impractical in real applications. In this study, we examine privacy attacks from the perspective of efficiency and propose a framework for improving the Efficiency of Privacy Attacks in Federated Learning (EPAFL). We make three novel contributions. First, we systematically evaluate the computational costs of representative privacy attacks in federated learning, which reveals high potential for efficiency optimization. Second, we propose three early-stopping techniques to effectively reduce the computational costs of these privacy attacks. Third, we perform experiments on benchmark datasets and show that our proposed method can significantly reduce computational costs while maintaining comparable attack success rates for state-of-the-art privacy attacks in federated learning. We provide the code on GitHub at https://github.com/mlsysx/EPAFL.
Introduction to Probabilistic Classification: A Machine Learning Perspective
You can already train and evaluate classification models, both linear and non-linear. Now you want class probabilities instead of class labels. This is the article you are looking for. It walks you through the different evaluation metrics, their pros and cons, and optimal model training for multiple ML models. Imagine creating a model with the sole purpose of classifying cats and dogs.
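The switch from class labels to class probabilities is a one-line change in most libraries. A minimal sketch, assuming scikit-learn and a made-up 1-D "cat vs. dog" feature (the data and class names are illustrative, not from the article):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy 1-D feature: small values -> class 0 ("cat"), large values -> class 1 ("dog").
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)

labels = clf.predict([[2.5], [10.5]])       # hard class labels
probs = clf.predict_proba([[2.5], [10.5]])  # per-class probabilities, rows sum to 1
```

`predict` collapses the model's score to a single label, while `predict_proba` exposes the underlying probability of each class, which is what the evaluation metrics discussed in the article operate on.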
How to Create Dummy Data in Python
Dummy data is randomly generated data that can be substituted for live data. Whether you are a developer, software engineer, or data scientist, you sometimes need dummy data to test what you have built, be it a web app, mobile app, or machine learning model. If you are working in Python, you can use the Faker package to create dummy data of many types, for example dates, transactions, names, texts, and times. Faker is a simple Python package that generates fake data with different data types, and it is heavily inspired by PHP Faker, Perl Faker, and Ruby Faker.
Machine Learning Algorithms. Here's the End-to-End.
While there are several documents and articles on machine learning algorithms, I wanted to provide a summary of the most common ones I use as a professional data scientist. Additionally, I will include some sample code with dummy data so that you can start executing various models! Whereas unsupervised learning, like the commonly used K-means algorithm, aims to group similar data points together without labels, supervised learning, or classification -- well, classifies data into various categories. A simple example of classification is described below. The classification model learns from features of the fruits to assign a fruit label to an input food item.
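The fruit example above can be sketched in a few lines. This is a toy illustration with made-up features and labels (weight and diameter are assumptions, not the article's data), assuming scikit-learn:

```python
from sklearn.tree import DecisionTreeClassifier

# Dummy fruit data: [weight_g, diameter_cm] -> fruit label.
X = [[150, 7.0], [160, 7.5], [120, 6.0], [30, 3.0], [25, 2.8], [35, 3.2]]
y = ["apple", "apple", "apple", "plum", "plum", "plum"]

# The classifier learns from the fruit features to assign labels.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# An unseen fruit with apple-like features gets classified.
pred = clf.predict([[140, 6.8]])[0]
```

Swapping `DecisionTreeClassifier` for any other scikit-learn classifier keeps the same `fit`/`predict` interface, which is why dummy data like this is enough to start executing various models.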
Understanding K-Means Clustering using Python the easy way
In the previous article, we studied k-NN. One thing I believe is that if we can relate a concept to ourselves or our lives, we have a much better chance of understanding it. So I will try to explain everything by relating it to humans. K-means tries to make the intra-cluster data points as similar as possible while also keeping the clusters as different, or as far apart, as possible. It assigns data points to clusters such that the sum of the squared distances between the data points and each cluster's centroid is at a minimum.
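The two steps implied by that objective, i.e. assign each point to its nearest centroid, then move each centroid to the mean of its points, are the whole algorithm. A minimal NumPy sketch on made-up blob data (the data and the fixed iteration count are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated 2-D blobs of 50 points each.
data = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])

def kmeans(points, k, iters=10):
    # Initialize centroids as k randomly chosen data points.
    centroids = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (minimizing squared Euclidean distance).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points.
        centroids = np.array([points[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

labels, centroids = kmeans(data, k=2)
```

Each iteration can only decrease the sum of squared distances, which is why the loop converges; production code would also handle empty clusters and use a convergence test instead of a fixed iteration count.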