Collaborating Authors

 Li, Ninghui


CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

arXiv.org Artificial Intelligence

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. Sending these model updates may leak information about the client's private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client's gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
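
The core operation described above, perturbing a gradient only within the subspace orthogonal to itself, can be sketched as follows. This is a minimal illustration: the function name, candidate count, and noise scaling are assumptions, and the paper's cold-posterior selection over candidates is not reproduced here.

```python
import numpy as np

def orthogonal_perturbations(grad, num_candidates=8, scale=0.1, rng=None):
    """Return candidate gradient updates perturbed only in the subspace
    orthogonal to `grad`. Illustrative sketch; candidate scoring/selection
    (the cold-posterior step in CENSOR) is left to the caller."""
    rng = np.random.default_rng() if rng is None else rng
    g = grad.ravel()
    g_unit = g / (np.linalg.norm(g) + 1e-12)

    candidates = []
    for _ in range(num_candidates):
        noise = rng.normal(size=g.shape)
        # Remove the component of the noise along the gradient so the
        # perturbation lies in the orthogonal subspace.
        noise -= noise.dot(g_unit) * g_unit
        noise *= scale * np.linalg.norm(g) / (np.linalg.norm(noise) + 1e-12)
        candidates.append((g + noise).reshape(grad.shape))
    return candidates
```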


Federated Learning Privacy: Attacks, Defenses, Applications, and Policy Landscape - A Survey

arXiv.org Artificial Intelligence

Deep learning has shown incredible potential across a vast array of tasks, and accompanying this growth has been an insatiable appetite for data. However, a large amount of the data needed for deep learning is stored on personal devices, and recent privacy concerns have further highlighted the challenges of accessing such data. As a result, federated learning (FL) has emerged as an important privacy-preserving technology enabling collaborative training of machine learning models without the need to send the raw, potentially sensitive, data to a central server. However, the fundamental premise that sending model updates to a server is privacy-preserving only holds if the updates cannot be "reverse engineered" to infer information about the private training data. It has been shown under a wide variety of settings that this premise for privacy does not hold. In this survey paper, we provide a comprehensive literature review of the different privacy attacks and defense methods in FL. We identify the current limitations of these attacks and highlight the settings in which FL client privacy can be broken. We dissect some of the successful industry applications of FL and draw lessons for future successful adoption. We survey the emerging landscape of privacy regulation for FL. We conclude with future directions for taking FL toward the cherished goal of generating accurate models while preserving the privacy of participants' data.


Towards Principled Assessment of Tabular Data Synthesis Algorithms

arXiv.org Artificial Intelligence

Data synthesis has been advocated as an important approach for utilizing data while protecting data privacy. A large number of tabular data synthesis algorithms (which we call synthesizers) have been proposed. Some synthesizers satisfy differential privacy, while others aim to provide privacy in a heuristic fashion. A comprehensive understanding of the strengths and weaknesses of these synthesizers remains elusive, because principled evaluation metrics are lacking and there are no head-to-head comparisons of newly developed synthesizers (which take advantage of diffusion models and large language models) with state-of-the-art marginal-based synthesizers. In this paper, we present a principled and systematic evaluation framework for assessing tabular data synthesis algorithms. Specifically, we examine and critique existing evaluation metrics, and introduce a set of new metrics covering fidelity, privacy, and utility to address their limitations. Based on the proposed metrics, we also devise a unified objective for tuning, which can consistently improve the quality of synthetic data for all methods. We conducted extensive evaluations of 8 different types of synthesizers on 12 datasets and identified some interesting findings, which offer new directions for privacy-preserving data synthesis.
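
The unified tuning objective mentioned above can be pictured as a single scalar that a hyperparameter tuner maximizes for every synthesizer. The weighting scheme, score normalization, and names below are illustrative assumptions, not the paper's exact formulation.

```python
def unified_objective(fidelity_score: float, utility_score: float,
                      w_fidelity: float = 0.5, w_utility: float = 0.5) -> float:
    """Hypothetical combined tuning objective: both scores are assumed to be
    normalized to [0, 1], and the same fixed weights are used for every
    synthesizer so that all methods are tuned against one criterion."""
    return w_fidelity * fidelity_score + w_utility * utility_score
```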


Continuous Release of Data Streams under both Centralized and Local Differential Privacy

arXiv.org Artificial Intelligence

In this paper, we study the problem of publishing a stream of real-valued data satisfying differential privacy (DP). One major challenge is that the maximal possible value can be quite large; thus it is necessary to estimate a threshold so that numbers above it are truncated, reducing the amount of noise that must be added to the data. The estimation must be done based on the data in a private fashion. We develop such a method that uses the Exponential Mechanism with a quality function that closely approximates the utility goal while maintaining low sensitivity. Given the threshold, we then propose a novel online hierarchical method and several post-processing techniques. Building on these ideas, we formalize the steps into a framework for private publishing of stream data. Our framework consists of three components: a threshold optimizer that privately estimates the threshold, a perturber that adds calibrated noise to the stream, and a smoother that improves the result using post-processing. Within our framework, we design an algorithm satisfying the more stringent setting of DP called local DP (LDP). To our knowledge, this is the first LDP algorithm for publishing streaming data. Using four real-world datasets, we demonstrate that our mechanism outperforms the state-of-the-art by 6 to 10 orders of magnitude in terms of utility (measured by the mean squared error of answering a random range query).
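
The perturber component can be sketched as follows: clip each incoming value to the (privately estimated) threshold, then add Laplace noise calibrated to that threshold. This is a simplified illustration only; the exponential-mechanism threshold optimizer, the online hierarchical method, and the smoother from the framework are omitted.

```python
import numpy as np

def perturb_stream(values, threshold, epsilon, rng=None):
    """Minimal sketch of the perturber stage: truncation bounds the
    sensitivity of each release to `threshold`, and Laplace noise with
    scale threshold / epsilon is added to each clipped value."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for v in values:
        clipped = min(v, threshold)                      # truncate large values
        noisy = clipped + rng.laplace(scale=threshold / epsilon)
        out.append(noisy)
    return out
```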


MIST: Defending Against Membership Inference Attacks Through Membership-Invariant Subspace Training

arXiv.org Artificial Intelligence

In membership inference (MI) attacks, the adversary tries to determine whether an instance was used to train a machine learning (ML) model. MI attacks are a major privacy concern when using private data to train ML models. Most MI attacks in the literature take advantage of the fact that ML models are trained to fit the training data well, and thus have very low loss on training instances. Most defenses against MI attacks therefore try to make the model fit the training data less well. Doing so, however, generally results in lower accuracy. We observe that training instances have different degrees of vulnerability to MI attacks. Most instances will have low loss even when not included in training. For these instances, the model can fit them well without concerns of MI attacks. An effective defense only needs to (possibly implicitly) identify instances that are vulnerable to MI attacks and avoid overfitting them. A major challenge is how to achieve such an effect in an efficient training process. Leveraging two distinct recent advancements in representation learning, counterfactually-invariant representations and subspace learning methods, we introduce a novel Membership-Invariant Subspace Training (MIST) method to defend against MI attacks. MIST avoids overfitting the vulnerable instances without significant impact on other instances. We have conducted extensive experimental studies, comparing MIST with various other state-of-the-art (SOTA) MI defenses against several SOTA MI attacks. We find that MIST outperforms other defenses while resulting in minimal reduction in testing accuracy.
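
The observation that only some instances are vulnerable can be illustrated with a simple loss-gap heuristic. This is a hypothetical stand-in for intuition only, not the MIST algorithm, which instead combines counterfactually-invariant representations with subspace training.

```python
import numpy as np

def vulnerability_scores(losses_in, losses_out):
    """Illustrative heuristic: score each training instance by how much
    lower its loss is when it is included in training (`losses_in`) than
    when it is held out (`losses_out`, e.g. estimated via cross-validation
    or shadow models). A large gap marks an MI-vulnerable instance."""
    return np.maximum(np.asarray(losses_out) - np.asarray(losses_in), 0.0)

def reweight_for_training(scores, temperature=1.0):
    """Hypothetical weighting: down-weight the most vulnerable instances
    so that training does not overfit them."""
    return np.exp(-np.asarray(scores) / temperature)
```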


Differentially Private Vertical Federated Clustering

arXiv.org Artificial Intelligence

In many applications, multiple parties have private data regarding the same set of users but on disjoint sets of attributes, and a server wants to leverage the data to train a model. To enable model learning while protecting the privacy of the data subjects, we need vertical federated learning (VFL) techniques, where the data parties share only information for training the model, instead of the private data. However, it is challenging to ensure that the shared information maintains privacy while learning accurate models. To the best of our knowledge, the algorithm proposed in this paper is the first practical solution for differentially private vertical federated k-means clustering, where the server can obtain a set of global centers with a provable differential privacy guarantee. Our algorithm assumes an untrusted central server that aggregates differentially private local centers and membership encodings from local data parties. It builds a weighted grid as the synopsis of the global dataset based on the received information. Final centers are generated by running any k-means algorithm on the weighted grid. Our approach for grid weight estimation uses a novel, lightweight, and differentially private set intersection cardinality estimation algorithm based on the Flajolet-Martin sketch. To improve the estimation accuracy in the setting with more than two data parties, we further propose a refined version of the weight estimation algorithm and a parameter tuning strategy that brings the final k-means utility close to that in the central private setting. We provide theoretical utility analysis and experimental evaluation results for the cluster centers computed by our algorithm and show that our approach performs better both theoretically and empirically than the two baselines based on existing techniques.
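
The Flajolet-Martin-based intersection cardinality estimation can be sketched as follows, using the standard inclusion-exclusion identity |A ∩ B| = |A| + |B| - |A ∪ B| and the fact that the union sketch is just the bitwise OR of the parties' sketches. The hashing details are illustrative, variance-reducing averaging over many sketches is skipped, and the differential-privacy noise the paper adds to the shared sketches is omitted.

```python
import hashlib

PHI = 0.77351  # Flajolet-Martin correction constant

def _trailing_zeros(x: int) -> int:
    return (x & -x).bit_length() - 1 if x else 32

def fm_sketch(items, num_bits=32, seed=0):
    """Single Flajolet-Martin bitmap over hashed item ids (illustrative)."""
    bitmap = 0
    for item in items:
        h = int.from_bytes(hashlib.sha256(f"{seed}:{item}".encode()).digest()[:4], "big")
        bitmap |= 1 << min(_trailing_zeros(h), num_bits - 1)
    return bitmap

def fm_estimate(bitmap):
    """Estimate the distinct count as 2^R / phi, where R is the index of
    the lowest unset bit in the bitmap."""
    r = 0
    while (bitmap >> r) & 1:
        r += 1
    return (2 ** r) / PHI

def intersection_estimate(items_a, items_b):
    """|A ∩ B| ≈ |A| + |B| - |A ∪ B|; only the sketches need to be shared,
    since the union sketch is the bitwise OR of the two bitmaps."""
    sa, sb = fm_sketch(items_a), fm_sketch(items_b)
    return fm_estimate(sa) + fm_estimate(sb) - fm_estimate(sa | sb)
```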


Random Spiking and Systematic Evaluation of Defenses Against Adversarial Examples

arXiv.org Artificial Intelligence

Image classifiers often suffer from adversarial examples, which are generated by adding a small amount of noise to input images to trick classifiers into misclassification. Over the years, many defense mechanisms have been proposed, and different researchers have made seemingly contradictory claims about their effectiveness. We argue that such discrepancies are primarily due to inconsistent assumptions about the attacker's knowledge. To this end, we present an analysis of possible adversarial models and propose an evaluation framework for comparing different defense mechanisms. As part of the framework, we introduce a more powerful and realistic adversary strategy. We propose a new defense mechanism called Random Spiking (RS), which generalizes dropout and introduces random noise into the training process in a controlled manner. With a carefully chosen placement, RS incurs negligible negative impact on prediction accuracy. Evaluations under our proposed framework suggest RS delivers better protection against adversarial examples than many existing schemes.

I. INTRODUCTION. Modern society is increasingly reliant upon software systems trained by machine learning techniques. Many such techniques, however, were designed under the implicit assumption that both the training and test data follow the same static (although possibly unknown) distribution. In the presence of intelligent and resourceful adversaries, this assumption may no longer hold. A malicious adversary can deliberately manipulate an input instance to make it deviate from the distribution of the training/testing dataset and cause the learning algorithms and the trained models to behave unexpectedly. For example, it has been found that existing image classifiers based on Deep Neural Networks are highly vulnerable to adversarial examples [1], [2]. Oftentimes, by modifying an image in a way that is barely noticeable by humans, the classifier can be made to confidently classify it as something else. This phenomenon also exists for classifiers that do not use neural networks, and has been called "optical illusions for machines". Many approaches have since been proposed to help defend against adversarial examples. For example, Goodfellow et al. [1] proposed adversarial training, in which one trains a neural network using both the original training dataset and newly generated adversarial examples. Another proposed defense adds randomness at prediction time: when given an input instance, one generates multiple instances by adding small amounts of randomly generated noise to the original instance, collects the predictions on all perturbed instances, and uses majority voting to make the final prediction. Some approaches attempt to train additional neural network models to identify and reject adversarial examples [4], [5].
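
The randomized-voting defense described in the introduction can be sketched as follows. The function name, noise level, and sample count are illustrative assumptions, and this is not the Random Spiking training-time mechanism itself.

```python
import numpy as np

def majority_vote_predict(model_predict, x, num_samples=16, noise_std=0.05, rng=None):
    """Classify several noisy copies of the input and return the most common
    label. `model_predict` is assumed to map a single input array to a label."""
    rng = np.random.default_rng() if rng is None else rng
    votes = [model_predict(x + rng.normal(scale=noise_std, size=x.shape))
             for _ in range(num_samples)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[np.argmax(counts)]
```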