Collaborating Authors

Wang, Chendi


The 2020 United States Decennial Census Is More Private Than You (Might) Think

arXiv.org Machine Learning

The U.S. Decennial Census serves as the foundation for many high-profile policy decision-making processes, including federal funding allocation and redistricting. In 2020, the Census Bureau adopted differential privacy to protect the confidentiality of individual responses through a disclosure avoidance system that injects noise into census data tabulations. The Bureau subsequently posed an open question: Could sharper privacy guarantees be obtained for the 2020 U.S. Census compared to their published guarantees, or equivalently, had the nominal privacy budgets been fully utilized? In this paper, we affirmatively address this open problem by demonstrating that between 8.50% and 13.76% of the privacy budget for the 2020 U.S. Census remains unused for each of the eight geographical levels, from the national level down to the block level. This finding is made possible through our precise tracking of privacy losses using $f$-differential privacy, applied to the composition of private queries across various geographical levels. Our analysis indicates that the Census Bureau introduced unnecessarily high levels of injected noise to achieve the claimed privacy guarantee for the 2020 U.S. Census. Consequently, our results enable the Bureau to reduce noise variances by 15.08% to 24.82% while maintaining the same privacy budget for each geographical level, thereby enhancing the accuracy of privatized census statistics. We empirically demonstrate that reducing noise injection into census statistics mitigates distortion caused by privacy constraints in downstream applications of private census data, illustrated through a study examining the relationship between earnings and education.
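
To make the per-level accounting concrete, here is a minimal sketch, assuming sensitivity-1 queries with continuous Gaussian noise, of how trade-off functions compose under Gaussian differential privacy (GDP), a convenient sub-family of $f$-DP guarantees with closed-form composition. The noise scales below are placeholders, not the Bureau's parameters, and the 2020 Census disclosure avoidance system actually deploys the discrete Gaussian mechanism; the formulas are the standard GDP results of Dong et al. (2022).

```python
# Minimal sketch, assuming sensitivity-1 queries with continuous Gaussian
# noise (the Census DAS actually uses the discrete Gaussian). The noise
# scales are placeholders, not the Bureau's parameters.
import numpy as np
from scipy.stats import norm

def gdp_tradeoff(alpha, mu):
    """Trade-off function of mu-GDP: T(alpha) = Phi(Phi^{-1}(1 - alpha) - mu)."""
    return norm.cdf(norm.ppf(1 - alpha) - mu)

def compose_gaussians(sigmas):
    """Composing sensitivity-1 Gaussian mechanisms with noise stds sigma_i
    yields mu-GDP with mu = sqrt(sum_i (1 / sigma_i)^2)."""
    return float(np.sqrt(np.sum(1.0 / np.asarray(sigmas) ** 2)))

def gdp_delta(mu, eps):
    """Tightest delta(eps) implied by a mu-GDP guarantee (Dong et al., 2022)."""
    return norm.cdf(-eps / mu + mu / 2) - np.exp(eps) * norm.cdf(-eps / mu - mu / 2)

# Hypothetical noise scales for the queries released at one geographic level.
mu = compose_gaussians([10.0, 25.0, 50.0])
print(f"composed GDP parameter: mu = {mu:.4f}")
print(f"type II error at alpha = 0.05: {gdp_tradeoff(0.05, mu):.4f}")
print(f"delta at eps = 1: {gdp_delta(mu, 1.0):.2e}")
```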


Neural Collapse Meets Differential Privacy: Curious Behaviors of NoisyGD with Near-perfect Representation Learning

arXiv.org Machine Learning

A recent study by De et al. (2022) has reported that large-scale representation learning through pre-training on a public dataset significantly enhances differentially private (DP) learning in downstream tasks, despite the high dimensionality of the feature space. To explain this phenomenon theoretically, we consider a layer-peeled model of representation learning, a setting that gives rise to Neural Collapse (NC), a phenomenon concerning the learned features in deep learning and transfer learning. Within the framework of NC, we establish an error bound showing that the misclassification error is independent of dimension when the distance between the actual features and the ideal ones is smaller than a threshold. Additionally, we empirically evaluate the quality of the last-layer features under different pre-trained models within the NC framework, showing that a more powerful transformer leads to a better feature representation. Furthermore, we reveal that DP fine-tuning is less robust than non-private fine-tuning, particularly in the presence of perturbations. These observations are supported by both theoretical analyses and experimental evaluation. Moreover, to enhance the robustness of DP fine-tuning, we suggest several strategies, such as feature normalization or dimension reduction methods like Principal Component Analysis (PCA). Empirically, we demonstrate a significant improvement in testing accuracy by applying PCA to the last-layer features.
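
As a concrete illustration of the suggested mitigation, the following minimal sketch, not the paper's implementation, applies PCA to stand-in last-layer features and then runs one step of noisy gradient descent (per-example clipping plus Gaussian noise) on a linear head. The feature matrix, dimensions, clipping norm $C$, and noise scale $\sigma$ are all placeholders.

```python
# Minimal sketch, not the paper's code: PCA on stand-in last-layer features,
# then one step of NoisyGD (per-example clipping + Gaussian noise) on a
# linear head. All dimensions and hyperparameters are placeholders.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 1000, 768, 50              # samples, feature dim, PCA dim
X = rng.standard_normal((n, d))      # stand-in for pre-trained features
y = rng.choice([-1.0, 1.0], size=n)  # binary labels

# PCA via SVD of the centered feature matrix. Note: in a fully private
# pipeline this step would itself need a DP variant of PCA.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T                    # projected k-dimensional features

# One NoisyGD step on the logistic loss log(1 + exp(-y * w.z)).
C, sigma, lr = 1.0, 2.0, 0.1         # clip norm, noise scale, step size
w = np.zeros(k)
per_ex_grads = (-y / (1.0 + np.exp(y * (Z @ w))))[:, None] * Z  # n x k
norms = np.linalg.norm(per_ex_grads, axis=1, keepdims=True)
clipped = per_ex_grads / np.maximum(1.0, norms / C)             # L2 clip to C
noisy_grad = clipped.sum(axis=0) + sigma * C * rng.standard_normal(k)
w -= lr * noisy_grad / n
```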


Unified Enhancement of Privacy Bounds for Mixture Mechanisms via $f$-Differential Privacy

arXiv.org Machine Learning

Differentially private (DP) machine learning algorithms involve many sources of randomness, such as random initialization, random batch subsampling, and shuffling. However, such randomness is difficult to account for when proving differential privacy bounds, because it induces mixture distributions for the algorithm's output that are difficult to analyze. This paper focuses on improving privacy bounds for shuffling models and for one-iteration differentially private gradient descent (DP-GD) with random initializations, using $f$-DP. We derive a closed-form expression for the trade-off function of shuffling models that outperforms the most up-to-date results based on $(\epsilon,\delta)$-DP. Moreover, we investigate the effect of random initialization on the privacy of one-iteration DP-GD. Our numerical computations of the trade-off function indicate that random initialization can enhance the privacy of DP-GD. Our analysis of $f$-DP guarantees for these mixture mechanisms relies on an inequality for trade-off functions introduced in this paper, which implies the joint convexity of $F$-divergences. Finally, we study an $f$-DP analog of the advanced joint convexity of the hockey-stick divergence related to $(\epsilon,\delta)$-DP and apply it to analyze the privacy of mixture mechanisms.
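
For intuition about how such mixture distributions can still be handled numerically, the sketch below, which is illustrative and not the paper's derivation, computes the hockey-stick divergence $H_\gamma(P\|Q) = \int (p - \gamma q)_+ \, dx$ between a symmetric two-component Gaussian mixture (the kind of output law a random sign initialization can induce) and a single Gaussian; taking $\gamma = e^\epsilon$ gives the smallest $\delta$ for which this pair satisfies the $(\epsilon,\delta)$ inequality in that direction. The distributions and integration grid are hypothetical choices.

```python
# Illustrative numerics, not the paper's derivation. P is a two-component
# Gaussian mixture (e.g., induced by a random sign initialization), Q a
# single Gaussian; both are hypothetical choices.
import numpy as np
from scipy.stats import norm

def hockey_stick(p_pdf, q_pdf, eps, grid):
    """H_{e^eps}(P || Q) = integral of (p - e^eps * q)_+ over the grid."""
    integrand = np.maximum(p_pdf(grid) - np.exp(eps) * q_pdf(grid), 0.0)
    return float(np.sum(integrand) * (grid[1] - grid[0]))

grid = np.linspace(-20.0, 20.0, 400_001)
p = lambda x: 0.5 * (norm.pdf(x, 1.0, 1.0) + norm.pdf(x, -1.0, 1.0))
q = lambda x: norm.pdf(x, 0.0, 1.0)

for eps in (0.5, 1.0, 2.0):
    print(f"eps = {eps}: smallest admissible delta = "
          f"{hockey_stick(p, q, eps, grid):.3e}")
```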


Statistical Theory of Differentially Private Marginal-based Data Synthesis Algorithms

arXiv.org Artificial Intelligence

Marginal-based methods achieve promising performance in the synthetic data competition hosted by the National Institute of Standards and Technology (NIST). To deal with high-dimensional data, the distribution of synthetic data is represented by a probabilistic graphical model (e.g., a Bayesian network), while the raw data distribution is approximated by a collection of low-dimensional marginals. Differential privacy (DP) is guaranteed by introducing random noise to each low-dimensional marginal distribution. Despite their promising performance in practice, the statistical properties of marginal-based methods are rarely studied in the literature. In this paper, we study DP data synthesis algorithms based on Bayesian networks (BN) from a statistical perspective. We also derive an upper bound on the utility error of the DP synthetic data for downstream machine learning tasks. To complete the picture, we establish a lower bound on total variation (TV) accuracy that holds for every $\epsilon$-DP synthetic data generator.

In recent years, the problem of privacy-preserving data analysis has become increasingly important, and differential privacy (Dwork et al., 2006) has emerged as the foundation of data privacy. Differential privacy (DP) techniques are widely adopted by industrial companies and the U.S. Census Bureau (Johnson et al., 2017; Erlingsson et al., 2014; Nguyên et al., 2016; The U.S. Census Bureau, 2020; Abowd, 2018).
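
The marginal-noising step that carries the DP guarantee is easy to sketch. The toy example below is illustrative rather than any specific algorithm analyzed in the paper: it releases a noisy $2 \times 2$ marginal of two binary attributes via the Laplace mechanism and post-processes the result into a valid probability table for the downstream graphical model.

```python
# Toy sketch, not a specific algorithm from the paper: release one noisy
# 2x2 marginal of two binary attributes under the Laplace mechanism.
import numpy as np

rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(10_000, 2))  # two binary attributes

counts = np.zeros((2, 2))
np.add.at(counts, (data[:, 0], data[:, 1]), 1)  # 2x2 contingency table

eps = 1.0
# Under add/remove-one neighbors, a single record changes one cell by 1,
# so the L1 sensitivity is 1 and Laplace(1/eps) noise yields eps-DP.
noisy = counts + rng.laplace(scale=1.0 / eps, size=counts.shape)

# Post-processing keeps the guarantee: clip to nonnegative and renormalize
# into a valid probability table.
noisy = np.clip(noisy, 0.0, None)
print(np.round(noisy / noisy.sum(), 4))
```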