AITopics | Hu, Yaxi

Differentially Private Steering for Large Language Model Alignment

Goel, Anmol, Hu, Yaxi, Gurevych, Iryna, Sanyal, Amartya

arXiv.org Artificial Intelligence, Jan-30-2025

Aligning Large Language Models (LLMs) with human values and away from undesirable behaviors (such as hallucination) has become increasingly important. Recently, steering LLMs towards a desired behavior via activation editing has emerged as an effective method to mitigate harmful generations at inference time. Activation editing modifies LLM representations by preserving information from positive demonstrations (e.g., truthful) and minimising information from negative demonstrations (e.g., hallucinations). When these demonstrations come from a private dataset, the aligned LLM may leak private information contained in those private samples. In this work, we present the first study of aligning LLM behavior with private datasets. Our work proposes the Private Steering for LLM Alignment (PSA) algorithm to edit LLM activations with differential privacy (DP) guarantees. We conduct extensive experiments on seven different benchmarks with open-source LLMs of different sizes (0.5B to 7B) and model families (Llama, Qwen, Mistral and Gemma). Our results show that PSA achieves DP guarantees for LLM alignment with minimal loss in performance, including alignment metrics, open-ended text generation quality, and general-purpose reasoning. We also develop the first Membership Inference Attack (MIA) for evaluating and auditing the empirical privacy for the problem of LLM steering via activation editing. Our attack is tailored for activation editing and relies solely on the generated texts without their associated probabilities. Our experiments support the theoretical guarantees by showing improved guarantees for our PSA algorithm compared to several existing non-private techniques.

LLMs often generate inaccurate, biased or even harmful information that violates human values and preferences (Rawte et al., 2023). In response, recent research has increasingly focused on aligning LLMs towards certain desired behaviors (Konen et al., 2024) while preventing potentially harmful and unsafe outcomes. This has led to the development of several techniques for aligning LLMs, such as Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022), instruction tuning (Wei et al., 2022), In-Context Learning (ICL) (Dong et al., 2022), and prompt engineering (Cheng et al., 2024). Nevertheless, several challenges remain, including the lack of diverse and representative datasets for alignment (Liu et al., 2024c), difficulties in addressing out-of-distribution issues (Liu et al., 2024a), the choice of alignment strategy (Ivison et al., 2024) and the lack of interpretability in traditional alignment methods (Lee et al., 2024). The linear representation hypothesis (Park et al., 2024b) suggests that high-level concepts are linearly represented as directions in the representation space of LLMs.
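The abstract does not give the details of PSA, so the following is only a minimal sketch of how a steering direction could be privatised with the standard Gaussian mechanism; it should not be read as the authors' algorithm. It assumes each private sample contributes one paired positive and negative activation, that neighbouring datasets differ by replacing one sample, and that epsilon is below 1 (the regime in which the classic noise calibration is valid). The function names, `clip_norm`, and `strength` are hypothetical.

```python
import numpy as np

def private_steering_vector(pos_acts, neg_acts, clip_norm=1.0,
                            epsilon=0.5, delta=1e-5, rng=None):
    """Noisy mean of clipped (positive - negative) activation differences."""
    rng = np.random.default_rng() if rng is None else rng
    diffs = np.asarray(pos_acts, dtype=float) - np.asarray(neg_acts, dtype=float)
    # Clip each per-sample difference to L2 norm <= clip_norm.
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    clipped = diffs * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    n, d = clipped.shape
    # Replacing one sample moves the mean by at most 2 * clip_norm / n.
    sensitivity = 2.0 * clip_norm / n
    # Classic Gaussian-mechanism calibration (assumes 0 < epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)

def steer(hidden, vector, strength=1.0):
    """Shift a hidden state along the (noisy) steering direction."""
    return np.asarray(hidden, dtype=float) + strength * vector
```

Because the noise scale shrinks as 1/n, the privatised direction approaches the non-private mean difference as the number of private demonstrations grows.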

large language model, machine learning, natural language, (16 more...)

arXiv:2501.18532

Country:

Asia (0.93)
North America > United States > California (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation

Sanyal, Amartya, Hu, Yaxi, Yu, Yaodong, Ma, Yian, Wang, Yixin, Schölkopf, Bernhard

arXiv.org Machine Learning, Jun-27-2024

"Accuracy-on-the-line" is a widely observed phenomenon in machine learning, where a model's accuracy on in-distribution (ID) and out-of-distribution (OOD) data is positively correlated across different hyperparameters and data configurations. But when does this useful relationship break down? In this work, we explore its robustness. The key observation is that noisy data and the presence of nuisance features can be sufficient to shatter the accuracy-on-the-line phenomenon. In these cases, ID and OOD accuracy can become negatively correlated, leading to "Accuracy-on-the-wrong-line". This phenomenon can also occur in the presence of spurious (shortcut) features, which tend to overshadow the more complex signal (core, non-spurious) features, resulting in a large nuisance feature space. Moreover, scaling to larger datasets does not mitigate this undesirable behavior and may even exacerbate it. We formally prove a lower bound on the OOD error in a linear classification model, characterizing the conditions on the noise and nuisance features for a large OOD error. We finally demonstrate this phenomenon across both synthetic and real datasets with noisy data and nuisance features.
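One simple way to check which regime a family of models falls into is to correlate their ID and OOD accuracies. The sketch below uses plain Pearson correlation; the helper name is hypothetical and the accuracy values are made up purely for illustration, not taken from the paper.

```python
import numpy as np

def id_ood_correlation(id_acc, ood_acc):
    """Pearson correlation between ID and OOD accuracy across models."""
    id_acc = np.asarray(id_acc, dtype=float)
    ood_acc = np.asarray(ood_acc, dtype=float)
    return float(np.corrcoef(id_acc, ood_acc)[0, 1])

# Made-up accuracies for five hypothetical models (not results from the paper).
id_acc = [0.80, 0.84, 0.88, 0.91, 0.95]
ood_on_line = [0.60, 0.63, 0.69, 0.72, 0.78]      # OOD rises with ID
ood_wrong_line = [0.70, 0.66, 0.61, 0.55, 0.50]   # OOD falls as ID rises

print(id_ood_correlation(id_acc, ood_on_line))     # positive
print(id_ood_correlation(id_acc, ood_wrong_line))  # negative
```

A clearly negative coefficient is the signature the abstract calls "Accuracy-on-the-wrong-line".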

artificial intelligence, data quality, machine learning, (17 more...)

arXiv:2406.19049

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Energy (0.93)
Transportation > Infrastructure & Services (0.67)

Technology:

Information Technology > Data Science > Data Quality > Data Cleaning (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Provable Privacy with Non-Private Pre-Processing

Hu, Yaxi, Sanyal, Amartya, Schölkopf, Bernhard

arXiv.org Machine Learning, Apr-10-2024

When analysing Differentially Private (DP) machine learning pipelines, the potential privacy cost of data-dependent pre-processing is frequently overlooked in privacy accounting. In this work, we propose a general framework to evaluate the additional privacy cost incurred by non-private data-dependent pre-processing algorithms. Our framework establishes upper bounds on the overall privacy guarantees by utilising two new technical notions: a variant of DP termed Smooth DP and the bounded sensitivity of the pre-processing algorithms. In addition to the generic framework, we provide explicit overall privacy guarantees for multiple data-dependent pre-processing algorithms, such as data imputation, quantization, deduplication and PCA, when used in combination with several DP algorithms. Notably, this framework is also simple to implement, allowing direct integration into existing DP pipelines.
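To see why data-dependent pre-processing matters for privacy accounting, consider mean imputation. The toy sketch below (hypothetical helper names, made-up numbers) replaces a single record and counts how many entries of the pre-processed output change. It only illustrates the problem; it does not implement the paper's Smooth DP framework.

```python
import numpy as np

def mean_impute(x):
    """Fill missing values (NaN) with the mean of the observed values."""
    x = np.asarray(x, dtype=float)
    return np.where(np.isnan(x), np.nanmean(x), x)

def entries_changed(a, b):
    """Number of positions where two pre-processed datasets differ."""
    return int(np.sum(~np.isclose(a, b)))

# Two neighbouring datasets: only the last record differs.
d1 = np.array([1.0, 2.0, np.nan, np.nan, 3.0])
d2 = np.array([1.0, 2.0, np.nan, np.nan, 9.0])

out1 = mean_impute(d1)  # imputed value is 2.0
out2 = mean_impute(d2)  # imputed value is 4.0
changed = entries_changed(out1, out2)
print(changed)  # 3: the replaced record plus both imputed entries
```

A downstream DP algorithm analysed under the assumption that neighbouring inputs differ in one record would therefore understate the privacy cost here, since its actual inputs differ in three.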

artificial intelligence, machine learning, natural language, (18 more...)

arXiv:2403.13041

Country:

North America > United States (0.28)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

PILLAR: How to make semi-private learning more effective

Pinto, Francesco, Hu, Yaxi, Yang, Fanny, Sanyal, Amartya

arXiv.org Artificial Intelligence, Jun-6-2023

In Semi-Supervised Semi-Private (SP) learning, the learner has access to both public unlabelled and private labelled data. We propose a computationally efficient algorithm that, under mild assumptions on the data, provably achieves significantly lower private labelled sample complexity and can be efficiently run on real-world datasets. For this purpose, we leverage the features extracted by networks pre-trained on public (labelled or unlabelled) data, whose distribution can significantly differ from the one on which SP learning is performed. To validate its empirical effectiveness, we run a wide variety of experiments under tight privacy constraints ($\epsilon = 0.1$) and with a focus on low-data regimes. In all of these settings, our algorithm exhibits significantly improved performance over available baselines that use similar amounts of public data.
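The abstract does not spell out the algorithm. One plausible way to exploit public unlabelled data, shown here purely as an assumption, is to fit a low-dimensional projection on pre-trained features of the public data and apply it to the private features before any DP training. Fitting on public data alone consumes no privacy budget. All names and the choice of PCA via SVD are hypothetical.

```python
import numpy as np

def fit_public_projection(public_feats, k):
    """Top-k principal directions estimated from public features only."""
    public_feats = np.asarray(public_feats, dtype=float)
    mean = public_feats.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(public_feats - mean, full_matrices=False)
    return mean, vt[:k].T  # basis has shape (d, k)

def project(feats, mean, basis):
    """Map features into the k-dimensional public subspace."""
    return (np.asarray(feats, dtype=float) - mean) @ basis
```

A DP learner (for example, a noisy linear classifier) would then be trained on `project(private_feats, mean, basis)`, where the lower dimension can reduce the amount of noise needed.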

artificial intelligence, dataset, machine learning, (14 more...)

arXiv:2306.03962

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

How unfair is private learning?

Sanyal, Amartya, Hu, Yaxi, Yang, Fanny

arXiv.org Artificial Intelligence, Dec-24-2022

As machine learning algorithms are deployed on sensitive data in critical decision-making processes, it is becoming increasingly important that they are also private and fair. In this paper, we show that, when the data has a long-tailed structure, it is not possible to build accurate learning algorithms that are both private and result in higher accuracy on minority subpopulations. We further show that relaxing overall accuracy can lead to good fairness even with strict privacy requirements. To corroborate our theoretical results in practice, we provide an extensive set of experimental results using a variety of synthetic, vision (CIFAR10 and CelebA), and tabular (Law School) datasets and learning algorithms.

artificial intelligence, machine learning, (18 more...)

arXiv:2206.03985

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.88)
Education > Educational Setting > Higher Education (0.54)
Education > Curriculum > Subject-Specific Education (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
