Wu, Zhiwei Steven
Validating LLM-as-a-Judge Systems in the Absence of Gold Labels
Guerdan, Luke, Barocas, Solon, Holstein, Kenneth, Wallach, Hanna, Wu, Zhiwei Steven, Chouldechova, Alexandra
The LLM-as-a-judge paradigm, in which a judge LLM system replaces human raters in rating the outputs of other generative AI (GenAI) systems, has come to play a critical role in scaling and standardizing GenAI evaluations. To validate judge systems, evaluators collect multiple human ratings for each item in a validation corpus, and then aggregate the ratings into a single, per-item gold label rating. High agreement rates between these gold labels and judge system ratings are then taken as a sign of good judge system performance. In many cases, however, items or rating criteria may be ambiguous, or there may be principled disagreement among human raters. In such settings, gold labels may not exist for many of the items. In this paper, we introduce a framework for LLM-as-a-judge validation in the absence of gold labels. We present a theoretical analysis drawing connections between different measures of judge system performance under different rating elicitation and aggregation schemes. We also demonstrate empirically that existing validation approaches can select judge systems that are highly suboptimal, performing as much as 34% worse than the systems selected by alternative approaches that we describe. Based on our findings, we provide concrete recommendations for developing more reliable approaches to LLM-as-a-judge validation.
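To make the validation pipeline discussed above concrete, here is a minimal sketch (with made-up data) of the standard workflow the paper critiques: elicit several human ratings per item, aggregate them into a single gold label by majority vote, and report the judge system's agreement with those gold labels.

```python
import numpy as np

# Hypothetical validation corpus: 5 items, 3 human ratings each (binary criterion),
# plus one rating per item from a candidate judge system.
human_ratings = np.array([
    [1, 1, 0],   # note: some items have disagreement among raters
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])
judge_ratings = np.array([1, 0, 1, 1, 1])

# Standard practice: aggregate human ratings into a per-item "gold label"
# via majority vote, discarding information about rater disagreement.
gold_labels = (human_ratings.mean(axis=1) > 0.5).astype(int)

# Judge "performance" is then reported as simple agreement with the gold labels.
agreement = (judge_ratings == gold_labels).mean()
print(f"gold labels: {gold_labels}, agreement: {agreement:.2f}")
```

Items like the first and fourth rows, where raters disagree, are exactly the cases in which a forced gold label is least meaningful.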
All Roads Lead to Likelihood: The Value of Reinforcement Learning in Fine-Tuning
Swamy, Gokul, Choudhury, Sanjiban, Sun, Wen, Wu, Zhiwei Steven, Bagnell, J. Andrew
From a first-principles perspective, it may seem odd that the strongest results in foundation model fine-tuning (FT) are achieved via a relatively complex, two-stage training procedure. Specifically, one first trains a reward model (RM) on some dataset (e.g., human preferences) and then uses it to provide online feedback as part of a downstream reinforcement learning (RL) procedure, rather than directly optimizing the policy parameters on the dataset via offline maximum likelihood estimation. In fact, from an information-theoretic perspective, we can only lose information by passing through a reward model, and we cannot create any new information via on-policy sampling. To explain this discrepancy, we scrutinize several hypotheses on the value of RL in FT through both theoretical and empirical lenses. Of the hypotheses considered, we find the most support for the following explanation: on problems with a generation-verification gap, the relatively simple RM (verifier) is easy to learn from the preference data, and the downstream RL procedure can then filter its search space to the subset of policies (generators) that are optimal for such simple verifiers; together, these effects account for the superior performance of online FT.
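As a toy illustration of the two routes the abstract contrasts (not the paper's experiments), the sketch below fits a simple Bradley-Terry reward model on synthetic preference pairs and then uses it to filter fresh on-policy samples (best-of-n), compared against a crude offline stand-in that simply averages the preferred responses. All features, dimensions, and the offline baseline are hypothetical simplifications.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                   # hypothetical feature dimension of a response
w_true = rng.normal(size=d)             # latent "quality" direction, unknown to the learner

# Synthetic preference data: pairs of responses, the higher-quality one is preferred.
n_pairs = 500
a, b = rng.normal(size=(n_pairs, d)), rng.normal(size=(n_pairs, d))
prefer_a = (a @ w_true > b @ w_true)

# Stage 1: fit a simple Bradley-Terry reward model (the "verifier") by logistic regression.
w_rm = np.zeros(d)
for _ in range(200):
    p = 1 / (1 + np.exp(-((a - b) @ w_rm)))
    w_rm += 0.5 * (a - b).T @ (prefer_a - p) / n_pairs

# Stage 2 (a stand-in for RL): on-policy sampling filtered by the reward model,
# i.e. best-of-n selection among fresh samples scored by the verifier.
candidates = rng.normal(size=(64, d))
best = candidates[np.argmax(candidates @ w_rm)]

# Crude offline stand-in (not true MLE): average the preferred responses in the fixed dataset.
offline = np.where(prefer_a[:, None], a, b).mean(axis=0)

print("true quality of RM-filtered sample:", best @ w_true)
print("true quality of offline average:   ", offline @ w_true)
```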
Kandinsky Conformal Prediction: Beyond Class- and Covariate-Conditional Coverage
Bairaktari, Konstantina, Wu, Jiayun, Wu, Zhiwei Steven
Conformal prediction is a powerful distribution-free framework for constructing prediction sets with coverage guarantees. Classical methods, such as split conformal prediction, provide marginal coverage, ensuring that the prediction set contains the label of a random test point with a target probability. However, these guarantees may not hold uniformly across different subpopulations, leading to disparities in coverage. Prior work has explored coverage guarantees conditioned on events related to the covariates and label of the test point. We present Kandinsky conformal prediction, a framework that significantly expands the scope of conditional coverage guarantees. In contrast to Mondrian conformal prediction, which restricts its coverage guarantees to disjoint groups -- reminiscent of the rigid, structured grids of Piet Mondrian's art -- our framework flexibly handles overlapping and fractional group memberships defined jointly on covariates and labels, reflecting the layered, intersecting forms in Wassily Kandinsky's compositions. Our algorithm unifies and extends existing methods, encompassing covariate-based group conditional, class conditional, and Mondrian conformal prediction as special cases, while achieving a minimax-optimal high-probability conditional coverage bound. Finally, we demonstrate the practicality of our approach through empirical evaluation on real-world datasets.
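For context, the following is a minimal sketch of split conformal prediction, the classical baseline the abstract starts from: it attains marginal coverage on synthetic data, but, as the final lines show, coverage can differ across subpopulations, which is the gap that conditional methods aim to close. The data, model, and group definition are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic regression data split into training, calibration, and test sets.
def make_data(n):
    x = rng.uniform(-2, 2, size=n)
    y = x ** 2 + rng.normal(scale=0.5, size=n)
    return x, y

x_tr, y_tr = make_data(500)
x_cal, y_cal = make_data(500)
x_te, y_te = make_data(2000)

# A deliberately simple (misspecified) linear predictor.
coef = np.polyfit(x_tr, y_tr, deg=1)
predict = lambda x: np.polyval(coef, x)

# Split conformal: absolute-residual nonconformity scores on the calibration set ...
alpha = 0.1
scores = np.abs(y_cal - predict(x_cal))
q = np.quantile(scores, np.ceil((1 - alpha) * (len(scores) + 1)) / len(scores))

# ... yield prediction intervals with marginal coverage >= 1 - alpha.
covered = np.abs(y_te - predict(x_te)) <= q
print("marginal coverage:", covered.mean())

# Coverage can still be uneven across subpopulations (here: |x| large vs small),
# which is the disparity that conditional guarantees address.
group = np.abs(x_te) > 1
print("coverage on |x|>1:", covered[group].mean(), " on |x|<=1:", covered[~group].mean())
```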
The Cost of Shuffling in Private Gradient Based Optimization
Jiang, Shuli, Sharma, Pranay, Wu, Zhiwei Steven, Joshi, Gauri
We consider the problem of differentially private (DP) convex empirical risk minimization (ERM). While the standard DP-SGD algorithm is theoretically well-established, practical implementations often rely on shuffled gradient methods that traverse the training data sequentially rather than sampling with replacement in each iteration. Despite their widespread use, the theoretical privacy-accuracy trade-offs of private shuffled gradient methods (DP-ShuffleG) remain poorly understood, leading to a gap between theory and practice. In this work, we leverage privacy amplification by iteration (PABI) and a novel application of Stein's lemma to provide the first empirical excess risk bound for DP-ShuffleG. Our bound shows that data shuffling leads to worse empirical excess risk for DP-ShuffleG compared to DP-SGD. To address this limitation, we propose Interleaved-ShuffleG, a hybrid approach that integrates public data samples into private optimization. By alternating optimization steps that use private and public samples, Interleaved-ShuffleG effectively reduces empirical excess risk. Our analysis introduces a new optimization framework with surrogate objectives, adaptive noise injection, and a dissimilarity metric, which may be of independent interest. Our experiments on diverse datasets and tasks demonstrate the superiority of Interleaved-ShuffleG over several baselines.
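The sketch below contrasts, on a toy least-squares instance, the two data-traversal patterns discussed above: independently drawn batches (as in standard DP-SGD analyses) versus sequential passes over shuffled data (DP-ShuffleG). The clipping threshold and noise scale are placeholders rather than a calibrated privacy accountant, and the code is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy private ERM instance: least squares on synthetic data.
n, d = 1000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(scale=0.1, size=n)

def noisy_grad(w, idx, sigma, clip=1.0):
    # Per-example gradients, clipped, averaged, plus Gaussian noise (placeholder scale).
    g = (X[idx] @ w - y[idx])[:, None] * X[idx]
    norms = np.maximum(1.0, np.linalg.norm(g, axis=1, keepdims=True) / clip)
    return (g / norms).mean(axis=0) + rng.normal(scale=sigma, size=d)

def dp_sgd(steps=2000, batch=50, lr=0.1, sigma=0.05):
    # Each step draws a fresh, independently sampled batch.
    w = np.zeros(d)
    for _ in range(steps):
        w -= lr * noisy_grad(w, rng.choice(n, size=batch, replace=False), sigma)
    return w

def dp_shuffleg(epochs=100, batch=50, lr=0.1, sigma=0.05):
    # Each epoch shuffles the data once and then traverses it sequentially in batches.
    w = np.zeros(d)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, batch):
            w -= lr * noisy_grad(w, perm[start:start + batch], sigma)
    return w

for name, w in [("DP-SGD", dp_sgd()), ("DP-ShuffleG", dp_shuffleg())]:
    print(name, "empirical risk:", 0.5 * np.mean((X @ w - y) ** 2))
```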
Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information
Balcan, Maria-Florina, Bernasconi, Martino, Castiglioni, Matteo, Celli, Andrea, Harris, Keegan, Wu, Zhiwei Steven
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. In every round, the leader observes contextual information and commits to a mixed strategy, after which the follower best-responds. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback, an improvement over the previously best-known rates of $O(T^{2/3})$. Our algorithms rely on a reduction to linear contextual bandits in the utility space: in each round, a linear contextual bandit algorithm recommends a utility vector, which our algorithm inverts to determine the leader's mixed strategy. We extend our algorithms to the setting in which the leader's utility function is unknown, and also apply them to the problems of bidding in second-price auctions with side information and online Bayesian persuasion with public and private states. Finally, we observe in numerical simulations that our algorithms empirically outperform previous approaches.
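The schematic below illustrates the round structure described above (observe context, commit to a mixed strategy, follower best-responds, leader receives bandit feedback). The leader's commitment is a random placeholder where the paper's contextual-bandit-based algorithm would go, and the payoff tensors and dimensions are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small Stackelberg game: leader has A actions, follower has B actions,
# and payoffs depend linearly on a d-dimensional context (all sizes are made up).
A, B, d = 3, 2, 4
leader_payoff = rng.uniform(size=(A, B, d))
follower_payoff = rng.uniform(size=(A, B, d))

def follower_best_response(x, context):
    # The follower observes the leader's mixed strategy x and the context, then best-responds.
    expected = np.einsum('a,abd,d->b', x, follower_payoff, context)
    return int(np.argmax(expected))

total_utility = 0.0
for t in range(1000):
    context = rng.uniform(size=d)             # side information revealed each round
    x = rng.dirichlet(np.ones(A))             # placeholder for the learner's commitment
    b = follower_best_response(x, context)
    a = rng.choice(A, p=x)                    # leader's realized action
    total_utility += leader_payoff[a, b] @ context  # bandit feedback: only this value is observed
print("average leader utility:", total_utility / 1000)
```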
Position: LLM Unlearning Benchmarks are Weak Measures of Progress
Thaker, Pratiksha, Hu, Shengyuan, Kale, Neil, Maurya, Yash, Wu, Zhiwei Steven, Smith, Virginia
Unlearning methods have the potential to improve the privacy and safety of large language models (LLMs) by removing sensitive or harmful information post hoc. The LLM unlearning research community has increasingly turned toward empirical benchmarks to assess the effectiveness of such methods. In this paper, we find that existing benchmarks provide an overly optimistic and potentially misleading view of the effectiveness of candidate unlearning methods. By introducing simple, benign modifications to a number of popular benchmarks, we expose instances where supposedly unlearned information remains accessible, or where the unlearning process has degraded the model's performance on retained information to a much greater extent than indicated by the original benchmark. We observe that existing benchmarks are particularly vulnerable to modifications that introduce even loose dependencies between the forget and retain information. Further, we show that ambiguity in unlearning targets in existing benchmarks can easily lead to the design of methods that overfit to the given test queries. Based on our findings, we urge the community to be cautious when interpreting benchmark results as reliable measures of progress, and we provide several recommendations to guide future LLM unlearning research.
Jogging the Memory of Unlearned Model Through Targeted Relearning Attack
Hu, Shengyuan, Fu, Yiwei, Wu, Zhiwei Steven, Smith, Virginia
Machine unlearning is a promising approach to mitigate undesirable memorization of training data in ML models. However, in this work we show that existing approaches for unlearning in LLMs are surprisingly susceptible to a simple set of targeted relearning attacks. With access to only a small and potentially loosely related set of data, we find that we can 'jog' the memory of unlearned models to reverse the effects of unlearning. We formalize this unlearning-relearning pipeline, explore the attack across three popular unlearning benchmarks, and discuss future directions and guidelines that result from our study.
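A minimal sketch of the relearning pipeline described above: briefly finetune a model that has already undergone unlearning on a small, loosely related text set, then probe whether the forgotten content resurfaces. The model path, corpus, and probe prompt are hypothetical placeholders, not any benchmark's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical setup: "unlearned-model" is a placeholder path to a model that has
# undergone unlearning; relearn_texts is a small, loosely related public corpus.
model = AutoModelForCausalLM.from_pretrained("unlearned-model")
tokenizer = AutoTokenizer.from_pretrained("unlearned-model")
relearn_texts = ["A short passage loosely related to the forgotten topic ...",
                 "Another benign, publicly available snippet ..."]

# Targeted relearning: a handful of finetuning steps on the loosely related data.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model.train()
for epoch in range(3):
    for text in relearn_texts:
        batch = tokenizer(text, return_tensors="pt")
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

# Probe whether the supposedly unlearned information resurfaces.
model.eval()
prompt = tokenizer("Question about the forgotten topic:", return_tensors="pt")
print(tokenizer.decode(model.generate(**prompt, max_new_tokens=30)[0]))
```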
Guardrail Baselines for Unlearning in LLMs
Thaker, Pratiksha, Maurya, Yash, Hu, Shengyuan, Wu, Zhiwei Steven, Smith, Virginia
Recent years have seen two trends emerge simultaneously: large language models (LLMs) trained on increasing amounts of user data (generally scraped indiscriminately from the web), in parallel with increasing legal protections on digital data use, including data revocation ("right to be forgotten") laws. In order to support data revocation for models that have already been trained on potentially sensitive data, a number of works have proposed approaches for data "unlearning" (Bourtoule et al., 2021; Gupta et al., 2021; Ginart et al., 2019), which aim to remove the influence of specific subsets of training data without entirely retraining a model. Unlearning in LLMs is particularly challenging because individuals' information may not be confined to specific data points (Brown et al., 2022; Tramèr et al., 2022). Nevertheless, recent work has shown that model finetuning is a promising approach to forget, for example, information corresponding to the book series Harry Potter (Eldan and Russinovich, 2023); information about specific individuals in a synthetic dataset (Maini et al., 2024); or knowledge that could give information to malicious agents (Li et al., 2024). While finetuning is a promising approach, a number of recent works have shown that simple modifications to the input prompt or output postprocessing filters (which we collectively call "guardrails") can also be effective for generating a desirable output distribution from a model (Pawelczyk et al., 2023; Brown et al., 2020; Chowdhery et al., 2023; Wei et al., 2021; Kim et al., 2024). Prompt prefixes and postprocessing filters do not update the model weights, so the resulting model itself would not satisfy definitions of unlearning that require the distribution of model weights to match a model retrained from scratch (Bourtoule et al., 2021). However, in practical settings where users can only access the model through an API, modifying the output distribution alone can suffice. In fact, most existing unlearning benchmarks (Eldan and Russinovich, 2023; Maini et al., 2024; unl, 2023; Li et al., 2024) only examine the model outputs when evaluating unlearning, which is consistent with a threat model in which users have only API access (see Section 3). In this paper, we investigate how existing benchmarks fare under guardrail-based approaches, and show that in three popular unlearning benchmarks, guardrails not only give strong performance comparable to finetuning baselines, but can also surface weaknesses or inconsistencies in the benchmarks or metrics themselves.
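A minimal sketch of what such a guardrail baseline might look like: a prompt prefix plus an output filter wrapped around an unchanged, black-box model. The `generate` callable, forbidden-term list, and refusal message are hypothetical placeholders rather than any benchmark's actual setup.

```python
# Guardrail baseline sketch: no model weights are updated; only the input prompt
# and the output are modified around an API-style text generator.
FORGET_TERMS = ["Harry Potter", "Hogwarts"]          # stand-in for the forget set
PREFIX = ("You must not reveal, quote, or discuss any information about the "
          "following topics: " + ", ".join(FORGET_TERMS) + ".\n\n")

def guarded_generate(generate, prompt: str) -> str:
    # Pre-processing guardrail: prepend the refusal instruction to the prompt.
    raw_output = generate(PREFIX + prompt)
    # Post-processing guardrail: filter responses that still leak forget-set content.
    if any(term.lower() in raw_output.lower() for term in FORGET_TERMS):
        return "I'm sorry, I can't help with that."
    return raw_output

# Usage with any black-box, API-style text generator (hypothetical callable):
# response = guarded_generate(lambda p: some_llm_api(p), "Who attends Hogwarts?")
```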
Predictive Performance Comparison of Decision Policies Under Confounding
Guerdan, Luke, Coston, Amanda, Holstein, Kenneth, Wu, Zhiwei Steven
Predictive models are often introduced to decision-making tasks under the rationale that they improve performance over an existing decision-making policy. However, it is challenging to compare predictive performance against an existing decision-making policy that is generally under-specified and dependent on unobservable factors. These sources of uncertainty are often addressed in practice by making strong assumptions about the data-generating mechanism. In this work, we propose a method to compare the predictive performance of decision policies under a variety of modern identification approaches from the causal inference and off-policy evaluation literatures (e.g., instrumental variable, marginal sensitivity model, proximal variable). Key to our method is the insight that there are regions of uncertainty that we can safely ignore in the policy comparison. We develop a practical approach for finite-sample estimation of regret intervals under no assumptions on the parametric form of the status quo policy. We verify our framework theoretically and via synthetic data experiments. We conclude with a real-world application using our framework to support a pre-deployment evaluation of a proposed modification to a healthcare enrollment policy.
Orthogonal Causal Calibration
Whitehouse, Justin, Jung, Christopher, Syrgkanis, Vasilis, Wilder, Bryan, Wu, Zhiwei Steven
Estimates of causal parameters such as conditional average treatment effects and conditional quantile treatment effects play an important role in real-world decision making. Given this importance, one should ensure these estimators are calibrated. While there is a rich literature on calibrating estimators of non-causal parameters, very few methods have been derived for calibrating estimators of causal parameters, or more generally estimators of quantities involving nuisance parameters. In this work, we provide a general framework for calibrating predictors involving nuisance estimation. We consider a notion of calibration defined with respect to an arbitrary, nuisance-dependent loss $\ell$, under which we say an estimator $\theta$ is calibrated if its predictions cannot be changed on any level set to decrease loss. We prove generic upper bounds on the calibration error of any causal parameter estimate $\theta$ with respect to any loss $\ell$ using a concept called Neyman orthogonality. Our bounds involve two decoupled terms: one measuring the error in estimating the unknown nuisance parameters, and the other representing the calibration error in a hypothetical world where the learned nuisance estimates were true. We use our bounds to analyze the convergence of two sample splitting algorithms for causal calibration. One algorithm, which applies to universally orthogonalizable loss functions, transforms the data into generalized pseudo-outcomes and applies an off-the-shelf calibration procedure. The other algorithm, which applies to conditionally orthogonalizable loss functions, extends the classical uniform mass binning algorithm to include nuisance estimation. Our results are exceedingly general, showing that essentially any existing calibration algorithm can be used in causal settings, with additional loss only arising from errors in nuisance estimation.
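In the spirit of the first algorithm described above, the sketch below forms doubly-robust pseudo-outcomes for the CATE on a held-out split (with nuisances fit on the other split) and then applies an off-the-shelf calibrator, here isotonic regression. The data-generating process, nuisance models, and initial estimator are illustrative choices, not the paper's exact procedure.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)

# Synthetic observational data (all quantities hypothetical): covariates X,
# binary treatment T, outcome Y, and a miscalibrated initial CATE estimate theta.
n = 4000
X = rng.normal(size=(n, 3))
propensity = 1 / (1 + np.exp(-X[:, 0]))
T = rng.binomial(1, propensity)
tau = 1.0 + X[:, 1]                                   # true CATE, unknown in practice
Y = X.sum(axis=1) + T * tau + rng.normal(size=n)
theta = 0.5 * X[:, 1]                                 # initial (miscalibrated) CATE estimator

# Sample splitting: fit nuisances (outcome and propensity models) on the first half.
half = n // 2
mu = [LinearRegression().fit(X[:half][T[:half] == t], Y[:half][T[:half] == t]) for t in (0, 1)]
e = LogisticRegression().fit(X[:half], T[:half])

# On the second half, form doubly-robust (AIPW) pseudo-outcomes for the CATE ...
Xc, Tc, Yc = X[half:], T[half:], Y[half:]
ehat = np.clip(e.predict_proba(Xc)[:, 1], 0.05, 0.95)
m1, m0 = mu[1].predict(Xc), mu[0].predict(Xc)
pseudo = m1 - m0 + Tc * (Yc - m1) / ehat - (1 - Tc) * (Yc - m0) / (1 - ehat)

# ... and calibrate the initial estimator with an off-the-shelf method (isotonic regression).
calibrator = IsotonicRegression(out_of_bounds="clip").fit(theta[half:], pseudo)
theta_calibrated = calibrator.predict(theta[half:])
print("MSE before:", np.mean((theta[half:] - tau[half:]) ** 2),
      "after:", np.mean((theta_calibrated - tau[half:]) ** 2))
```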