AITopics | Zhao, Eric

Collaborating Authors

Zhao, Eric

From Style to Facts: Mapping the Boundaries of Knowledge Injection with Finetuning

Zhao, Eric, Awasthi, Pranjal, Haghtalab, Nika

arXiv.org Artificial Intelligence, Mar-7-2025

Finetuning provides a scalable and cost-effective means of customizing language models for specific tasks or response styles, with greater reliability than prompting or in-context learning. In contrast, the conventional wisdom is that injecting knowledge via finetuning results in brittle performance and poor generalization. We argue that the dichotomy of "task customization" (e.g., instruction tuning) and "knowledge injection" (e.g., teaching new facts) is a distinction without a difference. We instead identify concrete factors that explain the heterogeneous effectiveness observed with finetuning. To this end, we conduct a large-scale experimental study of finetuning the frontier Gemini v1.5 model family on a spectrum of datasets that are artificially engineered to interpolate between the strengths and failure modes of finetuning. Our findings indicate that question-answer training data formats provide much stronger knowledge generalization than document/article-style training data, numerical information can be harder for finetuning to retain than categorical information, and models struggle to apply finetuned knowledge during multi-step reasoning even when trained on similar examples -- all factors that make "knowledge injection" especially difficult, even after controlling for considerations like data augmentation and information volume. On the other hand, our findings also indicate that it is not fundamentally more difficult to finetune information about a real-world event than information about what a model's writing style should be.

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.05919

Country:

North America > United States > California (0.14)
Europe > Austria > Vienna (0.14)
Asia > Russia > Far Eastern Federal District (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Government (1.00)
Leisure & Entertainment > Sports > Soccer (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification

Zhao, Eric, Awasthi, Pranjal, Gollapudi, Sreenivas

arXiv.org Artificial Intelligence, Feb-3-2025

Sampling-based search, a simple paradigm for utilizing test-time compute, involves generating multiple candidate responses and selecting the best one--typically by having models self-verify each response for correctness. In this paper, we study the scaling trends governing sampling-based search. Among our findings is that simply scaling up a minimalist implementation of sampling-based search, using only random sampling and direct self-verification, provides a practical inference method that, for example, elevates the reasoning capabilities of Gemini v1.5 Pro above those of o1-Preview on popular benchmarks. We partially attribute the scalability of sampling-based search to a phenomenon of implicit scaling, where sampling a larger pool of responses in turn improves self-verification accuracy. We further identify two useful principles for improving self-verification capabilities with test-time compute: (1) comparing across responses provides helpful signals about the locations of errors and hallucinations, and (2) different model output styles are useful for different contexts--chains of thought are useful for reasoning but harder to verify. We also find that, though accurate verification can be elicited, frontier models demonstrate remarkably weak out-of-box verification capabilities, and we introduce a benchmark to measure progress on these deficiencies.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.01839

Country:

Asia (0.67)
North America > United States (0.46)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.87)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
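
The sampling-based search paradigm summarized in the abstract above (sample several candidates, self-verify each, keep the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `sampling_based_search`, `toy_generate`, and `toy_verify` are hypothetical, and the toy functions are stand-ins for calls to a language model.

```python
import random


def sampling_based_search(prompt, generate, verify, k=8, n_verify=3, seed=0):
    """Sample k candidates, score each by averaged verification, return the best.

    `generate(prompt, rng)` and `verify(prompt, candidate, rng)` are
    hypothetical callables supplied by the caller; `verify` returns a
    score in [0, 1].
    """
    rng = random.Random(seed)
    # Step 1: random sampling of candidate responses.
    candidates = [generate(prompt, rng) for _ in range(k)]
    # Step 2: direct self-verification, repeated n_verify times per candidate.
    scored = []
    for cand in candidates:
        votes = [verify(prompt, cand, rng) for _ in range(n_verify)]
        scored.append((sum(votes) / n_verify, cand))
    # Step 3: keep the highest-scoring candidate (first one wins ties).
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score


def toy_generate(prompt, rng):
    # Stand-in for a model call: guesses an answer, usually a wrong one.
    return rng.choice([40, 41, 42, 43])


def toy_verify(prompt, candidate, rng):
    # Stand-in for model self-verification: an exact check for the toy task.
    return 1.0 if candidate == 6 * 7 else 0.0


if __name__ == "__main__":
    answer, score = sampling_based_search(
        "What is 6 * 7?", toy_generate, toy_verify, k=16
    )
    print(answer, score)
```

Increasing `k` widens the candidate pool and increasing `n_verify` averages out noise in a stochastic verifier; both knobs spend more test-time compute.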

Relational Programming with Foundation Models

Li, Ziyang, Huang, Jiani, Liu, Jason, Zhu, Felix, Zhao, Eric, Dodds, William, Velingker, Neelay, Alur, Rajeev, Naik, Mayur

arXiv.org Artificial Intelligence, Dec-18-2024

Foundation models have vast potential to enable diverse AI applications. The powerful yet incomplete nature of these models has spurred a wide range of mechanisms to augment them with capabilities such as in-context learning, information retrieval, and code interpreting. We propose Vieira, a declarative framework that unifies these mechanisms in a general solution for programming with foundation models. Vieira follows a probabilistic relational paradigm and treats foundation models as stateless functions with relational inputs and outputs. It supports neuro-symbolic applications by enabling the seamless combination of such models with logic programs, as well as complex, multi-modal applications by streamlining the composition of diverse sub-models. We implement Vieira by extending the Scallop compiler with a foreign interface that supports foundation models as plugins. We implement plugins for 12 foundation models including GPT, CLIP, and SAM. We evaluate Vieira on 9 challenging tasks that span language, vision, and structured and vector databases. Our evaluation shows that programs in Vieira are concise, can incorporate modern foundation models, and have comparable or better accuracy than competitive baselines.

large language model, logic & formal reasoning, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1609/aaai.v38i9.28934

2412.14515

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.98)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
(2 more...)

Learning Variational Inequalities from Data: Fast Generalization Rates under Strong Monotonicity

Zhao, Eric, Chavdarova, Tatjana, Jordan, Michael

arXiv.org Machine Learning, Dec-10-2024

Variational inequalities (VIs) are a broad class of optimization problems encompassing machine learning problems ranging from standard convex minimization to more complex scenarios like min-max optimization and computing the equilibria of multi-player games. In convex optimization, strong convexity allows for fast statistical learning rates requiring only $\Theta(1/\epsilon)$ stochastic first-order oracle calls to find an $\epsilon$-optimal solution, rather than the standard $\Theta(1/\epsilon^2)$ calls. In this paper, we explain how one can similarly obtain fast $\Theta(1/\epsilon)$ rates for learning VIs that satisfy strong monotonicity, a generalization of strong convexity. Specifically, we demonstrate that standard stability-based generalization arguments for convex minimization extend directly to VIs when the domain admits a small covering, or when the operator is integrable and suboptimality is measured by potential functions, such as when finding equilibria in multi-player games.

artificial intelligence, inequality, machine learning, (16 more...)

arXiv.org Machine Learning

2410.20649

Country:

Europe (0.28)
North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

Learning With Multi-Group Guarantees For Clusterable Subpopulations

Dai, Jessica, Haghtalab, Nika, Zhao, Eric

arXiv.org Artificial Intelligence, Dec-6-2024

A canonical desideratum for prediction problems is that performance guarantees should hold not just on average over the population, but also for meaningful subpopulations within the overall population. But what constitutes a meaningful subpopulation? In this work, we take the perspective that relevant subpopulations should be defined with respect to the clusters that naturally emerge from the distribution of individuals for which predictions are being made. In this view, a population refers to a mixture model whose components constitute the relevant subpopulations. We suggest two formalisms for capturing per-subgroup guarantees: first, by attributing each individual to the component from which they were most likely drawn, given their features; and second, by attributing each individual to all components in proportion to their relative likelihood of having been drawn from each component. Using online calibration as a case study, we study a multi-objective algorithm that provides guarantees for each of these formalisms by handling all plausible underlying subpopulation structures simultaneously, and achieves an $O(T^{1/2})$ rate even when the subpopulations are not well-separated. In comparison, the more natural cluster-then-predict approach that first recovers the structure of the subpopulations and then makes predictions suffers from an $O(T^{2/3})$ rate and requires the subpopulations to be separable. Along the way, we prove that providing per-subgroup calibration guarantees for underlying clusters can be easier than learning the clusters: separation between median subgroup features is required for the latter but not the former.

artificial intelligence, calibration error, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2410.14588

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Algorithmic Content Selection and the Impact of User Disengagement

Calvano, Emilio, Haghtalab, Nika, Vitercik, Ellen, Zhao, Eric

arXiv.org Artificial Intelligence, Oct-16-2024

The content selection problem of digital services is often modeled as a decision process where a service chooses, over multiple rounds, an arm to pull from a set of arms that each return a certain reward. This classical model does not account for the possibility that users disengage when dissatisfied and thus fails to capture an important trade-off between choosing content that promotes future engagement versus immediate reward. In this work, we introduce a model for the content selection problem where dissatisfied users may disengage and where the content that maximizes immediate reward does not necessarily maximize the odds of future user engagement. We show that when each arm's expected reward and its effect on user satisfaction are linearly related, an optimal content selection policy can be computed efficiently with dynamic programming under natural assumptions about the complexity of the users' engagement patterns. Moreover, we show that in an online learning setting where users with unknown engagement patterns arrive, there is a variant of Hedge that attains a $\tfrac 12$-competitive ratio regret bound. We also use our model to identify key primitives that determine how digital services should weigh engagement against revenue. For example, when it is more difficult for users to rejoin a service they are disengaged from, digital services naturally see a reduced payoff but user engagement may -- counterintuitively -- increase.

artificial intelligence, machine learning, user state, (19 more...)

arXiv.org Artificial Intelligence

2410.13108

Country: North America > United States > California (0.28)

Genre: Research Report (0.81)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction (0.90)
(2 more...)
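
For readers unfamiliar with Hedge, which the abstract above builds on, here is a minimal sketch of the textbook algorithm (exponential weights over a fixed set of arms with full loss feedback). It is not the variant analyzed in the paper, and it does not model user disengagement; the function name `hedge` and the learning rate `eta` are illustrative assumptions.

```python
import math


def hedge(loss_rounds, eta):
    """Textbook Hedge (exponential weights) with full-information feedback.

    loss_rounds: list of per-round loss vectors, one loss in [0, 1] per arm.
    eta: learning rate (an assumed tuning parameter).
    Returns the final distribution over arms and the cumulative expected loss.
    """
    n_arms = len(loss_rounds[0])
    weights = [1.0] * n_arms
    total_expected_loss = 0.0
    for losses in loss_rounds:
        # Play the normalized weights as a distribution over arms.
        z = sum(weights)
        probs = [w / z for w in weights]
        total_expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicatively down-weight each arm according to its loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    z = sum(weights)
    return [w / z for w in weights], total_expected_loss


if __name__ == "__main__":
    # Arm 0 never incurs loss, arm 1 always incurs loss 1.
    rounds = [[0.0, 1.0] for _ in range(50)]
    probs, loss = hedge(rounds, eta=0.5)
    print(probs, loss)
```

In this toy run the distribution concentrates on the arm with zero loss, so the cumulative expected loss stays small relative to the number of rounds.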

Stacking as Accelerated Gradient Descent

Agarwal, Naman, Awasthi, Pranjal, Kale, Satyen, Zhao, Eric

arXiv.org Machine Learning, Mar-7-2024

Stacking, a heuristic technique for training deep residual networks by progressively increasing the number of layers and initializing new layers by copying parameters from older layers, has proven quite successful in improving the efficiency of training deep neural networks. In this paper, we propose a theoretical explanation for the efficacy of stacking: viz., stacking implements a form of Nesterov's accelerated gradient descent. The theory also covers simpler models such as the additive ensembles constructed in boosting methods, and provides an explanation for a similar widely-used practical heuristic for initializing the new classifier in each round of boosting. We also prove that for certain deep linear residual networks, stacking does provide accelerated training, via a new potential function analysis of Nesterov's accelerated gradient method that allows errors in updates. We conduct proof-of-concept experiments to validate our theory as well.

artificial intelligence, machine learning, nesterov, (15 more...)

arXiv.org Machine Learning

2403.04978

Country: North America > United States (0.14)

Genre:

Research Report (0.50)
Instructional Material (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
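
As background for the abstract above, the following is a minimal sketch of Nesterov's accelerated gradient descent on a small quadratic. It illustrates only the optimizer that stacking is argued to resemble, not stacking itself; the function name `nesterov_agd`, the test objective, and the constant step and momentum values are assumptions chosen for illustration.

```python
import math


def nesterov_agd(grad, x0, step, momentum, n_steps):
    """Nesterov's accelerated gradient method with constant momentum.

    grad: function mapping a point (list of floats) to its gradient.
    Setting momentum=0 recovers plain gradient descent.
    """
    x = list(x0)
    x_prev = list(x0)
    for _ in range(n_steps):
        # Extrapolate along the previous direction of travel (look-ahead point).
        y = [xi + momentum * (xi - pi) for xi, pi in zip(x, x_prev)]
        g = grad(y)
        x_prev = x
        # Take the gradient step from the look-ahead point.
        x = [yi - step * gi for yi, gi in zip(y, g)]
    return x


def quad_grad(x):
    # Gradient of the assumed objective f(x) = 0.5 * (x[0]**2 + 10 * x[1]**2).
    return [x[0], 10.0 * x[1]]


if __name__ == "__main__":
    L, mu = 10.0, 1.0  # smoothness and strong-convexity constants of f
    beta = (math.sqrt(L / mu) - 1) / (math.sqrt(L / mu) + 1)
    print(nesterov_agd(quad_grad, [1.0, 1.0], 1.0 / L, beta, 30))
    print(nesterov_agd(quad_grad, [1.0, 1.0], 1.0 / L, 0.0, 30))
```

With the same step size and iteration count, the momentum run ends much closer to the minimizer at the origin than the plain gradient descent run (momentum set to zero).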

The Sample Complexity of Multi-Distribution Learning for VC Classes

Awasthi, Pranjal, Haghtalab, Nika, Zhao, Eric

arXiv.org Artificial Intelligence, Jul-22-2023

Multi-distribution learning is a natural generalization of PAC learning to settings with multiple data distributions. There remains a significant gap between the known upper and lower bounds for PAC-learnable classes. In particular, though we understand the sample complexity of learning a class of VC dimension $d$ on $k$ distributions to be $O(\epsilon^{-2} \ln(k)(d + k) + \min\{\epsilon^{-1} dk, \epsilon^{-4} \ln(k) d\})$, the best lower bound is $\Omega(\epsilon^{-2}(d + k \ln(k)))$. We discuss recent progress on this problem and some hurdles that are fundamental to the use of game dynamics in statistical learning.

artificial intelligence, machine learning, sample complexity, (13 more...)

arXiv.org Artificial Intelligence

2307.12135

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.40)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.88)

On-Demand Sampling: Learning Optimally from Multiple Distributions

Haghtalab, Nika, Jordan, Michael I., Zhao, Eric

arXiv.org Artificial Intelligence, Feb-27-2023

Social and real-world considerations such as robustness, fairness, social welfare and multi-agent tradeoffs have given rise to multi-distribution learning paradigms, such as collaborative, group distributionally robust, and fair federated learning. In each of these settings, a learner seeks to minimize its worst-case loss over a set of $n$ predefined distributions, while using as few samples as possible. In this paper, we establish the optimal sample complexity of these learning paradigms and give algorithms that meet this sample complexity. Importantly, our sample complexity bounds exceed the sample complexity of learning a single distribution only by an additive factor of $n \log(n) / \epsilon^2$. These improve upon the best known sample complexity of agnostic federated learning by Mohri et al. by a multiplicative factor of $n$, the sample complexity of collaborative learning by Nguyen and Zakynthinou by a multiplicative factor of $\log n / \epsilon^3$, and give the first sample complexity bounds for the group DRO objective of Sagawa et al. To achieve optimal sample complexity, our algorithms learn to sample and learn from distributions on demand. Our algorithm design and analysis are enabled by our extensions of stochastic optimization techniques for solving stochastic zero-sum games. In particular, we contribute variants of Stochastic Mirror Descent that can trade off between players' access to cheap one-off samples or more expensive reusable ones.

artificial intelligence, machine learning, survey article, (17 more...)

arXiv.org Artificial Intelligence

2210.12529

Country: North America > United States (0.28)

Genre:

Research Report (0.82)
Overview (0.67)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Learning to Play General-Sum Games Against Multiple Boundedly Rational Agents

Zhao, Eric, Trott, Alexander R., Xiong, Caiming, Zheng, Stephan

arXiv.org Artificial Intelligence, Dec-19-2022

We study the problem of training a principal in a multi-agent general-sum game using reinforcement learning (RL). Learning a robust principal policy requires anticipating the worst possible strategic responses of other agents, which is generally NP-hard. However, we show that no-regret dynamics can identify these worst-case responses in polynomial time in smooth games. We propose a framework that uses this policy evaluation method for efficiently learning a robust principal policy using RL. This framework can also be extended to provide robustness to boundedly rational agents. Our motivating application is automated mechanism design: we empirically demonstrate that our framework learns robust mechanisms in both matrix games and complex spatiotemporal games. In particular, we learn a dynamic tax policy that improves the welfare of a simulated trade-and-barter economy by 15%, even when facing previously unseen boundedly rational RL taxpayers.

agent, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2106.05492

Country: North America > United States > California (0.68)

Genre: Research Report (0.82)

Industry:

Government > Tax (0.67)
Leisure & Entertainment > Games > Computer Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
