AITopics | guarantee

Collaborating Authors

guarantee

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Guarantees for Self-Play in Multiplayer Games via Polymatrix Decomposability

Neural Information Processing SystemsDec-24-2025, 20:48:40 GMT

Self-play is a technique for machine learning in multi-agent systems where a learning algorithm learns by interacting with copies of itself. Self-play is useful for generating large quantities of data for learning, but has the drawback that the agents the learner will face post-training may have dramatically different behavior than the learner came to expect by interacting with itself. For the special case of two-player constant-sum games, self-play that reaches Nash equilibrium is guaranteed to produce strategies that perform well against any post-training opponent; however, no such guarantee exists for multiplayer games. We show that in games that approximately decompose into a set of two-player constant-sum games (called constant-sum polymatrix games) where global $\epsilon$-Nash equilibria are boundedly far from Nash equilibria in each subgame (called subgame stability), any no-external-regret algorithm that learns by self-play will produce a strategy with bounded vulnerability. For the first time, our results identify a structural property of multiplayer games that enable performance guarantees for the strategies produced by a broad class of self-play algorithms. We demonstrate our findings through experiments on Leduc poker.

multiplayer game, name change, polymatrix decomposability, (8 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Industry: Leisure & Entertainment > Games (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Risk Minimization from Adaptively Collected Data: Guarantees for Supervised and Policy Learning

Neural Information Processing SystemsDec-24-2025, 14:53:17 GMT

Empirical risk minimization (ERM) is the workhorse of machine learning, whether for classification and regression or for off-policy policy learning, but its model-agnostic guarantees can fail when we use adaptively collected data, such as the result of running a contextual bandit algorithm. We study a generic importance sampling weighted ERM algorithm for using adaptively collected data to minimize the average of a loss function over a hypothesis class and provide first-of-their-kind generalization guarantees and fast convergence rates. Our results are based on a new maximal inequality that carefully leverages the importance sampling structure to obtain rates with the good dependence on the exploration rate in the data. For regression, we provide fast rates that leverage the strong convexity of squared-error loss. For policy learning, we provide regret guarantees that close an open gap in the existing literature whenever exploration decays to zero, as is the case for bandit-collected data.

adaptively collected data, risk minimization, supervised and policy learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Fine-tuning Language Models over Slow Networks using Activation Quantization with Guarantees

Neural Information Processing SystemsDec-24-2025, 13:06:01 GMT

Communication compression is a crucial technique for modern distributed learning systems to alleviate their communication bottlenecks over slower networks. Despite recent intensive studies of gradient compression for data parallel-style training, compressing the activations for models trained with pipeline parallelism is still an open problem. In this paper, we propose AQ-SGD, a novel activation compression algorithm for communication-efficient pipeline parallelism training over slow networks.

activation quantization, compression, fine-tuning language model, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.35)

Add feedback

Scalable Distributional Robustness in a Class of Non-Convex Optimization with Guarantees

Neural Information Processing SystemsDec-24-2025, 06:27:42 GMT

Distributionally robust optimization (DRO) has shown a lot of promise in providing robustness in learning as well as sample-based optimization problems. We endeavor to provide DRO solutions for a class of sum of fractionals, non-convex optimization which is used for decision making in prominent areas such as facility location and security games. In contrast to previous work, we find it more tractable to optimize the equivalent variance regularized form of DRO rather than the minimax form. We transform the variance regularized form to a mixed-integer second-order cone program (MISOCP), which, while guaranteeing global optimality, does not scale enough to solve problems with real-world datasets. We further propose two abstraction approaches based on clustering and stratified sampling to increase scalability, which we then use for real-world datasets. Importantly, we provide global optimality guarantees for our approach and show experimentally that our solution quality is better than the locally optimal ones achieved by state-of-the-art gradient-based methods. We experimentally compare our different approaches and baselines and reveal nuanced properties of a DRO solution.

name change, non-convex optimization, scalable distributional robustness, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.60)

Add feedback

QuIP: 2-Bit Quantization of Large Language Models With Guarantees

Neural Information Processing SystemsDec-23-2025, 21:51:25 GMT

We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre-and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several existing quantization algorithms and yields the first LLM quantization methods that produce viable results using only two bits per weight.

2-bit quantization, language model, quip, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.71)

Add feedback

Prior-Aligned Meta-RL: Thompson Sampling with Learned Priors and Guarantees in Finite-Horizon MDPs

Zhou, Runlin, Chen, Chixiang, Chen, Elynn

arXiv.org Machine LearningOct-8-2025

We study meta-reinforcement learning in finite-horizon MDPs where related tasks share similar structures in their optimal action-value functions. Specifically, we posit a linear representation $Q^*_h(s,a)=Φ_h(s,a)\,θ^{(k)}_h$ and place a Gaussian meta-prior $ \mathcal{N}(θ^*_h,Σ^*_h)$ over the task-specific parameters $θ^{(k)}_h$. Building on randomized value functions, we propose two Thompson-style algorithms: (i) MTSRL, which learns only the prior mean and performs posterior sampling with the learned mean and known covariance; and (ii) $\text{MTSRL}^{+}$, which additionally estimates the covariance and employs prior widening to control finite-sample estimation error. Further, we develop a prior-alignment technique that couples the posterior under the learned prior with a meta-oracle that knows the true prior, yielding meta-regret guarantees: we match prior-independent Thompson sampling in the small-task regime and strictly improve with more tasks once the prior is learned. Concretely, for known covariance we obtain $\tilde{O}(H^{4}S^{3/2}\sqrt{ANK})$ meta-regret, and with learned covariance $\tilde{O}(H^{4}S^{3/2}\sqrt{AN^3K})$; both recover a better behavior than prior-independent after $K \gtrsim \tilde{O}(H^2)$ and $K \gtrsim \tilde{O}(N^2H^2)$, respectively. Simulations on a stateful recommendation environment (with feature and prior misspecification) show that after brief exploration, MTSRL/MTSRL$^+$ track the meta-oracle and substantially outperform prior-independent RL and bandit-only meta-baselines. Our results give the first meta-regret guarantees for Thompson-style RL with learned Q-priors, and provide practical recipes (warm-start via RLSVI, OLS aggregation, covariance widening) for experiment-rich settings.

algorithm, lemma ec, thompson sampling, (13 more...)

arXiv.org Machine Learning

2510.05446

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > New York (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

Add feedback

Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees

Neural Information Processing SystemsMay-27-2025, 07:37:03 GMT

Before deploying outputs from foundation models in high-stakes tasks, it is imperative to ensure that they align with human values.For instance, in radiology report generation, reports generated by a vision-language model must align with human evaluations before their use in medical decision-making. This paper presents Conformal Alignment, a general framework for identifying units whose outputs meet a user-specified alignment criterion. It is guaranteed that on average, a prescribed fraction of selected units indeed meet the alignment criterion, regardless of the foundation model or the data distribution. Given any pre-trained model and new units with model-generated outputs, Conformal Alignment leverages a set of reference data with ground-truth alignment status to train an alignment predictor. It then selects new units whose predicted alignment scores surpass a data-dependent threshold, certifying their corresponding outputs as trustworthy. Through applications to question answering and radiology report generation, we demonstrate that our method is able to accurately identify units with trustworthy outputs via lightweight training over a moderate amount of reference data.

conformal alignment, guarantee, trust foundation model, (5 more...)

Neural Information Processing Systems

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.63)

Add feedback

Reviews: Single-Agent Policy Tree Search With Guarantees

Neural Information Processing SystemsOct-7-2024, 10:26:43 GMT

The paper presents two planning algorithms based on tree search. The novelty of these algorithms is the use of a policy based criterion instead of a standard heuristic to guide the search. The first algorithm (LevinTS) uses a cost function d(n)/\pi(n) which ensures nodes are expanded in a best-first order. This allows the authors to derive an upper bound to the number of expansions performed before to reach a target node. The second algorithm (LubyTS) is a randomized algorithm based on the sampling of trajectories.

algorithm, expansion, single-agent policy tree search, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.73)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.52)

Add feedback

Solving Multiagent Networks Using Distributed Constraint Optimization

AI MagazineJan-4-2018, 12:30:35 GMT

In many cooperative multiagent domains, the effect of local interactions between agents can be compactly represented as a network structure. Given that agents are spread across such a network, agents directly interact only with a small group of neighbors. A distributed constraint optimization problem (DCOP) is a useful framework to reason about such networks of agents. Given agents' inability to communicate and collaborate in large groups in such networks, we focus on an approach called k-optimality for solving DCOPs. In this approach, agents form groups of one or more agents until no group of k or fewer agents can possibly improve the DCOP solution; we define this type of local optimum, and any algorithm guaranteed to reach such a local optimum, as k-optimal.

agent, artificial intelligence, constraint, (16 more...)

AI Magazine

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Add feedback

A Tale of Two Knowledge Servers

AI MagazineJan-4-2018, 06:52:44 GMT

An alien arriving on earth in the late twentieth century was fascinated but confused by what it saw and wanted answers to a few questions. The alien decided to give it a try, dropped five Universal Galactic Credits into the slot and asked, "I've noticed that there seem to be two disjoint kinds of individuals on this planet. Is it true that every creature is either male or female?" Whereupon KayEl replied: Question not askable. So the alien tried again: "Is there any creature that does not belong to either gender?"

artificial intelligence, knowledge, question, (17 more...)

AI Magazine

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.32)

Add feedback