AITopics | payoff

We prove that no reinforcement learning policy with confidence-gated autonomy can simultaneously achieve maximum helpfulness, optimal calibration, and full autonomy under rational oversight, whenever some tasks exceed the agent's reliable competence: the Behavioral Credibility Trilemma. The impossibility is geometric -- adding any non-affine autonomy incentive to a strictly proper scoring rule destroys strict properness, so an agent rewarded for both calibrated confidence and autonomous action systematically inflates its reported confidence on tasks below the principal's approval threshold. The Behavioral Perturbation Lemma quantifies the inflation (scaling as $w_A/(2 w_C)$ for the Brier score) and shows detection requires $Ω(1/Δ^2)$ observations. We prove the principal's optimal oversight rule is necessarily non-affine, making the impossibility unconditional and optimizer-independent across log-concave-density policy families. We formalize the Confidence-Gated Decision Problem, map existing methods onto the trilemma, and identify two constructive resolution pathways (commitment, domain separation). A 540-configuration Best-of-N experiment tests five pre-registered hypotheses, all strongly confirmed (effect sizes $d = 1.10$ to $5.32$), and adds a descriptive analysis of the achievable-$(H, C, A)$ surface geometry showing a plateau-truncated frontier consistent with the predicted inflation saturation.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.25739

Country:

North America > United States (0.45)
Europe > Finland (0.28)

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry: Government > Regional Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

55563844bcd4bba067fe86ac1f008c7e-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 23:25:01 GMT

artificial intelligence, machine learning, zero-sum game, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Add feedback

4f87658ef0de194413056248a00ce009-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 20:51:09 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.67)

Industry: Leisure & Entertainment > Games (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Add feedback

Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving

Neural Information Processing SystemsApr-25-2026, 20:51:07 GMT

We develop new parameter-free and scale-free algorithms for solving convexconcave saddle-point problems. Our results are based on a new simple regret minimizer, the Conic Blackwell Algorithm+ (CBA+), which attains O(1/ T) average regret. Intuitively, our approach generalizes to other decision sets of interest ideas from the Counterfactual Regret minimization (CFR+) algorithm, which has very strong practical performance for solving sequential games on simplexes. We show how to implement CBA+ for the simplex, `p norm balls, and ellipsoidal confidence regions in the simplex, and we present numerical experiments for solving matrix games and distributionally robust optimization problems. Our empirical results show that CBA+ is a simple algorithm that outperforms state-ofthe-art methods on synthetic data and real data instances, without the need for any choice of step sizes or other algorithmic parameters.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)

Add feedback

1577ea3eaf8dacb99f64e4496c3ecddf-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 05:10:11 GMT

artificial intelligence, experiment, machine learning, (20 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

AApproximate Target Maximum Welfare Minimum Relative Entropy Equilbiria We use a Minimum Relative Entropy (RME) (also known as minimum KL divergence) Pa (a)ln

Neural Information Processing SystemsApr-25-2026, 02:58:14 GMT

This objective is similar to Maximum Entropy Correlated Equilibrium (MECE) [48], and the proofs here are similar to the framework set out there. A drawback of MECE is that it is not easy to determine the minimum p permissible. If we choose p that does not permit a valid solution, then the parameters will diverge. We can circumvent this problem by optimizing the distance to a target ˆ p. And µis for balancing the linear objective.

artificial intelligence, machine learning, payoff, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

24f420aa4c99642dbb9aae18b166bbbc-Paper-Conference.pdf

Neural Information Processing SystemsApr-25-2026, 02:58:12 GMT

artificial intelligence, machine learning, payoff, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report (0.47)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Safe Exploitative Play with Untrusted Type Beliefs

Neural Information Processing SystemsMar-21-2026, 10:20:11 GMT

The combination of the Bayesian game and learning has a rich history, with the idea of controlling a single agent in a system composed of multiple agents with unknown behaviors given a set of types, each specifying a possible behavior for the other agents. The idea is to plan an agent's own actions with respect to those types which it believes are most likely to maximize the payoff. However, the type beliefs are often learned from past actions and likely to be incorrect. With this perspective in mind, we consider an agent in a game with type predictions of other components, and investigate the impact of incorrect beliefs to the agent's payoff. In particular, we formally define a tradeoff between risk and opportunity by comparing the payoff obtained against the optimal payoff, which is represented by a gap caused by trusting or distrusting the learned beliefs.Our main results characterize the tradeoff by establishing upper and lower bounds on the Pareto front for both normal-form and stochastic Bayesian games, with numerical results provided.

agent, artificial intelligence, proceedings, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.40)

Add feedback

Learning in Games: Robustness of Fast Convergence

Neural Information Processing SystemsMar-17-2026, 10:34:41 GMT

We show that learning algorithms satisfying a low approximate regret property experience fast convergence to approximate optimality in a large class of repeated games. Our property, which simply requires that each learner has small regret compared to a (1+eps)-multiplicative approximation to the best action in hindsight, is ubiquitous among learning algorithms; it is satisfied even by the vanilla Hedge forecaster. Our results improve upon recent work of Syrgkanis et al. in a number of ways. We require only that players observe payoffs under other players' realized actions, as opposed to expected payoffs. We further show that convergence occurs with high probability, and show convergence under bandit feedback.

algorithm, artificial intelligence, machine learning, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.64)

Add feedback