AITopics | equilibrium

Every adaptive learning system must alternate between two operations: consolidating what it already knows and expanding into new evidence. We propose Consolidation-Expansion Operator Mechanics (OpMech), a framework that makes this structure precise. The central object is the order-gap $\mathrm{Ogap}(\theta; e)$, the degree to which a consolidation operator $Q$ and an expansion operator $P_e$ fail to commute at a given knowledge state $\theta$. Because the order-gap is computable from the system's own trajectory, it serves as a real-time control signal: large values indicate that the system is still sensitive to the ordering of consolidation and expansion; once the order-gap falls and stays small, further processing is unlikely to change the outcome. Three results give the signal precise meaning: the order-gap decays along convergent trajectories; a persistently large order-gap implies the system is far from its settled state; and an order-gap-based stopping rule terminates with provable guarantees in both noiseless and bounded-noise settings. The framework applies across five domains: bandits, reinforcement learning, stochastic optimization, continual learning, and recursive language models. We give conditions under which the order-gap reliably tracks convergence in three representative cases. We develop the recursive language model application in detail, showing how OpMech replaces heuristic stopping rules and fixed recursion budgets with principled, evidence-driven alternatives.
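
The abstract does not specify the operators or the exact stopping rule, so everything below is a hypothetical illustration, not OpMech's construction. It assumes the order-gap is the Euclidean norm of $Q(P_e(\theta)) - P_e(Q(\theta))$. It also models consolidation $Q$ as a gradient step on a "replay" least-squares loss and expansion $P_e$ as a gradient step on new evidence, with the two losses sharing a minimizer $\theta^\star$. The stopping rule halts once the gap stays below a threshold `tau` for `patience` consecutive iterates. The names `grad_step`, `order_gap`, and `run_with_order_gap_stop` are illustrative only.

```python
import math
from typing import Callable, List, Tuple

Vec = List[float]
Mat = List[List[float]]
Operator = Callable[[Vec], Vec]


def matvec(M: Mat, v: Vec) -> Vec:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M: Mat) -> Mat:
    return [list(col) for col in zip(*M)]


def grad_step(M: Mat, target: Vec, lr: float) -> Operator:
    """Hypothetical operator: one gradient step on 0.5 * ||M @ theta - target||^2."""
    Mt = transpose(M)

    def op(theta: Vec) -> Vec:
        resid = [r - t for r, t in zip(matvec(M, theta), target)]
        grad = matvec(Mt, resid)
        return [x - lr * g for x, g in zip(theta, grad)]

    return op


def order_gap(Q: Operator, P: Operator, theta: Vec) -> float:
    """Assumed definition: Euclidean norm of Q(P(theta)) - P(Q(theta))."""
    qp, pq = Q(P(theta)), P(Q(theta))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(qp, pq)))


def run_with_order_gap_stop(
    Q: Operator, P: Operator, theta: Vec, tau: float, patience: int, max_steps: int
) -> Tuple[Vec, List[float], int]:
    """Iterate theta <- Q(P(theta)); stop once the gap stays below tau for `patience` iterates."""
    gaps: List[float] = []
    below = 0
    for step in range(max_steps):
        gap = order_gap(Q, P, theta)
        gaps.append(gap)
        below = below + 1 if gap < tau else 0
        if below >= patience:
            return theta, gaps, step
        theta = Q(P(theta))
    return theta, gaps, max_steps


# Toy instance (assumed): "replay" data B and "new evidence" A share the minimizer theta_star.
theta_star = [1.0, -2.0]
B = [[2.0, 0.0], [0.0, 0.5]]
A = [[1.0, 1.0], [0.0, 1.0]]
Q = grad_step(B, matvec(B, theta_star), lr=0.2)    # consolidation (assumed form)
P_e = grad_step(A, matvec(A, theta_star), lr=0.3)  # expansion on evidence e (assumed form)

theta, gaps, stopped_at = run_with_order_gap_stop(
    Q, P_e, [0.0, 0.0], tau=1e-6, patience=3, max_steps=1000
)
```

Both operators fix $\theta^\star$, so the gap is exactly zero there. In this 2-D example the commutator of the two symmetric Hessians is a nonzero antisymmetric matrix, which makes the gap proportional to the distance from $\theta^\star$. In higher dimensions that commutator can have a null space, so a small gap by itself need not imply proximity to the settled state.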

assumption 4, machine learning, reinforcement learning

arXiv.org Machine Learning

2605.09968

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

e32349fe7e3cd4f9ef598c2b7b7a31f4-Paper-Conference.pdf

Neural Information Processing Systems, Apr-30-2026, 02:18:16 GMT

artificial intelligence, constraint, machine learning

Neural Information Processing Systems

Country: Europe (0.67)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

d79c1390baa2e4835586b094d82e5ffb-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 22:52:09 GMT

artificial intelligence, machine learning, neural-pi

Neural Information Processing Systems

Industry: Energy > Power Industry (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.30)

d37c9ad425fe5b65304d500c6edcba00-Paper-Conference.pdf

Neural Information Processing Systems, Apr-29-2026, 21:21:07 GMT

artificial intelligence, equilibrium, machine learning

Neural Information Processing Systems

Country: Europe (0.46)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

statements and

Neural Information Processing Systems, Apr-29-2026, 14:25:40 GMT

Consider a two-player Markov game in which both players affect the transitions. We show that best-responding to a correlated policy $\sigma$ is equivalent to best-responding to the opponent's marginal policy under $\sigma$. The proof follows from the equivalence of the two induced MDPs. Before that, given a (possibly correlated) joint policy $\sigma$, we define a nonlinear program (PBR) whose optimal solutions are the best-response policies of each agent $k$ to $\sigma_{-k}$ and the values for each state $s$ and timestep $h$:

A.2 Proof of Theorem 3.2

The best-response program. First, we state the following lemma, which will prove useful for several of our arguments:

Lemma A.1 (Best-response LP).
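
The excerpt states that best-responding to a correlated joint policy $\sigma$ reduces to best-responding to the opponent's marginal under $\sigma$. The minimal sketch below illustrates that reduction for a tabular, finite-horizon game, assuming the joint policy is Markov, $\sigma_h(a, b \mid s)$. It sums out agent $k$'s own action $a$ to get the opponent's marginal, averages rewards and transitions over the opponent's action $b$ to form the induced MDP, and solves that MDP by backward induction. This is dynamic programming, not the program (PBR) or the LP of Lemma A.1. All names and array layouts are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular layout (indices: timestep h, state s, own action a, opponent action b):
#   reward[h][s][a][b]  -> reward to agent k
#   trans[h][s][a][b]   -> list of next-state probabilities
#   sigma[h][s]         -> {(a, b): prob}, a possibly correlated joint policy
JointPolicy = Dict[Tuple[int, int], float]


def opponent_marginal(sigma_hs: JointPolicy, n_opp: int) -> List[float]:
    """Marginal of sigma over the opponent's action b (sums out agent k's action a)."""
    marg = [0.0] * n_opp
    for (_, b), prob in sigma_hs.items():
        marg[b] += prob
    return marg


def best_response(horizon, n_states, n_act, n_opp, reward, trans, sigma):
    """Backward induction on the MDP induced by the opponent's marginal policy.

    Returns V[h][s] (best-response values, with V[horizon][s] = 0) and a greedy
    deterministic policy pi[h][s] for agent k.
    """
    V = [[0.0] * n_states for _ in range(horizon + 1)]
    pi = [[0] * n_states for _ in range(horizon)]
    for h in reversed(range(horizon)):
        for s in range(n_states):
            marg = opponent_marginal(sigma[h][s], n_opp)
            q = []
            for a in range(n_act):
                value = 0.0
                for b, pb in enumerate(marg):
                    cont = sum(p * V[h + 1][s2] for s2, p in enumerate(trans[h][s][a][b]))
                    value += pb * (reward[h][s][a][b] + cont)
                q.append(value)
            pi[h][s] = q.index(max(q))
            V[h][s] = q[pi[h][s]]
    return V, pi


# One-shot coordination game: agent k earns 1 when its action matches the opponent's.
reward = [[[[1.0 if a == b else 0.0 for b in range(2)] for a in range(2)]]]
trans = [[[[[1.0] for b in range(2)] for a in range(2)]]]
sigma = [[{(0, 0): 0.5, (1, 1): 0.5}]]
V, pi = best_response(1, 1, 2, 2, reward, trans, sigma)
```

Under this $\sigma$ both players always coordinate, and agent $k$ earns 1. A unilateral deviation, however, loses the correlation signal: the opponent's marginal is uniform, so the best-response value `V[0][0]` is 0.5.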

artificial intelligence, global minimum, value function

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

55563844bcd4bba067fe86ac1f008c7e-Paper.pdf

Neural Information Processing Systems, Apr-25-2026, 23:25:01 GMT

artificial intelligence, machine learning, zero-sum game

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.68)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

No-Regret Learning in Dynamic Competition with Reference Effects Under Logit Demand

Neural Information Processing Systems, Apr-25-2026, 20:33:01 GMT

This work is dedicated to algorithm design in a competitive framework, with the primary goal of learning a stable equilibrium. We consider dynamic price competition between two firms operating within an opaque marketplace, where each firm lacks information about its competitor. Demand follows the multinomial logit (MNL) choice model, which depends on both the consumers' observed price and their reference price; consecutive periods of the repeated game are connected by reference-price updates. We use the notion of stationary Nash equilibrium (SNE), defined as the fixed point of the equilibrium pricing policy for the single-period game, to capture long-run market equilibrium and stability simultaneously. We propose the online projected gradient ascent algorithm (OPGA), in which the firms adjust prices using the first-order derivatives of their log-revenues, which can be obtained from the market feedback mechanism. Despite the absence of properties typically required for the convergence of online games, such as strong monotonicity and variational stability, we show that under diminishing step sizes the price and reference-price paths generated by OPGA converge to the unique SNE, thereby achieving no-regret learning and a stable market. Moreover, with appropriate step sizes, we prove that this convergence occurs at a rate of $O(1/t)$.
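
The abstract describes OPGA only at a high level, so the sketch below is a hypothetical instance built on assumptions the abstract does not state:

- linear MNL utilities $u_i = a_i - b_i p_i + c_i (r_i - p_i)$ with a no-purchase option;
- exponential-smoothing reference updates $r_{i,t+1} = \alpha r_{i,t} + (1 - \alpha) p_{i,t}$;
- step sizes $\eta_t = \eta_0 / (t + 1)$;
- projection onto a price interval $[p_{\min}, p_{\max}]$.

Under that utility, the log-revenue derivative is $\partial_{p_i} \log(p_i d_i) = 1/p_i - (b_i + c_i)(1 - d_i)$, where $d_i$ is firm $i$'s purchase probability.

```python
import math
from typing import List, Tuple


def mnl_demand(p, r, a, b, c) -> List[float]:
    """MNL purchase probabilities with an outside (no-purchase) option; assumed utility form."""
    u = [a[i] - b[i] * p[i] + c[i] * (r[i] - p[i]) for i in range(len(p))]
    exp_u = [math.exp(x) for x in u]
    denom = 1.0 + sum(exp_u)
    return [e / denom for e in exp_u]


def log_revenue_grad(p, r, a, b, c) -> List[float]:
    """d/dp_i log(p_i * d_i) = 1/p_i - (b_i + c_i) * (1 - d_i) under the assumed utility.

    d_i stands in for the demand share firm i observes from the market; computing it
    here uses both firms' parameters, which the firms themselves do not see.
    """
    d = mnl_demand(p, r, a, b, c)
    return [1.0 / p[i] - (b[i] + c[i]) * (1.0 - d[i]) for i in range(len(p))]


def opga(p0, r0, a, b, c, alpha, eta0, p_min, p_max, T) -> Tuple[List[float], List[float]]:
    """Projected gradient ascent on log-revenue with diminishing steps eta_t = eta0 / (t + 1)."""
    p, r = list(p0), list(r0)
    for t in range(T):
        g = log_revenue_grad(p, r, a, b, c)
        eta = eta0 / (t + 1)
        p_next = [min(p_max, max(p_min, p[i] + eta * g[i])) for i in range(len(p))]
        # Assumed reference update: exponential smoothing toward the current price.
        r = [alpha * r[i] + (1.0 - alpha) * p[i] for i in range(len(p))]
        p = p_next
    return p, r


# Hypothetical market parameters for two asymmetric firms.
params = dict(a=[2.0, 1.8], b=[1.0, 1.2], c=[0.5, 0.4])
p, r = opga([1.5, 0.8], [1.0, 1.0], alpha=0.5, eta0=1.0, p_min=0.1, p_max=5.0, T=20000, **params)
```

With these parameters the price and reference paths settle where each reference price matches its price and each firm's log-revenue gradient is near zero. This roughly mirrors the fixed-point property that the SNE formalizes. The sketch does not attempt to reproduce the paper's $O(1/t)$ rate analysis.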

artificial intelligence, machine learning, price path

Neural Information Processing Systems

Genre: