AITopics

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Costa Rica > Heredia Province > Heredia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Ram, Ashwin, Ramdas, Aaditya

An Asymptotic Law of the Iterated Logarithm for $\mathrm{KL}_{\inf}$

arXiv.org Machine LearningFeb-6-2026

The population $\mathrm{KL}_{\inf}$ is a fundamental quantity that appears in lower bounds for (asymptotically) optimal regret of pure-exploration stochastic bandit algorithms, and optimal stopping time of sequential tests. Motivated by this, an empirical $\mathrm{KL}_{\inf}$ statistic is frequently used in the design of (asymptotically) optimal bandit algorithms and sequential tests. While nonasymptotic concentration bounds for the empirical $\mathrm{KL}_{\inf}$ have been developed, their optimality in terms of constants and rates is questionable, and their generality is limited (usually to bounded observations). The fundamental limits of nonasymptotic concentration are often described by the asymptotic fluctuations of the statistics. With that motivation, this paper presents a tight (upper and lower) law of the iterated logarithm for empirical $\mathrm{KL}_{\inf}$ applying to extremely general (unbounded) data.

artificial intelligence, big data, data mining, (15 more...)

2602.05259

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.86)

Neural Information Processing SystemsDec-24-2025, 12:46:03 GMT

A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, communication, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense. Importantly, our result holds under significantly weaker assumptions: neither the gossip matrix needs to be doubly stochastic nor the stepsizes square summable. As an application, we show that, for the stepsize $n^{-\gamma}$ with $\gamma \in (0, 1),$ the distributed TD(0) algorithm with linear function approximation has a convergence rate of $O(\sqrt{n^{-\gamma} \ln n })$ a.s.; for the $1/n$ type stepsize, the same is $O(\sqrt{n^{-1} \ln \ln n})$ a.s. These decay rates do not depend on the graph depicting the interactions among the different agents.

iterated logarithm, multi-agent reinforcement learning, name change, (10 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Neural Information Processing SystemsAug-16-2025, 03:24:25 GMT

955fd82131e15e7b5199cbc8f983306a-Paper.pdf

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Costa Rica > Heredia Province > Heredia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsMay-29-2025, 11:24:06 GMT

955fd82131e15e7b5199cbc8f983306a-Supplemental.pdf

lemma 4, machine learning, reinforcement learning, (13 more...)

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Costa Rica > Heredia Province > Heredia (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing SystemsJan-17-2025, 17:13:46 GMT

A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning

In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as also with each other, for solving a shared problem in sequential decision-making. It has wide-ranging applications in gaming, robotics, finance, communication, etc. In this work, we derive a novel law of iterated logarithm for a family of distributed nonlinear stochastic approximation schemes that is useful in MARL. In particular, our result describes the convergence rate on almost every sample path where the algorithm converges. This result is the first of its kind in the distributed setup and provides deeper insights than the existing ones, which only discuss convergence rates in the expected or the CLT sense.

convergence rate, iterated logarithm, multi-agent reinforcement learning, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bhatt, Sujay, Fang, Guanhua, Li, Ping, Samorodnitsky, Gennady

Catoni-style Confidence Sequences under Infinite Variance

arXiv.org Artificial IntelligenceAug-5-2022

In this paper, we provide an extension of confidence sequences for settings where the variance of the data-generating distribution does not exist or is infinite. Confidence sequences furnish confidence intervals that are valid at arbitrary data-dependent stopping times, naturally having a wide range of applications. We first establish a lower bound for the width of the Catoni-style confidence sequences for the finite variance case to highlight the looseness of the existing results. Next, we derive tight Catoni-style confidence sequences for data distributions having a relaxed bounded~$p^{th}-$moment, where~$p \in (1,2]$, and strengthen the results for the finite variance case of~$p =2$. The derived results are shown to better than confidence sequences obtained using Dubins-Savage inequality.

catoni-style confidence sequence, confidence sequence, sequence, (12 more...)

arXiv.org Artificial Intelligence

2208.03185

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)

Balsubramani, Akshay, Ramdas, Aaditya

Sequential Nonparametric Testing with the Law of the Iterated Logarithm

arXiv.org Machine LearningMar-1-2016

We propose a new algorithmic framework for sequential hypothesis testing with i.i.d. data, which includes A/B testing, nonparametric two-sample testing, and independence testing as special cases. It is novel in several ways: (a) it takes linear time and constant space to compute on the fly, (b) it has the same power guarantee as a non-sequential version of the test with the same computational constraints up to a small factor, and (c) it accesses only as many samples as are required - its stopping time adapts to the unknown difficulty of the problem. All our test statistics are constructed to be zero-mean martingales under the null hypothesis, and the rejection threshold is governed by a uniform non-asymptotic law of the iterated logarithm (LIL). For the case of nonparametric two-sample mean testing, we also provide a finite sample power analysis, and the first non-asymptotic stopping time calculations for this class of problems. We verify our predictions for type I and II errors and stopping times using simulations.

artificial intelligence, machine learning, sequential test, (19 more...)

1506.03486

Country: North America > United States > California (0.28)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)

Cowan, Wesley, Katehakis, Michael N.

Asymptotic Behavior of Minimal-Exploration Allocation Policies: Almost Sure, Arbitrarily Slow Growing Regret

arXiv.org Machine LearningDec-17-2015

The purpose of this paper is to provide further understanding into the structure of the sequential allocation ("stochastic multi-armed bandit", or MAB) problem by establishing probability one finite horizon bounds and convergence rates for the sample (or "pseudo") regret associated with two simple classes of allocation policies $\pi$. For any slowly increasing function $g$, subject to mild regularity constraints, we construct two policies (the $g$-Forcing, and the $g$-Inflated Sample Mean) that achieve a measure of regret of order $ O(g(n))$ almost surely as $n \to \infty$, bound from above and below. Additionally, almost sure upper and lower bounds on the remainder term are established. In the constructions herein, the function $g$ effectively controls the "exploration" of the classical "exploration/exploitation" tradeoff.

artificial intelligence, bandit, big data, (18 more...)

1505.02865

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Industry: Energy > Oil & Gas (0.36)

Technology:

Information Technology > Artificial Intelligence (0.68)
Information Technology > Data Science > Data Mining > Big Data (0.66)

arXiv.org Machine LearningDec-1-2015

Sharp Finite-Time Iterated-Logarithm Martingale Concentration

Balsubramani, Akshay

Martingales are indispensable in studying the temporal dynamics of stochastic processes arising in a multitude of fields [10, 14]. Particularly when such processes have complex long-range dependences, it is often of interest to concentrate martingales uniformly over time. On the theoretical side, a fundamental limit to such concentration is expressed by the law of the iterated logarithm (LIL). However, this only concerns asymptotic behavior. In many applications, it is more natural to instead consider concentration that holds uniformly over all finite times. This manuscript presents such bounds for the large classes of martingales which are addressed by Hoeffding [11] and Bernstein [8] inequalities. These new results are optimal within small constants, and can be viewed as finite-time generalizations of the upper half of the LIL.

concentration, martingale, theorem 2, (16 more...)