AITopics | sample complexity bound

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems, Dec-23-2025, 21:56:47 GMT

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but only under independent and identically distributed (i.i.d.) sampling and single-sample updates at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with the actor using general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best-known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-2}/\log(1/\epsilon))$. Moreover, the sample complexity of AC and NAC characterized in this work outperforms that of policy gradient (PG) and natural policy gradient (NPG) by a factor of $\mathcal{O}((1-\gamma)^{-3})$ and $\mathcal{O}((1-\gamma)^{-4}\epsilon^{-2}/\log(1/\epsilon))$, respectively. This is the first theoretical study establishing that AC and NAC attain orderwise performance improvement over PG and NPG under infinite horizon due to the incorporation of the critic.
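
The two ingredients the abstract emphasizes, Markovian sampling and a mini-batch per iteration, can be illustrated with a toy one-step actor-critic. This is a minimal sketch under stated assumptions, not the paper's algorithm: the two-state MDP, the tabular softmax actor, the tabular TD(0) critic, and all parameters (`alpha`, `beta`, `stay`, `batch`) are hypothetical choices. Each mini-batch is a contiguous stretch of a single trajectory (Markovian, not i.i.d.), and the critic's TD error replaces the Monte Carlo return that a plain policy gradient would use.

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def mini_batch_actor_critic(iters=400, batch=16, gamma=0.9,
                            alpha=1.0, beta=0.1, stay=0.8, seed=0):
    """Toy one-step actor-critic with mini-batches drawn from one Markov chain.

    Hypothetical MDP: 2 states, 2 actions; action 0 pays reward 1, action 1
    pays 0; the state persists with probability `stay`, independent of the action.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    theta = [[0.0] * n_actions for _ in range(n_states)]  # actor: softmax preferences
    v = [0.0] * n_states                                  # critic: tabular V(s)
    s = 0
    for _ in range(iters):
        # Markovian sampling: the batch continues the same trajectory.
        samples = []
        for _ in range(batch):
            pi = softmax(theta[s])
            a = 0 if rng.random() < pi[0] else 1
            r = 1.0 if a == 0 else 0.0
            s_next = s if rng.random() < stay else 1 - s
            samples.append((s, a, r, s_next))
            s = s_next
        # TD errors under the current critic.
        deltas = [r + gamma * v[sn] - v[si] for si, _, r, sn in samples]
        v_grad = [0.0] * n_states
        pi_grad = [[0.0] * n_actions for _ in range(n_states)]
        for (si, a, _, _), d in zip(samples, deltas):
            v_grad[si] += d
            pi = softmax(theta[si])
            for b in range(n_actions):
                # d/dtheta[si][b] of log pi(a|si) is 1{b == a} - pi(b|si).
                pi_grad[si][b] += d * ((1.0 if b == a else 0.0) - pi[b])
        # Mini-batch averaged critic (TD(0)) and actor (policy gradient) steps.
        for si in range(n_states):
            v[si] += beta * v_grad[si] / batch
            for b in range(n_actions):
                theta[si][b] += alpha * pi_grad[si][b] / batch
    return [softmax(t) for t in theta], v
```

Calling `mini_batch_actor_critic()` returns the per-state action probabilities and the critic's value estimates; in this toy MDP the actor should come to prefer action 0 in both states.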

name change, sample complexity, sample complexity bound

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling

Neural Information Processing Systems, Dec-23-2025, 20:22:14 GMT

This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery method of Rolland et al. [2022], assuming a sufficiently good estimation of the score function. Finally, we analyze the upper bound of score-matching estimation within score-based generative modeling, which has been applied to causal discovery but is also of independent interest within the domain of generative models.
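
What "estimating the score function from samples" means can be seen in one dimension. The sketch below is an illustration under stated assumptions, not the paper's setup: it swaps the deep ReLU network for a hypothetical linear model s(x) = a*x + b, uses full-batch gradient descent instead of SGD, and minimizes Hyvarinen's score-matching objective J = E[0.5*s(x)^2 + s'(x)], which needs only samples, not the density. For Gaussian data the true score is s(x) = -(x - mu)/sigma^2, which the linear model can represent exactly.

```python
import random


def fit_linear_score(xs, lr=0.1, steps=500):
    """Fit s(x) = a*x + b by gradient descent on the score-matching loss
    J(a, b) = mean(0.5 * s(x)^2 + s'(x)), where s'(x) = a for this model."""
    n = len(xs)
    m1 = sum(xs) / n                 # empirical E[x]
    m2 = sum(x * x for x in xs) / n  # empirical E[x^2]
    a, b = 0.0, 0.0
    for _ in range(steps):
        grad_a = a * m2 + b * m1 + 1.0  # dJ/da = E[(a x + b) x] + 1
        grad_b = a * m1 + b             # dJ/db = E[a x + b]
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(1.0, 2.0) for _ in range(5000)]  # hypothetical data: mu=1, sigma=2
    a, b = fit_linear_score(xs)
    print(a, b)  # close to -1/sigma^2 = -0.25 and mu/sigma^2 = 0.25
```

Setting both gradients to zero gives a = -1/var and b = mean/var with the empirical moments, i.e. exactly the Gaussian score; the gradient-descent loop converges to that point because J is a convex quadratic in (a, b).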

causal discovery and generative modeling, sample complexity bound, score-matching

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.62)

Sample Complexity Bounds for Iterative Stochastic Policy Optimization

Marin Kobilarov

Neural Information Processing Systems, Oct-2-2025, 11:23:41 GMT

This paper is concerned with robustness analysis of decision making under uncertainty. We consider a class of iterative stochastic policy optimization problems and analyze the resulting expected performance for each newly updated policy at each iteration. In particular, we employ concentration-of-measure inequalities to compute future expected cost and probability of constraint violation using empirical runs. A novel inequality bound is derived that accounts for the possibly unbounded change-of-measure likelihood ratio resulting from iterative policy adaptation. The bound serves as a high-confidence certificate for providing future performance or safety guarantees. The approach is illustrated with a simple robot control scenario and initial steps towards applications to challenging aerial vehicle navigation problems are presented.
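
The basic kind of concentration-of-measure certificate the abstract refers to can be sketched with the one-sided Hoeffding inequality. This is an assumption-laden illustration, not the paper's bound: it assumes i.i.d. runs of the same policy with costs in a known range [lo, hi], and it does not model the change-of-measure likelihood ratio from policy adaptation, which is precisely what the paper's novel inequality addresses. The names `hoeffding_upper_bound` and `certify` are hypothetical.

```python
import math
import random


def hoeffding_upper_bound(samples, delta, lo=0.0, hi=1.0):
    """With probability >= 1 - delta, E[X] <= returned value,
    assuming i.i.d. samples taking values in [lo, hi]."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    return mean + (hi - lo) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def certify(costs, violations, delta):
    """Joint certificate for expected cost and constraint-violation probability.
    Violations are 0/1 indicators; delta is split across both bounds (union bound)."""
    return (hoeffding_upper_bound(costs, delta / 2),
            hoeffding_upper_bound(violations, delta / 2))


if __name__ == "__main__":
    rng = random.Random(0)
    runs = 200
    costs = [rng.random() for _ in range(runs)]  # hypothetical normalized costs
    violations = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(runs)]
    print(certify(costs, violations, delta=0.05))
```

The violation probability is handled as the expectation of a 0/1 indicator, so the same inequality bounds both quantities; splitting `delta` keeps the two statements valid simultaneously.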

artificial intelligence, iteration, machine learning

Neural Information Processing Systems

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)

On The Sample Complexity Bounds In Bilevel Reinforcement Learning

Gaur, Mudit, Bedi, Amrit Singh, Pasupathu, Raghu, Aggarwal, Vaneet

arXiv.org Artificial Intelligence, Mar-22-2025

Bilevel reinforcement learning (BRL) has emerged as a powerful mathematical framework for studying generative AI alignment and related problems. While several principled algorithmic frameworks have been proposed, key theoretical foundations, particularly those related to sample complexity, remain underexplored. Understanding and deriving tight sample complexity bounds are crucial for bridging the gap between theory and practice, guiding the development of more efficient algorithms. In this work, we present the first sample complexity result for BRL, achieving a bound of $\epsilon^{-4}$. This result extends to standard bilevel optimization problems, providing an interesting theoretical contribution with practical implications. To address the computational challenges associated with hypergradient estimation in bilevel optimization, we develop a first-order Hessian-free algorithm that does not rely on costly hypergradient computations. By leveraging matrix-free techniques and constrained optimization methods, our approach ensures scalability and practicality. Our findings pave the way for improved methods in AI alignment and other fields reliant on bilevel optimization.
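
To show what "first-order" and "Hessian-free" mean for bilevel optimization, here is a minimal sketch of a generic value-function penalty reformulation on a hypothetical quadratic toy problem. It is not the authors' BRL algorithm and does not reproduce their matrix-free or constrained-optimization components. The idea is to replace "y solves the lower-level problem" with the penalty lam*(g(x, y) - min_z g(x, z)). By Danskin's theorem the x-gradient of the min term is the partial gradient of g at the inner solution z, so only first-order gradients of f and g are needed. The functions `grad_f`, `grad_g`, `penalty_bilevel` and the parameter `lam` are illustrative assumptions.

```python
# Hypothetical toy problem:
#   upper level f(x, y) = 0.5*(x - 1)^2 + 0.5*(y - 2)^2
#   lower level g(x, y) = 0.5*(y - x)^2, so y*(x) = x and the bilevel optimum is x = 1.5


def grad_f(x, y):
    """(df/dx, df/dy) of the upper-level objective."""
    return x - 1.0, y - 2.0


def grad_g(x, y):
    """(dg/dx, dg/dy) of the lower-level objective."""
    return x - y, y - x


def penalty_bilevel(lam=100.0, outer_steps=20000, inner_steps=5, inner_lr=0.5):
    """Gradient descent on f(x, y) + lam * (g(x, y) - g(x, z)), z ~ argmin_z g(x, z)."""
    x, y, z = 0.0, 0.0, 0.0
    lr = 1.0 / (1.0 + 2.0 * lam)  # step size matched to the penalty's curvature
    for _ in range(outer_steps):
        # Inner loop: a few warm-started gradient steps on g(x, .); no Hessians.
        for _ in range(inner_steps):
            z -= inner_lr * grad_g(x, z)[1]
        fx, fy = grad_f(x, y)
        gx_y, gy_y = grad_g(x, y)
        gx_z, _ = grad_g(x, z)
        # Danskin: d/dx min_z g(x, z) = dg/dx evaluated at the inner solution z.
        dx = fx + lam * (gx_y - gx_z)
        dy = fy + lam * gy_y
        x -= lr * dx
        y -= lr * dy
    return x, y


if __name__ == "__main__":
    print(penalty_bilevel())  # approaches (1.5, 1.5) as lam grows
```

For a finite `lam` the stationary point of the penalized problem is x = (1 + 3*lam)/(1 + 2*lam), which tends to the bilevel solution 1.5 as `lam` grows; this bias-versus-conditioning trade-off is typical of penalty-based first-order bilevel methods.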

artificial intelligence, machine learning, optimization problem

arXiv.org Artificial Intelligence

2503.17644

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.60)

Review for NeurIPS paper: Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems, Jan-23-2025, 02:04:22 GMT

Additional Feedback: The authors' response has addressed my questions. I will keep my score. This is a natural question to ask, so it could be worth an explanation somewhere. However, this paper suggests a slower rate by a factor of $(1-\gamma)^{-2}$. What could cause the difference and how could the theory here guide development of deep RL algorithms?

actor-critic algorithm, neurips paper, sample complexity bound

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Sample Complexity Bounds for Iterative Stochastic Policy Optimization

Neural Information Processing Systems, Jan-13-2025, 14:12:09 GMT

This paper is concerned with robustness analysis of decision making under uncertainty. We consider a class of iterative stochastic policy optimization problems and analyze the resulting expected performance for each newly updated policy at each iteration. In particular, we employ concentration-of-measure inequalities to compute future expected cost and probability of constraint violation using empirical runs. A novel inequality bound is derived that accounts for the possibly unbounded change-of-measure likelihood ratio resulting from iterative policy adaptation. The bound serves as a high-confidence certificate for providing future performance or safety guarantees.

inequality, iterative stochastic policy optimization, sample complexity bound

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Neural Information Processing Systems, Oct-9-2024, 21:00:10 GMT

The actor-critic (AC) algorithm is a popular method to find an optimal policy in reinforcement learning. In the infinite horizon scenario, the finite-sample convergence rate for the AC and natural actor-critic (NAC) algorithms has been established recently, but only under independent and identically distributed (i.i.d.) sampling and single-sample updates at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data for each iteration, and with the actor using general policy class approximation. We show that the overall sample complexity for a mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best-known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for a mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-2}/\log(1/\epsilon))$. Moreover, the sample complexity of AC and NAC characterized in this work outperforms that of policy gradient (PG) and natural policy gradient (NPG) by a factor of $\mathcal{O}((1-\gamma)^{-3})$ and $\mathcal{O}((1-\gamma)^{-4}\epsilon^{-2}/\log(1/\epsilon))$, respectively.

complexity, sample complexity, sample complexity bound

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.41)
Information Technology > Mathematics of Computing (0.40)

Sample Complexity Bounds for Score-Matching: Causal Discovery and Generative Modeling

Neural Information Processing Systems, Oct-9-2024, 14:00:52 GMT

This paper provides statistical sample complexity bounds for score-matching and its applications in causal discovery. We demonstrate that accurate estimation of the score function is achievable by training a standard deep ReLU neural network using stochastic gradient descent. We establish bounds on the error rate of recovering causal relationships using the score-matching-based causal discovery method of Rolland et al. [2022], assuming a sufficiently good estimation of the score function. Finally, we analyze the upper bound of score-matching estimation within score-based generative modeling, which has been applied to causal discovery but is also of independent interest within the domain of generative models.

causal discovery and generative modeling, sample complexity bound, score-matching

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Sample Complexity Bounds for Iterative Stochastic Policy Optimization

Neural Information Processing Systems, Mar-13-2024, 01:45:24 GMT

This paper is concerned with robustness analysis of decision making under uncertainty. We consider a class of iterative stochastic policy optimization problems and analyze the resulting expected performance for each newly updated policy at each iteration. In particular, we employ concentration-of-measure inequalities to compute future expected cost and probability of constraint violation using empirical runs. A novel inequality bound is derived that accounts for the possibly unbounded change-of-measure likelihood ratio resulting from iterative policy adaptation. The bound serves as a high-confidence certificate for providing future performance or safety guarantees. The approach is illustrated with a simple robot control scenario and initial steps towards applications to challenging aerial vehicle navigation problems are presented.

algorithm, iteration, probability

Neural Information Processing Systems

Country:

North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.66)

Sample Complexity Bounds for Learning High-dimensional Simplices in Noisy Regimes

Saberi, Amir Hossein, Najafi, Amir, Motahari, Seyed Abolfazl, Khalaj, Babak H.

arXiv.org Artificial Intelligence, Apr-28-2023

In this paper, we find a sample complexity bound for learning a simplex from noisy samples. Assume we are given a dataset of $n$ i.i.d. samples drawn from a uniform distribution over an unknown simplex in $\mathbb{R}^K$, where the samples are corrupted by multivariate additive Gaussian noise of arbitrary magnitude. We prove the existence of an algorithm that, with high probability, outputs a simplex within $\ell_2$ distance at most $\varepsilon$ of the true simplex (for any $\varepsilon>0$). We also show theoretically that, to achieve this bound, it is sufficient to have $n\ge\left(K^2/\varepsilon^2\right)e^{\Omega\left(K/\mathrm{SNR}^2\right)}$ samples, where $\mathrm{SNR}$ stands for the signal-to-noise ratio. This result solves an important open problem and shows that, as long as $\mathrm{SNR}\ge\Omega\left(K^{1/2}\right)$, the sample complexity of the noisy regime has the same order as that of the noiseless case. Our proofs combine the so-called sample compression technique of Ashtiani et al. [2018], mathematical tools from high-dimensional geometry, and Fourier analysis. In particular, we propose a general Fourier-based technique for recovering a more general class of distribution families from additive Gaussian noise, which can be used in a variety of other related problems.
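
The step from the sample bound to the "same order as the noiseless case" claim is a one-line calculation. Reading $e^{\Omega(K/\mathrm{SNR}^2)}$ as $e^{CK/\mathrm{SNR}^2}$ for some constant $C>0$ (an assumption about the hidden constant), and writing the SNR condition as $\mathrm{SNR}\ge c\sqrt{K}$ for a constant $c>0$:

$$
\mathrm{SNR}\ge c\sqrt{K}
\;\Longrightarrow\;
\frac{K}{\mathrm{SNR}^2}\le\frac{1}{c^2}
\;\Longrightarrow\;
\frac{K^2}{\varepsilon^2}\,e^{CK/\mathrm{SNR}^2}\le e^{C/c^2}\,\frac{K^2}{\varepsilon^2}=\mathcal{O}\!\left(\frac{K^2}{\varepsilon^2}\right).
$$

The noise therefore only inflates the sufficient sample size by the constant factor $e^{C/c^2}$, which does not grow with $K$ or $1/\varepsilon$.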

artificial intelligence, machine learning, simplex

arXiv.org Artificial Intelligence

2209.05953

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.42)
