AITopics

2203.15797

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(3 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

Liu, Tianchen, Chakrabarti, Kushal, Chopra, Nikhil

Iteratively Preconditioned Gradient-Descent Approach for Moving Horizon Estimation Problems

arXiv.org Artificial IntelligenceJun-22-2023

Moving horizon estimation (MHE) is a widely studied state estimation approach in several practical applications. In the MHE problem, the state estimates are obtained via the solution of an approximated nonlinear optimization problem. However, this optimization step is known to be computationally complex. Given this limitation, this paper investigates the idea of iteratively preconditioned gradient-descent (IPG) to solve MHE problem with the aim of an improved performance than the existing solution techniques. To our knowledge, the preconditioning technique is used for the first time in this paper to reduce the computational cost and accelerate the crucial optimization step for MHE. The convergence guarantee of the proposed iterative approach for a class of MHE problems is presented. Additionally, sufficient conditions for the MHE problem to be convex are also derived. Finally, the proposed method is implemented on a unicycle localization example. The simulation results demonstrate that the proposed approach can achieve better accuracy with reduced computational costs.

artificial intelligence, estimation, machine learning, (16 more...)

2306.13194

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
Asia > India > Maharashtra > Mumbai (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)

Hutzenthaler, Martin, Jentzen, Arnulf, Pohl, Katharina, Riekert, Adrian, Scarpa, Luca

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

arXiv.org Artificial IntelligenceJun-22-2023

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs) but till this day it remains an open problem of research to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs. In this work we study SGD type optimization methods in the training of fully-connected feedforward DNNs with rectified linear unit (ReLU) activation. We first establish general regularity properties for the risk functions and their generalized gradient functions appearing in the training of such DNNs and, thereafter, we investigate the plain vanilla SGD optimization method in the training of such DNNs under the assumption that the target function under consideration is a constant function. Specifically, we prove under the assumption that the learning rates (the step sizes of the SGD optimization method) are sufficiently small but not $L^1$-summable and under the assumption that the target function is a constant function that the expectation of the riskof the considered SGD process converges in the training of such DNNs to zero as the number of SGD steps increases to infinity.

artificial intelligence, lemma 2, machine learning, (19 more...)

2112.07369

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > Germany > North Rhine-Westphalia > Münster Region > Münster (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Chen, Xuxing, Xiao, Tesi, Balasubramanian, Krishnakumar

Optimal Algorithms for Stochastic Bilevel Optimization under Relaxed Smoothness Conditions

arXiv.org Artificial IntelligenceJun-21-2023

Stochastic Bilevel optimization usually involves minimizing an upper-level (UL) function that is dependent on the arg-min of a strongly-convex lower-level (LL) function. Several algorithms utilize Neumann series to approximate certain matrix inverses involved in estimating the implicit gradient of the UL function (hypergradient). The state-of-the-art StOchastic Bilevel Algorithm (SOBA) [16] instead uses stochastic gradient descent steps to solve the linear system associated with the explicit matrix inversion. This modification enables SOBA to match the lower bound of sample complexity for the single-level counterpart in non-convex settings. Unfortunately, the current analysis of SOBA relies on the assumption of higher-order smoothness for the UL and LL functions to achieve optimality. In this paper, we introduce a novel fully single-loop and Hessian-inversion-free algorithmic framework for stochastic bilevel optimization and present a tighter analysis under standard smoothness assumptions (first-order Lipschitzness of the UL function and second-order Lipschitzness of the LL function). Furthermore, we show that by a slight modification of our approach, our algorithm can handle a more general multi-objective robust bilevel optimization problem. For this case, we obtain the state-of-the-art oracle complexity results demonstrating the generality of both the proposed algorithmic and analytic frameworks. Numerical experiments demonstrate the performance gain of the proposed algorithms over existing ones.

algorithm, assumption, optimization, (14 more...)

2306.12067

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.68)

Cai, Xufeng, Lin, Cheuk Yin, Diakonikolas, Jelena

Empirical Risk Minimization with Shuffled SGD: A Primal-Dual Perspective and Improved Bounds

arXiv.org Artificial IntelligenceJun-21-2023

Stochastic gradient descent (SGD) is perhaps the most prevalent optimization method in modern machine learning. Contrary to the empirical practice of sampling from the datasets without replacement and with (possible) reshuffling at each epoch, the theoretical counterpart of SGD usually relies on the assumption of sampling with replacement. It is only very recently that SGD with sampling without replacement -- shuffled SGD -- has been analyzed. For convex finite sum problems with $n$ components and under the $L$-smoothness assumption for each component function, there are matching upper and lower bounds, under sufficiently small -- $\mathcal{O}(\frac{1}{nL})$ -- step sizes. Yet those bounds appear too pessimistic -- in fact, the predicted performance is generally no better than for full gradient descent -- and do not agree with the empirical observations. In this work, to narrow the gap between the theory and practice of shuffled SGD, we sharpen the focus from general finite sum problems to empirical risk minimization with linear predictors. This allows us to take a primal-dual perspective and interpret shuffled SGD as a primal-dual method with cyclic coordinate updates on the dual side. Leveraging this perspective, we prove a fine-grained complexity bound that depends on the data matrix and is never worse than what is predicted by the existing bounds. Notably, our bound can predict much faster convergence than the existing analyses -- by a factor of the order of $\sqrt{n}$ in some cases. We empirically demonstrate that on common machine learning datasets our bound is indeed much tighter. We further show how to extend our analysis to convex nonsmooth problems, with similar improvements.

artificial intelligence, machine learning, nullx 0, (17 more...)

2306.12498

Country: North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Unsupervised Deep Unfolded PGD for Transmit Power Allocation in Wireless Systems

Adeogun, Ramoni

Transmit power control (TPC) is a key mechanism for managing interference, energy utilization, and connectivity in wireless systems. In this paper, we propose a simple low-complexity TPC algorithm based on the deep unfolding of the iterative projected gradient descent (PGD) algorithm into layers of a deep neural network and learning the step-size parameters. An unsupervised learning method with either online learning or offline pretraining is applied for optimizing the weights of the DNN. Performance evaluation in dense device-to-device (D2D) communication scenarios showed that the proposed method can achieve better performance than the iterative algorithm with more than a factor of 2 lower number of iterations.

algorithm, artificial intelligence, machine learning, (16 more...)

2306.11865

Country: Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.49)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Ding, Dongsheng, Wei, Chen-Yu, Zhang, Kaiqing, Ribeiro, Alejandro

Last-Iterate Convergent Policy Gradient Primal-Dual Methods for Constrained MDPs

We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP). Despite the popularity of Lagrangian-based policy search methods used in practice, the oscillation of policy iterates in these methods has not been fully understood, bringing out issues such as violation of constraints and sensitivity to hyper-parameters. To fill this gap, we employ the Lagrangian method to cast a constrained MDP into a constrained saddle-point problem in which max/min players correspond to primal/dual variables, respectively, and develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy. Specifically, we first propose a regularized policy gradient primal-dual (RPG-PD) method that updates the policy using an entropy-regularized policy gradient, and the dual via a quadratic-regularized gradient ascent, simultaneously. We prove that the policy primal-dual iterates of RPG-PD converge to a regularized saddle point with a sublinear rate, while the policy iterates converge sublinearly to an optimal constrained policy. We further instantiate RPG-PD in large state or action spaces by including function approximation in policy parametrization, and establish similar sublinear last-iterate policy convergence. Second, we propose an optimistic policy gradient primal-dual (OPG-PD) method that employs the optimistic gradient method to update primal/dual variables, simultaneously. We prove that the policy primal-dual iterates of OPG-PD converge to a saddle point that contains an optimal constrained policy, with a linear rate. To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2306.117

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
(4 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Callaghan, Adam, Mason, Karl, Mannion, Patrick

Evolutionary Strategy Guided Reinforcement Learning via MultiBuffer Communication

Evolutionary Algorithms and Deep Reinforcement Learning have both successfully solved control problems across a variety of domains. Recently, algorithms have been proposed which combine these two methods, aiming to leverage the strengths and mitigate the weaknesses of both approaches. In this paper we introduce a new Evolutionary Reinforcement Learning model which combines a particular family of Evolutionary algorithm called Evolutionary Strategies with the off-policy Deep Reinforcement Learning algorithm TD3. The framework utilises a multi-buffer system instead of using a single shared replay buffer. The multi-buffer system allows for the Evolutionary Strategy to search freely in the search space of policies, without running the risk of overpopulating the replay buffer with poorly performing trajectories which limit the number of desirable policy behaviour examples thus negatively impacting the potential of the Deep Reinforcement Learning within the shared framework. The proposed algorithm is demonstrated to perform competitively with current Evolutionary Reinforcement Learning algorithms on MuJoCo control tasks, outperforming the well known state-of-the-art CEM-RL on 3 of the 4 environments tested.

evolutionary algorithm, machine learning, reinforcement learning, (14 more...)

2306.11535

Country:

Europe > Ireland > Connaught > County Galway > Galway (0.04)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Cayci, Semih, Eryilmaz, Atilla

Provably Robust Temporal Difference Learning for Heavy-Tailed Rewards

In a broad class of reinforcement learning applications, stochastic rewards have heavy-tailed distributions, which lead to infinite second-order moments for stochastic (semi)gradients in policy evaluation and direct policy optimization. In such instances, the existing RL methods may fail miserably due to frequent statistical outliers. In this work, we establish that temporal difference (TD) learning with a dynamic gradient clipping mechanism, and correspondingly operated natural actor-critic (NAC), can be provably robustified against heavy-tailed reward distributions. It is shown in the framework of linear function approximation that a favorable tradeoff between bias and variability of the stochastic gradients can be achieved with this dynamic gradient clipping mechanism. In particular, we prove that robust versions of TD learning achieve sample complexities of order $\mathcal{O}(\varepsilon^{-\frac{1}{p}})$ and $\mathcal{O}(\varepsilon^{-1-\frac{1}{p}})$ with and without the full-rank assumption on the feature matrix, respectively, under heavy-tailed rewards with finite moments of order $(1+p)$ for some $p\in(0,1]$, both in expectation and with high probability. We show that a robust variant of NAC based on Robust TD learning achieves $\tilde{\mathcal{O}}(\varepsilon^{-4-\frac{2}{p}})$ sample complexity. We corroborate our theoretical results with numerical experiments.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2306.11455

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Alvo, Matias, Russo, Daniel, Kanoria, Yash

Neural Inventory Control in Networks via Hindsight Differentiable Policy Optimization

arXiv.org Artificial IntelligenceJun-19-2023

Inventory management offers unique opportunities for reliably evaluating and applying deep reinforcement learning (DRL). Rather than evaluate DRL algorithms by comparing against one another or against human experts, we can compare to the optimum itself in several problem classes with hidden structure. Our DRL methods consistently recover near-optimal policies in such settings, despite being applied with up to 600-dimensional raw state vectors. In others, they can vastly outperform problem-specific heuristics. To reliably apply DRL, we leverage two insights. First, one can directly optimize the hindsight performance of any policy using stochastic gradient descent. This uses (i) an ability to backtest any policy's performance on a subsample of historical demand observations, and (ii) the differentiability of the total cost incurred on any subsample with respect to policy parameters. Second, we propose a natural neural network architecture to address problems with weak (or aggregate) coupling constraints between locations in an inventory network. This architecture employs weight duplication for ``sibling'' locations in the network, and state summarization. We justify this architecture through an asymptotic guarantee, and empirically affirm its value in handling large-scale problems.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2306.11246

Genre: Research Report > New Finding (0.46)

Industry: Transportation (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)