AITopics | svec

da6ea77475918a3d83c7e49223d453cc-Supplemental.pdf

Neural Information Processing Systems, Feb-10-2026, 16:46:30 GMT

Intuitively, if the $i$-th measurement $y_i$ is an inlier (i.e., $r^2 \le c^2\beta_i^2$), then $\theta_i = +1$ and the corresponding term in (A1) reduces to least squares; if $y_i$ is an outlier (i.e., $r^2 > c^2\beta_i^2$), then $\theta_i = -1$ and the corresponding term in (A1) becomes a constant $c^2$, whence the outlier is irrelevant to the optimization. Directly developing the residual function $r^2(x, y_i) = \|b_i - s\Pi R B_i\|^2$ leads to a quartic polynomial (degree 4) in $s$ and $R$, which is not suitable for moment relaxation because it would increase the minimum relaxation order $\kappa$ [14]. $T^2$ for $t$ (the translation is bounded by a known value $T$). Towards this goal, we introduce the notion of moments, moment matrices and localizing matrices. Given a probability measure $\mu$ supported on $P \subseteq \mathbb{R}^n$, its moment of order $\alpha \in \mathbb{Z}^n_+$ is the scalar $z_\alpha$.
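The inlier/outlier switching described in this excerpt can be illustrated with a small sketch. Equation (A1) is not reproduced on this page, so the per-measurement term below is an assumption: it uses the common truncated least squares form $\frac{1+\theta_i}{2}\frac{r_i^2}{\beta_i^2} + \frac{1-\theta_i}{2}c^2$. The function name `tls_terms` and the sample numbers are hypothetical.

```python
import numpy as np

def tls_terms(r2, beta, c):
    """Return (theta, cost) per measurement for an ASSUMED truncated
    least squares term: (1+theta)/2 * r2/beta^2 + (1-theta)/2 * c^2."""
    r2 = np.asarray(r2, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Inlier test quoted in the excerpt: r^2 <= c^2 * beta_i^2.
    inlier = r2 <= (c * beta) ** 2
    theta = np.where(inlier, 1, -1)
    # theta = +1 keeps the scaled squared residual; theta = -1 keeps the constant c^2.
    cost = (1 + theta) / 2 * r2 / beta**2 + (1 - theta) / 2 * c**2
    return theta, cost

# Hypothetical residuals: one small (inlier), one large (outlier).
theta, cost = tls_terms(r2=[0.01, 4.0], beta=[1.0, 1.0], c=1.0)
print(theta, cost)
```

Under this assumed form, each term equals `min(r2 / beta**2, c**2)`, which is why an outlier contributes only a constant and does not influence the estimate.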

artificial intelligence, matrix, relaxation, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence (1.00)

Supplementary Material: Probabilistic Linear Solvers for Machine Learning

Neural Information Processing Systems, Oct-2-2025, 20:47:20 GMT

Overviews discussing motivations and historical perspectives can be found in Hennig et al.

aa 1 0, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe > Germany (0.28)
North America (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Nonconvex Optimization Framework for Group-Sparse Feedback Linear-Quadratic Optimal Control: Penalty Approach

Feng, Lechen, Li, Xun, Ni, Yuan-Hua

arXiv.org Artificial Intelligence, Aug-6-2025

This paper develops a unified nonconvex optimization framework for the design of group-sparse feedback controllers in infinite-horizon linear-quadratic (LQ) problems. We address two prominent extensions of the classical LQ problem: the distributed LQ problem with fixed communication topology (DFT-LQ) and the sparse feedback LQ problem (SF-LQ), both of which are motivated by the need for scalable and structure-aware control in large-scale systems. Unlike existing approaches that rely on convex relaxations or are limited to block-diagonal structures, we directly formulate the controller synthesis as a finite-dimensional nonconvex optimization problem with group $\ell_0$-norm regularization, capturing general sparsity patterns. We establish a connection between DFT-LQ and SF-LQ problems, showing that both can be addressed within our unified framework. Furthermore, we propose a penalty-based proximal alternating linearized minimization (PALM) algorithm and provide a rigorous convergence analysis under mild assumptions, overcoming the lack of coercivity in the objective function. The proposed method admits efficient solvers for all subproblems and guarantees global convergence to critical points. Our results fill a key gap in the literature by enabling the direct design of group-sparse feedback gains with theoretical guarantees, without resorting to convex surrogates or restrictive structural assumptions.
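As a rough illustration of why a group $\ell_0$ penalty can still yield cheap subproblems, the sketch below implements the standard proximal map of $\lambda \|\cdot\|_{\text{group},0}$, which is group hard thresholding. It is not taken from the paper and is not the authors' PALM algorithm; the function name `prox_group_l0`, the grouping, and the numbers are hypothetical.

```python
import numpy as np

def prox_group_l0(x, groups, lam):
    """Minimize lam * (number of nonzero groups in z) + 0.5 * ||z - x||^2 over z.

    Each group is handled independently: keeping a group costs lam,
    zeroing it costs 0.5 * ||x_g||^2, so the cheaper option is chosen.
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for idx in groups:
        if 0.5 * np.sum(x[idx] ** 2) <= lam:
            out[idx] = 0.0  # the whole group is dropped at once
    return out

# Hypothetical vector split into two groups of two entries each.
x = np.array([3.0, 4.0, 0.1, -0.2])
groups = [[0, 1], [2, 3]]
z = prox_group_l0(x, groups, lam=1.0)
print(z)
```

The first group has a large norm and is kept unchanged, while the second is set to zero as a block, which is the group-sparse pattern the penalty is meant to promote.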

algorithm, artificial intelligence, optimization problem, (14 more...)

arXiv.org Artificial Intelligence

2507.18114

Country: Asia > China (0.28)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Global Optimality of Single-Timescale Actor-Critic under Continuous State-Action Space: A Study on Linear Quadratic Regulator

Chen, Xuyang, Duan, Jingliang, Zhao, Lin

arXiv.org Artificial Intelligence, May-9-2025

In addition to a policy update, AC methods employ a parallel critic update to bootstrap the Q-value for policy gradient estimation, which often enjoys reduced variance and fast convergence in training. Despite the empirical success, theoretical analysis of AC in the most practical form remains challenging. Existing works mostly focus on either the double-loop or the two-timescale variants. In double-loop AC, the actor is updated in the outer loop only after the critic takes sufficiently many steps to have an accurate estimation of the Q-value in the inner loop [Yang et al., 2019; Kumar et al., 2019; Wang et al., 2019]. Hence, the convergence of the critic is decoupled from that of the actor. The analysis is separated into a policy evaluation sub-problem in the inner loop and a perturbed gradient descent in the outer loop. In two-timescale AC, the actor and the critic are updated simultaneously in each iteration using stepsizes of different timescales. The actor stepsize (denoted by $\alpha_t$ in the sequel) is typically smaller than that of the critic (denoted by $\beta_t$ in the sequel), with their ratio going to zero as the iteration number goes to infinity (i.e., $\lim_{t \to \infty} \alpha_t/\beta_t = 0$). The two-timescale scheme allows the critic to approximate the correct Q-value asymptotically.
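The two-timescale condition $\alpha_t/\beta_t \to 0$ can be made concrete with a pair of polynomially decaying stepsizes. This is only a sketch: the excerpt does not give the schedules used in the paper, so the exponents `a` and `b` and the function name `stepsizes` are assumptions chosen for illustration.

```python
def stepsizes(t, a=0.9, b=0.6):
    """Hypothetical schedules: actor alpha_t = (t+1)^-a, critic beta_t = (t+1)^-b.

    With a > b the actor stepsize decays faster than the critic stepsize,
    so alpha_t / beta_t = (t+1)^-(a-b) tends to zero.
    """
    alpha_t = 1.0 / (t + 1) ** a
    beta_t = 1.0 / (t + 1) ** b
    return alpha_t, beta_t

# The ratio shrinks as the iteration count grows.
ratios = [stepsizes(t)[0] / stepsizes(t)[1] for t in (0, 10, 1000, 100000)]
print(ratios)
```

Any pair of schedules with a vanishing ratio fits the description in the excerpt; the polynomial choice here is just one convenient family.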

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.24963/ijcai.2024/422

2505.01041

Country:

Asia > Singapore (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Two-Timescale Optimization Framework for Decentralized Linear-Quadratic Optimal Control

Feng, Lechen, Ni, Yuan-Hua, Zhang, Xuebo

arXiv.org Artificial Intelligence, Jun-16-2024

This study investigates a decentralized linear-quadratic optimal control problem, and several approximate separable constrained optimization problems are formulated for the first time based on the selection of sparsity promoting functions. First, for the optimization problem with weighted $\ell_1$ sparsity promoting function, a two-timescale algorithm is adopted that is based on the BSUM (Block Successive Upper-bound Minimization) framework and a differential equation solver. Second, a piecewise quadratic sparsity promoting function is introduced, and the induced optimization problem demonstrates an accelerated convergence rate by performing the same two-timescale algorithm. Finally, the optimization problem with $\ell_0$ sparsity promoting function is considered that is nonconvex and discontinuous, and can be approximated by successive coordinatewise convex optimization problems.

algorithm, optimization problem, svec, (13 more...)

arXiv.org Artificial Intelligence

2406.11168

Country:

Asia > China > Tianjin Province > Tianjin (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Fast, Scalable, Warm-Start Semidefinite Programming with Spectral Bundling and Sketching

Angell, Rico, McCallum, Andrew

arXiv.org Artificial Intelligence, Dec-18-2023

While semidefinite programming (SDP) has traditionally been limited to moderate-sized problems, recent algorithms augmented with matrix sketching techniques have enabled solving larger SDPs. However, these methods achieve scalability at the cost of an increase in the number of necessary iterations, resulting in slower convergence as the problem size grows. Furthermore, they require iteration-dependent parameter schedules that prohibit effective utilization of warm-start initializations important in practical applications with incrementally-arriving data or mixed-integer programming. We present SpecBM, a provably correct, fast and scalable algorithm for solving massive SDPs that can leverage a warm-start initialization to further accelerate convergence. Our proposed algorithm is a spectral bundle method for solving general SDPs containing both equality and inequality constraints. Moreover, when augmented with an optional matrix sketching technique, our algorithm achieves the dramatically improved scalability of previous work while sustaining convergence speed. We empirically demonstrate the effectiveness of our method, both with and without warm-starting, across multiple applications with large instances. For example, on a problem with 600 million decision variables, SpecBM achieved a solution of standard accuracy in less than 7 minutes, where the previous state-of-the-art scalable SDP solver requires more than 16 hours. Our method solves an SDP with more than 10^13 decision variables on a single machine with 16 cores and no more than 128GB RAM; the previous state-of-the-art method had not achieved an accurate solution after 72 hours on the same instance. We make our implementation in pure JAX publicly available.

algorithm, specbm, svec, (14 more...)

arXiv.org Artificial Intelligence

2312.11801

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Global Convergence of Two-timescale Actor-Critic for Solving Linear Quadratic Regulator

Chen, Xuyang, Duan, Jingliang, Liang, Yingbin, Zhao, Lin

arXiv.org Artificial Intelligence, Feb-27-2023

The actor-critic (AC) reinforcement learning algorithms have been the powerhouse behind many challenging applications. Nevertheless, their convergence is fragile in general. To study their instability, existing works mostly consider the uncommon double-loop variant or basic models with finite state and action space. We investigate the more practical single-sample two-timescale AC for solving the canonical linear quadratic regulator (LQR) problem, where the actor and the critic update only once with a single sample in each iteration on an unbounded continuous state and action space. Existing analysis cannot conclude the convergence for such a challenging case. We develop a new analysis framework that allows establishing the global convergence to an $\epsilon$-optimal solution with at most an $\mathcal{O}(\epsilon^{-2.5})$ sample complexity. To our knowledge, this is the first finite-time convergence analysis for the single sample two-timescale AC for solving LQR with global optimality. The sample complexity improves those of other variants by orders, which sheds light on the practical wisdom of single sample algorithms. We also further validate our theoretical findings via comprehensive simulation comparisons.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

arXiv.org Artificial Intelligence

2208.08744

Country:

Asia > Singapore (0.04)
North America > United States > Ohio (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(3 more...)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Robust Reinforcement Learning: A Case Study in Linear Quadratic Regulation

Pang, Bo, Jiang, Zhong-Ping

arXiv.org Artificial Intelligence, Aug-31-2020

This paper studies the robustness aspect of reinforcement learning algorithms in the presence of errors. Specifically, we revisit the benchmark problem of discrete-time linear quadratic regulation (LQR) and study the long-standing open question: Under what conditions is the policy iteration method robustly stable for dynamical systems with unbounded, continuous state and action spaces? Using advanced stability results in control theory, it is shown that policy iteration for LQR is inherently robust to small errors and enjoys local input-to-state stability: whenever the error in each iteration is bounded and small, the solutions of the policy iteration algorithm are also bounded, and, moreover, enter and stay in a small neighborhood of the optimal LQR solution. As an application, a novel off-policy optimistic least-squares policy iteration for the LQR problem is proposed, when the system dynamics are subjected to additive stochastic disturbances. The proposed new results in robust reinforcement learning are validated by a numerical example.
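For readers unfamiliar with policy iteration in the discrete-time LQR setting, the following is a minimal sketch of the exact, model-based textbook version: evaluate the current gain by solving a Lyapunov equation, then improve the gain. It is not the paper's off-policy optimistic least-squares method and it injects no errors. The system matrices, the initial gain `K0`, and all function names are hypothetical.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A-BK)' P (A-BK) for P."""
    Ac = A - B @ K
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # vec(Ac' P Ac) = kron(Ac', Ac') vec(P), so the equation is linear in vec(P).
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    return np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)

def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA for the control u = -Kx."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration(A, B, Q, R, K0, iters=30):
    """Alternate evaluation and improvement, starting from a stabilizing K0."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return K, P

# Hypothetical double-integrator example; K0 places both closed-loop poles at 0.9.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
K0 = np.array([[1.0, 2.0]])
K, P = policy_iteration(A, B, Q, R, K0)
print(K)
```

The robustness question in the abstract concerns what happens when the evaluation step returns `P` only approximately; one way to experiment with that would be to perturb `P` slightly before each call to `improve_policy`.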

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2008.11592

Country:

North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > New York > Kings County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Actor-Critic Provably Finds Nash Equilibria of Linear-Quadratic Mean-Field Games

Fu, Zuyue, Yang, Zhuoran, Chen, Yongxin, Wang, Zhaoran

arXiv.org Machine Learning, Oct-16-2019

We study discrete-time mean-field Markov games with infinite numbers of agents where each agent aims to minimize its ergodic cost. We consider the setting where the agents have identical linear state transitions and quadratic cost functions, while the aggregated effect of the agents is captured by the population mean of their states, namely, the mean-field state. For such a game, based on the Nash certainty equivalence principle, we provide sufficient conditions for the existence and uniqueness of its Nash equilibrium. Moreover, to find the Nash equilibrium, we propose a mean-field actor-critic algorithm with linear function approximation, which does not require knowing the model of dynamics. Specifically, at each iteration of our algorithm, we use the single-agent actor-critic algorithm to approximately obtain the optimal policy of each agent given the current mean-field state, and then update the mean-field state. In particular, we prove that our algorithm converges to the Nash equilibrium at a linear rate. To the best of our knowledge, this is the first success of applying model-free reinforcement learning with function approximation to discrete-time mean-field Markov games with provable non-asymptotic global convergence guarantees.

algorithm, probability, problem 2, (14 more...)

arXiv.org Machine Learning

1910.07498

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On the Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost

Yang, Zhuoran, Chen, Yongxin, Hong, Mingyi, Wang, Zhaoran

arXiv.org Machine Learning, Jul-14-2019

Compared with the classical policy gradient algorithm (Williams, 1992), actor-critic tracks the action-value function (critic) in policy gradient in an online manner, and alternately updates the policy (actor) and the critic. On the one hand, the online update of critic significantly reduces the variance of policy gradient and hence leads to faster convergence. On the other hand, it also introduces algorithmic instability, which is often observed in practice (Islam et al., 2017) and parallels the notoriously unstable training of generative adversarial networks (Pfau and Vinyals, 2016). Such instability of actor-critic originates from several intertwining challenges, including (i) function approximation of actor and critic, (ii) improper choice of stepsizes, (iii) the noise arising from stochastic approximation, (iv) the asynchrony between actor and critic, and (v) possibly off-policy data used in the update of critic. As a result, the convergence of actor-critic remains much less well understood than that of policy gradient, which itself is open. Consequently, the practical use of actor-critic often lacks theoretical guidance. In this paper, we aim to theoretically understand the algorithmic instability of actor-critic.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1907.06246

Genre:

Research Report (0.50)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)
