Liang, Yingbin
Sample Complexity Bounds for Two Timescale Value-based Reinforcement Learning Algorithms
Xu, Tengyu, Liang, Yingbin
Two timescale stochastic approximation (SA) has been widely used in value-based reinforcement learning algorithms. In the policy evaluation setting, it can model the linear and nonlinear temporal difference learning with gradient correction (TDC) algorithms as linear SA and nonlinear SA, respectively. In the policy optimization setting, two timescale nonlinear SA can also model the greedy gradient-Q (Greedy-GQ) algorithm. In previous studies, non-asymptotic analyses of linear TDC and Greedy-GQ were established in the Markovian setting, but only with diminishing or accuracy-dependent stepsizes. For the nonlinear TDC algorithm, only asymptotic convergence has been established. In this paper, we study the non-asymptotic convergence rates of two timescale linear and nonlinear TDC and of Greedy-GQ under Markovian sampling with an accuracy-independent constant stepsize. For linear TDC, we provide a novel non-asymptotic analysis and show that it attains an $\epsilon$-accurate solution with the optimal sample complexity of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$ under a constant stepsize. For nonlinear TDC and Greedy-GQ, we show that both algorithms attain an $\epsilon$-accurate stationary solution with sample complexity $\mathcal{O}(\epsilon^{-2})$. This is the first non-asymptotic convergence result for nonlinear TDC under Markovian sampling, and our result for Greedy-GQ improves upon the previous best result by a factor of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$ in order.
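To make the two timescale structure concrete, here is a minimal sketch of the linear TDC recursion on a hypothetical randomly generated Markov chain; the toy environment, the random feature matrix, and the constant stepsizes `alpha` (slow, for the main iterate) and `beta` (fast, for the auxiliary iterate) are illustrative assumptions, not the paper's setup. The key point is that both stepsizes are constants independent of the target accuracy, mirroring the stepsize regime studied here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy policy-evaluation problem: a 5-state chain under a fixed policy.
n_states, d, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)   # transition matrix
R = rng.normal(size=n_states)                          # state rewards
Phi = rng.normal(size=(n_states, d))                   # linear features

alpha, beta = 0.01, 0.05      # constant stepsizes: slow (theta), fast (w)
theta, w = np.zeros(d), np.zeros(d)

s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])              # Markovian sampling
    phi, phi_next = Phi[s], Phi[s_next]
    delta = R[s] + gamma * theta @ phi_next - theta @ phi   # TD error
    # Slow timescale: TD update plus the gradient-correction term.
    theta += alpha * (delta * phi - gamma * (phi @ w) * phi_next)
    # Fast timescale: w tracks the solution of a linear system.
    w += beta * (delta - phi @ w) * phi
    s = s_next
```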
Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters
Ji, Kaiyi, Lee, Jason D., Liang, Yingbin, Poor, H. Vincent
Although model-agnostic meta-learning (MAML) is a very successful algorithm in meta-learning practice, it can incur high computational cost because it updates all model parameters over both the inner loop of task-specific adaptation and the outer loop of meta-initialization training. A more efficient algorithm, ANIL (almost no inner loop), was proposed recently by Raghu et al. 2019; it adapts only a small subset of parameters in the inner loop and thus has substantially lower computational cost than MAML, as demonstrated by extensive experiments. However, the theoretical convergence of ANIL has not been studied yet. In this paper, we characterize the convergence rate and the computational complexity of ANIL under two representative inner-loop loss geometries, namely strong convexity and nonconvexity. Our results show that such geometric properties can significantly affect the overall convergence performance of ANIL. For example, ANIL achieves a faster convergence rate for a strongly convex inner-loop loss as the number $N$ of inner-loop gradient descent steps increases, but a slower convergence rate for a nonconvex inner-loop loss as $N$ increases. Moreover, our complexity analysis provides a theoretical quantification of the improved efficiency of ANIL over MAML. Experiments on standard few-shot meta-learning benchmarks validate our theoretical findings.
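As a rough illustration of the partial-parameter adaptation that distinguishes ANIL from MAML, here is a first-order sketch on synthetic linear-regression tasks: only the head `h` is adapted over $N$ inner-loop steps, while the shared body `W` is updated only in the outer loop. The task distribution, dimensions, and the first-order outer update are assumptions for illustration; the paper analyzes the exact meta-gradient.

```python
import numpy as np

rng = np.random.default_rng(1)

# First-order ANIL-style sketch: body W frozen in the inner loop,
# only head h adapted per task.
d, k, N = 10, 5, 3                           # input dim, feature dim, inner steps
W = rng.normal(size=(k, d)) / np.sqrt(d)     # shared body (meta-learned)
h = np.zeros(k)                              # head initialization (meta-learned)
inner_lr, outer_lr = 0.05, 0.01

def sample_task():
    """A hypothetical task: y = <w*, x> + noise."""
    w_star = rng.normal(size=d)
    X = rng.normal(size=(20, d))
    return X, X @ w_star + 0.1 * rng.normal(size=20)

for step in range(1000):
    X, y = sample_task()
    m = len(y)
    feats = X @ W.T                          # body features (not adapted)
    h_task = h.copy()
    for _ in range(N):                       # inner loop: head only
        h_task -= inner_lr * feats.T @ (feats @ h_task - y) / m
    # Outer loop (first-order approximation): gradient of the
    # post-adaptation loss with respect to both body and head.
    resid = feats @ h_task - y
    W -= outer_lr * np.outer(h_task, resid @ X) / m
    h -= outer_lr * feats.T @ resid / m
```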
Provably Faster Algorithms for Bilevel Optimization and Applications to Meta-Learning
Ji, Kaiyi, Yang, Junjie, Liang, Yingbin
Bilevel optimization has arisen as a powerful tool for many machine learning problems such as meta-learning, hyper-parameter optimization, and reinforcement learning. In this paper, we investigate the nonconvex-strongly-convex bilevel optimization problem and propose two novel algorithms, named deterBiO and stocBiO, for the deterministic and stochastic settings, respectively. At the core of deterBiO's design is the construction of a low-cost and easy-to-implement hyper-gradient estimator via simple back-propagation. In addition, stocBiO performs updates with mini-batch data sampling rather than the existing single-sample schemes, and incorporates a newly proposed sample-efficient Hessian inverse estimator. We provide finite-time convergence guarantees for both algorithms and show that they improve upon the best known computational complexities in their order-level dependence on the condition number $\kappa$ and/or the target accuracy $\epsilon$. We further demonstrate the superior efficiency of the proposed algorithms through experiments on meta-learning and hyper-parameter optimization.
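The Hessian inverse estimation mentioned above can be approximated with a truncated Neumann series that requires only Hessian-vector products; the sketch below demonstrates that device on a hypothetical positive-definite quadratic, with `eta` and `Q` chosen for illustration (a stocBiO-style estimator would additionally draw the Hessian terms from mini-batches of samples).

```python
import numpy as np

rng = np.random.default_rng(2)

# Neumann-series Hessian-inverse estimator: approximate H^{-1} v with
# eta * sum_{q<Q} (I - eta*H)^q v, using only Hessian-vector products.
d = 8
A = rng.normal(size=(d, d))
H = A @ A.T / d + np.eye(d)          # a PD lower-level Hessian (illustrative)
v = rng.normal(size=d)

eta = 1.0 / (np.linalg.norm(H, 2) + 1.0)   # ensures ||I - eta*H|| < 1
Q = 200
approx, term = np.zeros(d), v.copy()
for _ in range(Q):
    approx += term
    term = term - eta * (H @ term)   # apply (I - eta*H) via a matvec
approx *= eta

print(np.linalg.norm(approx - np.linalg.solve(H, v)))  # small residual
```

The same recursion extends verbatim to the implicit hyper-gradient formula, where `v` is the gradient of the upper-level loss with respect to the lower-level variable.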
Finite-Time Analysis for Double Q-learning
Xiong, Huaqing, Zhao, Lin, Liang, Yingbin, Zhang, Wei
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, it often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet{hasselt2010double} overcomes this overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited: so far only asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis of double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum within $\tilde{\Omega}\left(\left( \frac{1}{(1-\gamma)^6\epsilon^2}\right)^{\frac{1}{\omega}} +\left(\frac{1}{1-\gamma}\right)^{\frac{1}{1-\omega}}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the stochastic approximation literature.
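A minimal tabular sketch of the random-switching mechanism follows: a fair coin decides which estimator is updated, and the greedy action chosen by the updated estimator is evaluated by the other one. The toy MDP, the exploration scheme, and the learning-rate decay `t**-0.8` (playing the role of $\omega$) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Tabular double Q-learning: two estimators; a coin flip picks which one
# is updated, and the *other* one evaluates the greedy action.
nS, nA, gamma = 6, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # hypothetical MDP
R = rng.normal(size=(nS, nA))
QA, QB = np.zeros((nS, nA)), np.zeros((nS, nA))

s = 0
for t in range(1, 100_000):
    # Epsilon-greedy behavior policy based on the averaged estimate.
    a = rng.integers(nA) if rng.random() < 0.1 else int((QA + QB)[s].argmax())
    s_next = rng.choice(nS, p=P[s, a])
    lr = 1.0 / t**0.8                            # polynomial learning rate
    if rng.random() < 0.5:                       # update A, evaluate with B
        a_star = int(QA[s_next].argmax())
        QA[s, a] += lr * (R[s, a] + gamma * QB[s_next, a_star] - QA[s, a])
    else:                                        # update B, evaluate with A
        b_star = int(QB[s_next].argmax())
        QB[s, a] += lr * (R[s, a] + gamma * QA[s_next, b_star] - QB[s, a])
    s = s_next
```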
Spectral Algorithms for Community Detection in Directed Networks
Wang, Zhe, Liang, Yingbin, Ji, Pengsheng
Community detection in large social networks is affected by the degree heterogeneity of nodes. The D-SCORE algorithm for directed networks was introduced to reduce this effect by taking the element-wise ratios of the singular vectors of the adjacency matrix before clustering. Meaningful results were obtained for the statistician citation network, but a rigorous analysis of its performance was missing. First, this paper establishes theoretical guarantees for this algorithm and its variants under the directed degree-corrected block model (Directed-DCBM). Second, this paper provides significant improvements to the original D-SCORE algorithm by attaching the nodes outside the community cores using information from the original network rather than from the singular vectors.
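To illustrate the element-wise ratio idea, here is a compact sketch on a synthetically generated Directed-DCBM network: take the SVD of the adjacency matrix, divide the lower-order singular vectors entry-wise by the leading one to cancel the degree parameters, and cluster the resulting ratio coordinates. The model parameters and the bare-bones Lloyd iterations are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# D-SCORE-style sketch: SVD of a directed adjacency matrix, then
# element-wise singular-vector ratios before clustering.
n, k = 200, 2
labels = rng.integers(k, size=n)
theta = rng.uniform(0.3, 1.0, size=n)          # degree heterogeneity
B = np.array([[0.9, 0.2], [0.2, 0.9]])         # block connectivity
prob = np.outer(theta, theta) * B[labels][:, labels]
A = (rng.random((n, n)) < prob).astype(float)  # Directed-DCBM sample

U, _, Vt = np.linalg.svd(A)
V = Vt.T
# Ratio coordinates: divide vectors 2..k entry-wise by the leading one
# (assumes leading vectors are entry-wise nonzero, as under the model).
X = np.hstack([U[:, 1:k] / U[:, [0]], V[:, 1:k] / V[:, [0]]])

# Plain k-means via a few Lloyd iterations (no external dependencies).
C = X[rng.choice(n, k, replace=False)]
for _ in range(20):
    assign = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
    for j in range(k):
        if (assign == j).any():
            C[j] = X[assign == j].mean(0)
```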
Momentum Q-learning with Finite-Sample Convergence Guarantee
Weng, Bowen, Xiong, Huaqing, Zhao, Lin, Liang, Yingbin, Zhang, Wei
Existing studies indicate that momentum ideas from conventional optimization can be used to improve the performance of Q-learning algorithms. However, finite-sample analysis of momentum-based Q-learning algorithms is available only for the tabular case without function approximation. This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantees. Specifically, we propose the MomentumQ algorithm, which integrates Nesterov's and Polyak's momentum schemes and generalizes the existing momentum-based Q-learning algorithms. For the infinite state-action space case, we establish the convergence guarantee for MomentumQ with linear function approximation and Markovian sampling. In particular, we characterize a finite-sample convergence rate that is provably faster than that of vanilla Q-learning. This is the first finite-sample analysis of momentum-based Q-learning algorithms with function approximation. For the tabular case under synchronous sampling, we also obtain a finite-sample convergence rate that is slightly better than that of SpeedyQ \citep{azar2011speedy} when choosing a special family of step sizes. Finally, we demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms.
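Since MomentumQ combines momentum with the Q-learning update, a generic heavy-ball-style sketch on synchronous tabular Q-iteration conveys the idea; note this illustrates the momentum principle with assumed constants `alpha` and `mu`, and is not the exact MomentumQ update analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(5)

# Heavy-ball-style momentum on synchronous tabular Q-iteration:
# a generic illustration of the momentum idea in Q-learning.
nS, nA, gamma = 6, 3, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # hypothetical MDP
R = rng.normal(size=(nS, nA))

def bellman(Q):
    """Synchronous Bellman optimality operator applied to all (s, a)."""
    return R + gamma * P @ Q.max(axis=1)

Q, Q_prev = np.zeros((nS, nA)), np.zeros((nS, nA))
alpha, mu = 0.5, 0.2                 # stepsize and momentum weight (assumed)
for _ in range(500):
    Q_new = Q + alpha * (bellman(Q) - Q) + mu * (Q - Q_prev)
    Q_prev, Q = Q, Q_new
```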
When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence
Guan, Ziwei, Xu, Tengyu, Liang, Yingbin
Generative adversarial imitation learning (GAIL) is a popular inverse reinforcement learning approach for jointly optimizing the policy and the reward from expert trajectories. A primary question about GAIL is whether applying a given policy gradient algorithm to it attains a global minimizer (i.e., yields the expert policy), for which existing understanding is very limited. Such global convergence has been shown only for linear (or linear-type) MDPs and linear (or linearizable) rewards. In this paper, we study GAIL under general MDPs and for nonlinear reward function classes (as long as the objective function is strongly concave with respect to the reward parameter). We characterize global convergence at a sublinear rate for a broad range of commonly used policy gradient algorithms, all implemented in an alternating manner with stochastic gradient ascent for the reward update, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL, and natural policy gradient (NPG)-GAIL. This is the first systematic theoretical study of the global convergence of GAIL.
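The alternating structure shared by all of these variants can be sketched on a toy tabular MDP: ascend a linear reward so that the expert's state-action occupancy is scored above the agent's, then improve the policy against the current reward. Everything below, including the synthetic expert occupancy, the stepsizes, and the soft policy-improvement step standing in for the policy-gradient update, is an illustrative assumption rather than any of the paper's algorithms.

```python
import numpy as np

rng = np.random.default_rng(6)

# GAIL-style alternation on a toy tabular MDP: reward ascent step,
# then a soft policy-improvement step against the current reward.
nS, nA, gamma = 4, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
w = np.zeros((nS, nA))                 # linear reward r(s, a) = w[s, a]
logits = np.zeros((nS, nA))            # softmax policy parameters
expert_occ = rng.dirichlet(np.ones(nS * nA)).reshape(nS, nA)  # synthetic

def occupancy(pi, horizon=200):
    """Normalized discounted state-action occupancy measure of pi."""
    d = np.full(nS, 1.0 / nS)
    occ = np.zeros((nS, nA))
    for t in range(horizon):
        occ += (gamma ** t) * d[:, None] * pi
        d = np.einsum('s,sa,sat->t', d, pi, P)   # next-state distribution
    return (1 - gamma) * occ

for _ in range(300):
    pi = np.exp(logits); pi /= pi.sum(1, keepdims=True)
    agent_occ = occupancy(pi)
    w += 0.5 * (expert_occ - agent_occ)   # reward (discriminator) ascent
    logits += 0.5 * w                     # crude policy-improvement proxy
```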
Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms
Xu, Tengyu, Wang, Zhe, Liang, Yingbin
The actor-critic (AC) algorithm is a popular method for finding an optimal policy in reinforcement learning. In the infinite-horizon scenario, finite-sample convergence rates for the AC and natural actor-critic (NAC) algorithms have been established recently, but only under independent and identically distributed (i.i.d.) sampling with a single-sample update at each iteration. In contrast, this paper characterizes the convergence rate and sample complexity of AC and NAC under Markovian sampling, with mini-batch data at each iteration and with the actor employing a general policy class approximation. We show that the overall sample complexity for mini-batch AC to attain an $\epsilon$-accurate stationary point improves the best known sample complexity of AC by an order of $\mathcal{O}(\epsilon^{-1}\log(1/\epsilon))$, and the overall sample complexity for mini-batch NAC to attain an $\epsilon$-accurate globally optimal point improves the existing sample complexity of NAC by an order of $\mathcal{O}(\epsilon^{-2}/\log(1/\epsilon))$. Moreover, the sample complexities of AC and NAC characterized in this work outperform those of policy gradient (PG) and natural policy gradient (NPG) by factors of $\mathcal{O}((1-\gamma)^{-3})$ and $\mathcal{O}((1-\gamma)^{-4}\epsilon^{-2}/\log(1/\epsilon))$, respectively. This is the first theoretical study establishing that AC and NAC attain order-wise performance improvements over PG and NPG in the infinite-horizon setting due to the incorporation of the critic.
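A minimal sketch of the mini-batch scheme on a toy tabular MDP follows: both the critic's TD update and the actor's policy-gradient update average over a batch of $B$ consecutive Markovian samples. The environment, batch size, and stepsizes are assumptions for illustration; the paper's analysis covers a general policy class approximation.

```python
import numpy as np

rng = np.random.default_rng(7)

# Mini-batch actor-critic: critic TD update and actor policy-gradient
# update each averaged over B Markovian samples per iteration.
nS, nA, gamma, B = 5, 2, 0.9, 32
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
V = np.zeros(nS)                       # tabular critic
logits = np.zeros((nS, nA))            # softmax actor

s = 0
for _ in range(2000):
    pi = np.exp(logits); pi /= pi.sum(1, keepdims=True)
    batch = []
    for _ in range(B):                 # one Markovian mini-batch
        a = rng.choice(nA, p=pi[s])
        s_next = rng.choice(nS, p=P[s, a])
        batch.append((s, a, R[s, a], s_next))
        s = s_next
    # Critic: averaged TD(0) update.
    dV = np.zeros(nS)
    for (si, ai, ri, sn) in batch:
        dV[si] += ri + gamma * V[sn] - V[si]
    V += 0.1 * dV / B
    # Actor: averaged policy gradient, TD error as advantage proxy.
    dL = np.zeros((nS, nA))
    for (si, ai, ri, sn) in batch:
        delta = ri + gamma * V[sn] - V[si]
        grad_log = -pi[si].copy(); grad_log[ai] += 1.0   # d log pi / d logits
        dL[si] += delta * grad_log
    logits += 0.05 * dL / B
```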
Enhanced First and Zeroth Order Variance Reduced Algorithms for Min-Max Optimization
Xu, Tengyu, Wang, Zhe, Liang, Yingbin, Poor, H. Vincent
Min-max optimization captures many important machine learning problems, such as robust adversarial learning and inverse reinforcement learning, and nonconvex-strongly-concave min-max optimization has been an active line of research. Specifically, a novel variance reduction algorithm, SREDA, was proposed recently by (Luo et al. 2020) to solve such problems and was shown to achieve the optimal complexity dependence on the required accuracy level $\epsilon$. Despite its superior theoretical performance, the convergence guarantee of SREDA requires stringent initialization accuracy and an $\epsilon$-dependent stepsize for controlling the per-iteration progress, so that SREDA can run very slowly in practice. This paper develops a novel analytical framework that guarantees SREDA's optimal complexity performance for a much enhanced algorithm, SREDA-Boost, which has a less restrictive initialization requirement and an accuracy-independent (and much larger) stepsize. Hence, SREDA-Boost runs substantially faster in experiments than SREDA. We further apply SREDA-Boost to propose a zeroth-order variance reduction algorithm, named ZO-SREDA-Boost, for scenarios with access only to function values rather than gradients, and show that ZO-SREDA-Boost improves upon the best known complexity dependence on $\epsilon$. This is the first study to apply variance reduction techniques to zeroth-order algorithms for min-max optimization problems.
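At the core of SREDA-type methods is a SARAH/SPIDER-style recursive gradient estimator with periodic full-gradient refreshes; the sketch below shows that estimator on a hypothetical finite-sum least-squares problem (minimization only, for brevity), with the refresh period, batch size, and the accuracy-independent stepsize chosen for illustration. The zeroth-order variant would replace each gradient call with a finite-difference estimate built from function values.

```python
import numpy as np

rng = np.random.default_rng(8)

# SARAH/SPIDER-style recursive gradient estimator, the variance-reduction
# device underlying SREDA-type methods, on finite-sum least squares.
n, d, q = 500, 10, 25                  # samples, dimension, refresh period
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad(x, idx):
    """Mini-batch gradient of (1/2)||A x - b||^2 / n over indices idx."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

x = np.zeros(d)
lr = 0.05                              # accuracy-independent stepsize
for t in range(500):
    if t % q == 0:
        v = grad(x, np.arange(n))      # periodic full-gradient refresh
    else:
        idx = rng.integers(n, size=16) # recursive mini-batch correction
        v = grad(x, idx) - grad(x_prev, idx) + v
    x_prev = x.copy()
    x = x - lr * v
```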
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms
Xu, Tengyu, Wang, Zhe, Liang, Yingbin
As an important class of reinforcement learning algorithms, actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in one of two ways for finding optimal policies. In the first, nested-loop design, one update of the actor's policy is followed by an entire loop of the critic's value-function updates, and the finite-sample analysis of such AC and NAC algorithms has recently been well established. The second, two time-scale design, in which the actor and critic update simultaneously but with different learning rates, has far fewer tuning parameters than the nested-loop design and is hence substantially easier to implement. Although two time-scale AC and NAC have been shown to converge in the literature, their finite-sample convergence rates have not been established. In this paper, we provide the first such non-asymptotic convergence rates for two time-scale AC and NAC under Markovian sampling and with the actor employing a general policy class approximation. We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point, and two time-scale NAC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-4}\log^2(\epsilon^{-1}))$ to attain an $\epsilon$-accurate globally optimal point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling and for analyzing the convergence rate of the linear critic with dynamically changing basis functions and transition kernel.
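To contrast with the nested-loop design, here is a minimal two time-scale sketch on a toy tabular MDP: the actor and critic update on the same Markovian sample at every step, with the critic on the faster timescale. The environment and the specific stepsize decays ($t^{-0.6}$ for the critic, $t^{-1}$ for the actor) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(9)

# Two time-scale actor-critic: simultaneous updates on each Markovian
# sample, critic (fast) and actor (slow) with different learning rates.
nS, nA, gamma = 5, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))
R = rng.normal(size=(nS, nA))
V = np.zeros(nS)                       # tabular critic
logits = np.zeros((nS, nA))            # softmax actor

s = 0
for t in range(1, 100_000):
    alpha = 0.5 / t**0.6               # critic stepsize (fast timescale)
    beta = 0.5 / t                     # actor stepsize (slow timescale)
    pi = np.exp(logits[s] - logits[s].max()); pi /= pi.sum()
    a = rng.choice(nA, p=pi)
    s_next = rng.choice(nS, p=P[s, a])
    delta = R[s, a] + gamma * V[s_next] - V[s]   # shared TD error
    V[s] += alpha * delta                        # fast critic update
    grad_log = -pi; grad_log[a] += 1.0           # d log pi / d logits[s]
    logits[s] += beta * delta * grad_log         # slow actor update
    s = s_next
```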