Zhang, Weitong
Neural Thompson Sampling
Zhang, Weitong, Zhou, Dongruo, Li, Lihong, Gu, Quanquan
The stochastic multi-armed bandit (Bubeck and Cesa-Bianchi, 2012; Lattimore and Szepesvári, 2020) has been extensively studied as an important model for optimizing the tradeoff between exploration and exploitation in sequential decision making. Among its many variants, the contextual bandit is widely used in real-world applications such as recommendation (Li et al., 2010), advertising (Graepel et al., 2010), robotic control (Mahler et al., 2016), and healthcare (Greenewald et al., 2017). In each round of a contextual bandit, the agent observes a feature vector (the "context") for each of the K arms, pulls one of them, and in return receives a scalar reward. The goal is to maximize the cumulative reward, or equivalently to minimize the regret (defined later), over a total of T rounds. To do so, the agent must trade off exploration against exploitation. One of the most effective and widely used techniques is Thompson Sampling, or TS (Thompson, 1933). The basic idea is to compute the posterior probability of each arm being optimal for the present context and to sample an arm from this distribution. TS is often easy to implement and has found great success in practice (Chapelle and Li, 2011; Graepel et al., 2010; Kawale et al., 2015; Russo et al., 2017). Recently, a line of work has applied TS or its variants to exploration in contextual bandits with neural network models (Blundell et al., 2015; Kveton et al., 2020; Lu and Van Roy, 2017; Riquelme et al., 2018).
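To make the TS loop concrete, the sketch below shows Thompson Sampling for a contextual bandit with a linear reward model and a Gaussian posterior. This is an illustrative Lin-TS-style variant, not the neural algorithm studied in the paper; the callables `contexts_fn` and `reward_fn` and the parameters `nu` and `lam` are hypothetical placeholders.

```python
import numpy as np

def contextual_thompson_sampling(contexts_fn, reward_fn, T, d, nu=1.0, lam=1.0):
    """Minimal Thompson Sampling loop for a contextual bandit with a linear
    reward model and a Gaussian posterior (illustrative sketch only)."""
    A = lam * np.eye(d)          # posterior precision matrix
    b = np.zeros(d)              # accumulated reward-weighted contexts
    rewards = []
    for t in range(T):
        X = contexts_fn(t)                                # K x d context matrix for round t
        mu = np.linalg.solve(A, b)                        # posterior mean of the parameter
        cov = nu ** 2 * np.linalg.inv(A)                  # posterior covariance
        theta = np.random.multivariate_normal(mu, cov)    # sample a parameter from the posterior
        arm = int(np.argmax(X @ theta))                   # pull the arm with the largest sampled reward
        r = reward_fn(t, arm)                             # observe the scalar reward
        A += np.outer(X[arm], X[arm])                     # rank-one update of the posterior precision
        b += r * X[arm]
        rewards.append(r)
    return rewards
```

The sampling step is what drives exploration: arms whose posterior reward is uncertain are occasionally sampled as optimal and thus pulled, while arms that are confidently suboptimal are pulled rarely.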
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Wu, Yue, Zhang, Weitong, Xu, Pan, Gu, Quanquan
Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the policy and the critic uses temporal difference learning to evaluate the current policy. Under the two time-scale learning rate schedule, the asymptotic convergence of AC has been well studied in the literature. However, the non-asymptotic convergence and finite sample complexity of actor-critic methods are largely open. In this work, we provide a non-asymptotic analysis for two time-scale actor-critic methods in the non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point (i.e., $\|\nabla J(\boldsymbol{\theta})\|_2^2 \le \epsilon$) of the non-concave performance function $J(\boldsymbol{\theta})$ with $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ sample complexity. To the best of our knowledge, this is the first work providing a finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
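For intuition, here is a minimal single-sample sketch of a two time-scale actor-critic update with a softmax policy and a linear critic, where the critic's step size decays more slowly (the faster time scale) than the actor's. The environment interface (`env.reset`, `env.step`, `env.features`, `env.num_actions`) and the step-size exponents are illustrative assumptions, not the exact algorithm or schedule analyzed in the paper.

```python
import numpy as np

def two_timescale_actor_critic(env, num_steps, gamma=0.99):
    """Illustrative single-sample two time-scale actor-critic sketch.

    Assumed (hypothetical) env interface: env.reset() -> state,
    env.step(a) -> (next_state, reward, done),
    env.features(state) -> 1-D feature vector, env.num_actions -> int.
    """
    s = env.reset()
    d = env.features(s).shape[0]
    theta = np.zeros((env.num_actions, d))   # softmax policy (actor) parameters
    w = np.zeros(d)                          # linear value-function (critic) weights
    for t in range(num_steps):
        phi = env.features(s)
        logits = theta @ phi
        pi = np.exp(logits - logits.max())
        pi /= pi.sum()
        a = np.random.choice(env.num_actions, p=pi)
        # Score function grad_theta log pi(a|s) for the softmax policy.
        score = -np.outer(pi, phi)
        score[a] += phi
        s_next, r, done = env.step(a)
        phi_next = env.features(s_next)
        delta = r + gamma * (w @ phi_next) - w @ phi   # TD error from the critic
        # Two time scales: the critic step size decays more slowly than the actor's.
        alpha_t = 0.05 / (t + 1) ** 0.6                # actor (slow) step size
        beta_t = 0.10 / (t + 1) ** 0.4                 # critic (fast) step size
        w += beta_t * delta * phi                      # critic: TD(0) update
        theta += alpha_t * delta * score               # actor: policy gradient step
        s = env.reset() if done else s_next
    return theta, w
```

The separation of step sizes is the defining feature of the two time-scale schedule: the critic tracks the value of the slowly changing policy, and the actor ascends the (estimated) policy gradient built from the critic's TD error.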