Collaborating Authors

 Hong, Yi-Te


Online Reinforcement Learning in Stochastic Games

Neural Information Processing Systems

We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the \textsc{UCSG} algorithm, which achieves sublinear regret relative to the game value when competing with an arbitrary opponent, improving on previous results in the same setting. The regret bound depends on the \textit{diameter}, an intrinsic quantity related to the mixing properties of SGs. With a slight extension, \textsc{UCSG} finds an $\varepsilon$-maximin stationary policy with a sample complexity of $\tilde{\mathcal{O}}\left(\text{poly}(1/\varepsilon)\right)$, where $\varepsilon$ is the error parameter. To the best of our knowledge, this extended result is the first in the average-reward setting. In the analysis, we develop perturbation bounds on Markov chains' mean first passage times and techniques for handling non-stationary opponents, both of which may be of independent interest.
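
At the core of UCB-style algorithms such as \textsc{UCSG} is an optimistic model estimate: visit counts for each (state, learner action, opponent action) triple define an empirical transition distribution together with a confidence set around it. The sketch below is a minimal illustration of that bookkeeping, not the paper's exact procedure; the class and method names are ours, and the L1 confidence radius follows the standard Weissman-style bound used in UCRL2-type analyses as an assumed stand-in for the paper's constants.

import numpy as np

class OptimisticModel:
    """Count-based transition model with L1 confidence sets (illustrative)."""

    def __init__(self, n_states, n_actions1, n_actions2, delta=0.05):
        shape = (n_states, n_actions1, n_actions2)
        self.counts = np.zeros(shape + (n_states,))  # transition counts
        self.delta = delta                           # confidence parameter
        self.t = 1                                   # round counter

    def update(self, s, a1, a2, s_next):
        """Record one observed transition (s, a1, a2) -> s_next."""
        self.counts[s, a1, a2, s_next] += 1
        self.t += 1

    def confidence_set(self, s, a1, a2):
        """Empirical transition distribution and an L1 confidence radius."""
        total = self.counts[s, a1, a2].sum()
        n_states = self.counts.shape[-1]
        if total == 0:
            p_hat = np.full(n_states, 1.0 / n_states)  # no data yet: uniform
        else:
            p_hat = self.counts[s, a1, a2] / total
        # Standard L1 deviation bound for empirical distributions
        # (Weissman et al.); the exact constants are an assumption here.
        radius = np.sqrt(2 * n_states * np.log(2 * self.t / self.delta)
                         / max(1.0, total))
        return p_hat, radius

# Example usage: record a transition, then query the confidence set.
model = OptimisticModel(n_states=5, n_actions1=2, n_actions2=2)
model.update(s=0, a1=1, a2=0, s_next=3)
p_hat, radius = model.confidence_set(0, 1, 0)

An optimistic planner would then pick, within each such confidence set, the transition model most favorable to the learner before solving the resulting game; that maximin planning step is where the algorithm-specific work lies.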


Tracking the Best Expert in Non-stationary Stochastic Environments

Neural Information Processing Systems

We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this quantity affects the regret. We investigate the interaction between $\Lambda$ and $\Gamma$, which counts the number of times the distributions change, as well as between $\Lambda$ and $V$, which measures how far the distributions deviate over time. One striking result we find is that even when $\Gamma$, $V$, and $\Lambda$ are all restricted to constants, the regret lower bound in the bandit setting still grows with $T$. The other highlight is that in the full-information setting, a constant regret becomes achievable with constant $\Gamma$ and $\Lambda$, i.e., the regret can be made independent of $T$, while with constant $V$ and $\Lambda$ the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper bound guarantees, but prove matching lower bounds as well.
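
To make the three non-stationarity measures concrete, the sketch below instantiates them for a sequence of Bernoulli loss distributions. The specific norms and scalings are illustrative assumptions (e.g., taking the worst arm's variance per round for $\Lambda$); the paper's formal definitions may differ.

import numpy as np

def nonstationarity_measures(mu):
    """mu: (T, K) array of per-round Bernoulli loss means for K arms."""
    var = mu * (1 - mu)                        # per-arm Bernoulli variance
    Lambda = var.max(axis=1).sum()             # total statistical variance
    diffs = np.abs(np.diff(mu, axis=0)).max(axis=1)
    Gamma = int((diffs > 0).sum())             # number of distribution changes
    V = diffs.sum()                            # total deviation over time
    return Lambda, Gamma, V

# Example: a two-arm instance whose distributions switch once at T/2,
# giving Gamma = 1, small V, but Lambda that grows linearly with T.
T, K = 1000, 2
mu = np.vstack([np.tile([0.2, 0.8], (T // 2, 1)),
                np.tile([0.8, 0.2], (T // 2, 1))])
print(nonstationarity_measures(mu))

The example illustrates why the parameters are not interchangeable: an instance can have constant $\Gamma$ and $V$ while $\Lambda$ still scales with $T$, which is exactly the regime where the bandit lower bound above continues to grow with $T$.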