AITopics | Lu, Chi-Jen

Lu, Chi-Jen

Online Reinforcement Learning in Stochastic Games

Wei, Chen-Yu, Hong, Yi-Te, Lu, Chi-Jen

Neural Information Processing Systems, Feb-14-2020, 16:43:08 GMT

artificial intelligence, machine learning, reinforcement learning, (5 more...)

Neural Information Processing Systems

Genre: Instructional Material > Online (0.65)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.65)

Online Reinforcement Learning in Stochastic Games

Wei, Chen-Yu, Hong, Yi-Te, Lu, Chi-Jen

Neural Information Processing Systems, Dec-31-2017

We study online reinforcement learning in average-reward stochastic games (SGs). An SG models a two-player zero-sum game in a Markov environment, where state transitions and one-step payoffs are determined simultaneously by a learner and an adversary. We propose the UCSG algorithm, which achieves sublinear regret relative to the game value when competing against an arbitrary opponent. This result improves on previous ones under the same setting. The regret bound depends on the diameter, an intrinsic value related to the mixing property of SGs. Slightly extended, UCSG finds an $\varepsilon$-maximin stationary policy with a sample complexity of $\tilde{\mathcal{O}}\left(\text{poly}(1/\varepsilon)\right)$, where $\varepsilon$ is the error parameter. To the best of our knowledge, this extended result is the first in the average-reward setting. In the analysis, we develop Markov chain perturbation bounds for mean first passage times and techniques to deal with non-stationary opponents, which may be of interest in their own right.
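To make the notions of "game value" and "maximin policy" in the abstract above concrete, here is a minimal sketch for the simplest possible case: a single-state, 2x2 zero-sum matrix game. It is not the UCSG algorithm and involves no learning or state transitions; the function name `maximin_2x2` and the payoff layout are assumptions made for illustration only.

```python
def maximin_2x2(payoff):
    """Return (value, p) for a 2x2 zero-sum matrix game.

    payoff[i][j] is the row player's payoff when the row player picks
    row i and the column player picks column j. The returned p is the
    probability with which the row player should pick row 0.
    """
    (a, b), (c, d) = payoff

    # Pure-strategy check: a saddle point exists when the best
    # guaranteed row payoff equals the best guaranteed column bound.
    lower = max(min(a, b), min(c, d))
    upper = min(max(a, c), max(b, d))
    if lower == upper:
        p = 1.0 if min(a, b) >= min(c, d) else 0.0
        return float(lower), p

    # No saddle point: mix rows so that the column player is
    # indifferent between the two columns.
    denom = a - b - c + d
    p = (d - c) / denom
    value = (a * d - b * c) / denom
    return value, p


if __name__ == "__main__":
    # Matching pennies: neither player can guarantee more than 0.
    print(maximin_2x2([[1, -1], [-1, 1]]))
```

In a stochastic game the same maximin idea applies to stationary policies over many states, which is where quantities such as the diameter enter the analysis.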

algorithm, artificial intelligence, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.14)

Genre: Instructional Material > Online (0.60)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

Tracking the Best Expert in Non-stationary Stochastic Environments

Wei, Chen-Yu, Hong, Yi-Te, Lu, Chi-Jen

Neural Information Processing Systems, Dec-31-2016

We study the dynamic regret of the multi-armed bandit and experts problems in non-stationary stochastic environments. We introduce a new parameter $\Lambda$, which measures the total statistical variance of the loss distributions over $T$ rounds of the process, and study how this amount affects the regret. We investigate the interaction between $\Lambda$ and $\Gamma$, which counts the number of times the distributions change, as well as between $\Lambda$ and $V$, which measures how far the distributions deviate over time. One striking result is that even when $\Gamma$, $V$, and $\Lambda$ are all restricted to constants, the regret lower bound in the bandit setting still grows with $T$. The other highlight is that in the full-information setting, a constant regret, independent of $T$, becomes achievable with constant $\Gamma$ and $\Lambda$, while with constant $V$ and $\Lambda$ the regret still has a $T^{1/3}$ dependency. We not only propose algorithms with upper-bound guarantees, but also prove matching lower bounds.
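As a point of reference for the full-information experts setting described above, the following is a minimal sketch of the standard exponential-weights (Hedge) learner together with a dynamic-regret computation against the best expert in each round. This is a textbook baseline, not the algorithm proposed in the paper; the names `hedge` and `dynamic_regret`, the fixed learning rate `eta`, and the use of the supplied loss vectors as the per-round comparator are all assumptions for illustration.

```python
import math


def _normalize(log_w):
    """Turn log-weights into a probability vector (numerically stable)."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    s = sum(w)
    return [x / s for x in w]


def hedge(loss_rounds, eta):
    """Run exponential weights over a list of per-round loss vectors.

    Returns (learner's total expected loss, final distribution over experts).
    """
    n = len(loss_rounds[0])
    log_w = [0.0] * n
    total = 0.0
    for losses in loss_rounds:
        probs = _normalize(log_w)
        # Expected loss of sampling an expert from the current distribution.
        total += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update, done in log space.
        log_w = [x - eta * l for x, l in zip(log_w, losses)]
    return total, _normalize(log_w)


def dynamic_regret(loss_rounds, eta):
    """Learner's loss minus the sum of per-round minimum losses."""
    total, _ = hedge(loss_rounds, eta)
    return total - sum(min(losses) for losses in loss_rounds)


if __name__ == "__main__":
    # The best expert switches once halfway through (one change point).
    rounds = [[0.0, 1.0]] * 20 + [[1.0, 0.0]] * 20
    print(dynamic_regret(rounds, eta=0.5))
```

A plain fixed-rate learner like this one reacts slowly after a change point, which is one reason non-stationary settings call for more careful algorithms and for parameters that quantify how much the environment moves.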

algorithm, artificial intelligence, big data, (19 more...)

Neural Information Processing Systems

Country: Europe > Spain (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.68)

Rivalry of Two Families of Algorithms for Memory-Restricted Streaming PCA

Li, Chun-Liang, Lin, Hsuan-Tien, Lu, Chi-Jen

arXiv.org Machine Learning, Oct-11-2015

We study the problem of recovering the subspace spanned by the first $k$ principal components of $d$-dimensional data under the streaming setting, with a memory bound of $O(kd)$. Two families of algorithms are known for this problem. The first family is based on the framework of stochastic gradient descent. However, the convergence rate of this family can be seriously affected by the learning rate of the descent steps and deserves more careful study. The second family is based on the power method over blocks of data, but setting the block size for its existing algorithms is not an easy task. In this paper, we analyze the convergence rate of a representative algorithm with a decayed learning rate (Oja and Karhunen, 1985) in the first family for the general $k>1$ case. Moreover, we propose a novel algorithm for the second family that sets the block sizes automatically and dynamically and has a faster convergence rate. We then conduct empirical studies that fairly compare the two families on real-world data. The studies reveal the advantages and disadvantages of the two families.
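The first family mentioned above can be illustrated with a minimal sketch of an Oja-style streaming update with a decayed learning rate. To stay short, the sketch handles only the $k=1$ case (a single direction, $O(d)$ memory), whereas the abstract concerns general $k>1$. The function name `oja_top_component`, the schedule `c / t`, and the random initialization are assumptions for illustration, not the exact procedure analyzed in the paper.

```python
import math
import random


def oja_top_component(stream, dim, c=1.0, seed=0):
    """Estimate the top principal direction from a stream of vectors.

    Each sample is seen once and then discarded; only one
    dim-dimensional vector is kept in memory.
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in w))
    w = [v / norm for v in w]

    for t, x in enumerate(stream, start=1):
        eta = c / t  # decayed learning rate (assumed schedule)
        proj = sum(wi * xi for wi, xi in zip(w, x))
        # Stochastic gradient step: w <- w + eta * (x . w) * x
        w = [wi + eta * proj * xi for wi, xi in zip(w, x)]
        # Renormalize to keep w on the unit sphere.
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic data whose variance is concentrated on the first axis.
    data = ([3.0 * rng.gauss(0, 1), 0.3 * rng.gauss(0, 1)] for _ in range(2000))
    print(oja_top_component(data, dim=2))
```

The constant `c` plays the role of the learning-rate choice that the abstract identifies as critical: too small and the estimate converges slowly, too large and early steps are dominated by individual samples.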

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1506.0149

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)

An Online Boosting Algorithm with Theoretical Justifications

Chen, Shang-Tse, Lin, Hsuan-Tien, Lu, Chi-Jen

arXiv.org Machine Learning, Jun-27-2012

We study the task of online boosting--combining online weak learners into an online strong learner. While batch boosting has a sound theoretical foundation, online boosting deserves more study from the theoretical perspective. In this paper, we carefully compare the differences between online and batch boosting, and propose a novel and reasonable assumption for the online weak learner. Based on the assumption, we design an online boosting algorithm with a strong theoretical guarantee by adapting from the offline SmoothBoost algorithm that matches the assumption closely. We further tackle the task of deciding the number of weak learners using established theoretical results for online convex programming and predicting with expert advice. Experiments on real-world data sets demonstrate that the proposed algorithm compares favorably with existing online boosting algorithms.

artificial intelligence, machine learning, weak learner, (17 more...)

arXiv.org Machine Learning

1206.6422

Country:

Asia > Taiwan (0.28)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
