About this course: Welcome to the Reinforcement Learning course. Here you will find out about:
- foundations of RL methods: value/policy iteration, Q-learning, policy gradient, etc. --- with math & batteries included
- using deep neural networks for RL tasks --- also known as "the hype train"
- state of the art RL algorithms --- and how to apply duct tape to them for practical problems.
This course is about Reinforcement Learning. The first step is the mathematical background: a Markov Decision Process (MDP) serves as the model for reinforcement learning. We can solve the problem in three ways: value iteration, policy iteration, and Q-learning. Value iteration and policy iteration require a known model of the environment (its transition probabilities and rewards). Q-learning is a model-free approach: it learns the optimal policy by interacting with the environment, without needing that model.
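To make the model-free idea concrete, here is a minimal sketch of tabular Q-learning. The environment is a hypothetical five-state chain invented for illustration (it is not part of the course material): the agent starts in state 0, moves left or right, and receives a reward of 1 on reaching state 4. The hyperparameters (`alpha`, `gamma`, `epsilon`) are arbitrary assumptions. Note that the learner only calls `step` and never inspects the transition rules.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the terminal goal (toy assumption)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1 only on reaching the goal."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            # Epsilon-greedy behaviour policy with random tie-breaking.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


q = q_learning()
# Greedy policy for the non-terminal states.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

In this toy setting the greedy policy should end up choosing "right" in every non-terminal state, and `q[0][1]` should approach `gamma ** 3`, the discounted value of the goal reward from the start state.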
Sponsored search is an indispensable business model and a major revenue contributor for almost all search engines. For advertisers, paying for sponsored search advertisements to take part in the ranking of search results attracts more awareness and purchases, which serves their commercial goals. For users, personalized advertisements that reflect their preferences make the online search experience more satisfactory. Sponsored search platforms use a ranking function to determine both the list of advertisements to show and the price charged to the advertisers. Hence, it is crucial to find a good ranking function that simultaneously satisfies the platform, the users, and the advertisers. Moreover, advertisement display positions under different queries from different users may be associated with advertisement candidates that have different bid price distributions and click probability distributions, which requires the ranking functions to be optimized adaptively to the traffic characteristics. In this work, we propose a generic framework to optimize the ranking functions with deep reinforcement learning methods. The framework is composed of two parts: an offline learning part, which initializes the ranking functions by learning from a simulated advertising environment and allows adequate exploration of the ranking function parameter space without hurting the performance of the commercial platform, and an online learning part, which further optimizes the ranking functions by adapting to the online data distribution. Experimental results on a large-scale sponsored search platform confirm the effectiveness of the proposed method.
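The abstract does not state the form of the ranking function, so the following is only an illustrative sketch of how one function can decide both the displayed list and the charged price. It assumes a commonly used parameterized score, bid × pCTR^α, combined with generalized second-price charging (each shown ad pays the minimum bid that would keep its position). The `Ad` class, the field names, and the parameter `alpha` are hypothetical and are not taken from this work.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    bid: float    # advertiser's bid per click (hypothetical units)
    pctr: float   # predicted click probability, assumed in (0, 1]


def rank_and_price(ads, alpha, num_slots):
    """Rank ads by bid * pctr**alpha and charge generalized second prices."""
    scored = sorted(ads, key=lambda ad: ad.bid * ad.pctr ** alpha, reverse=True)
    results = []
    for i, ad in enumerate(scored[:num_slots]):
        if i + 1 < len(scored):
            runner_up = scored[i + 1]
            next_score = runner_up.bid * runner_up.pctr ** alpha
        else:
            next_score = 0.0  # no competitor below: assume no reserve price
        # Smallest bid that would still keep this ad above the runner-up.
        price = next_score / ad.pctr ** alpha
        results.append((ad.name, price))
    return results


ads = [Ad("A", 2.0, 0.1), Ad("B", 1.0, 0.3), Ad("C", 4.0, 0.02)]
by_expected_revenue = rank_and_price(ads, alpha=1.0, num_slots=2)
by_bid_only = rank_and_price(ads, alpha=0.0, num_slots=2)
print(by_expected_revenue)
print(by_bid_only)
```

With `alpha=1.0` the two slots go to B and then A, while `alpha=0.0` ranks purely by bid and shows C and then A. A parameter of this kind is one example of what a learning method could tune per traffic segment.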
NOTE: This course is a continuation of XCS229i: Machine Learning. Taking XCS229i before enrolling in XCS229ii is not strictly required but is highly recommended, as assignments assume knowledge of topics from the first course. As machine learning models grow in sophistication, it is increasingly important for practitioners to be comfortable navigating their many tuning parameters. Through video lectures and hands-on exercises, this course will equip you with the knowledge to get the most out of your data. You will learn the concepts and techniques you need to guide teams of ML practitioners.