Its impact is drastic and real: Youtube's AIdriven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform ; email writing becomes much faster with machine learning (ML) based auto-completion ; many businesses have adopted natural language processing based chatbots as part of their customer services . AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports  to games such as poker  and Go . All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" . Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the wellbeing of the society.
We discuss deep reinforcement learning in an overview style. We draw a big picture, filled with details. We discuss six core elements, six important mechanisms, and twelve applications, focusing on contemporary work, and in historical contexts. We start with background of artificial intelligence, machine learning, deep learning, and reinforcement learning (RL), with resources. Next we discuss RL core elements, including value function, policy, reward, model, exploration vs. exploitation, and representation. Then we discuss important mechanisms for RL, including attention and memory, unsupervised learning, hierarchical RL, multi-agent RL, relational RL, and learning to learn. After that, we discuss RL applications, including games, robotics, natural language processing (NLP), computer vision, finance, business management, healthcare, education, energy, transportation, computer systems, and, science, engineering, and art. Finally we summarize briefly, discuss challenges and opportunities, and close with an epilogue.
Alpha signals for statistical arbitrage strategies are often driven by latent factors. This paper analyses how to optimally trade with latent factors that cause prices to jump and diffuse. Moreover, we account for the effect of the trader's actions on quoted prices and the prices they receive from trading. Under fairly general assumptions, we demonstrate how the trader can learn the posterior distribution over the latent states, and explicitly solve the latent optimal trading problem. We provide a verification theorem, and a methodology for calibrating the model by deriving a variation of the expectation-maximization algorithm. To illustrate the efficacy of the optimal strategy, we demonstrate its performance through simulations and compare it to strategies which ignore learning in the latent factors. We also provide calibration results for a particular model using Intel Corporation stock as an example.
In this paper, we propose a simple, versatile model for learning the structure and parameters of multivariate distributions from a data set. Learning a Markov network from a given data set is not a simple problem, because Markov networks rigorously represent Markov properties, and this rigor imposes complex constraints on the design of the networks. Our proposed model removes these constraints, acquiring important aspects from the information geometry. The proposed parameter- and structure-learning algorithms are simple to execute as they are based solely on local computation at each node. Experiments demonstrate that our algorithms work appropriately.