 Reinforcement Learning


Financial Engineering and Artificial Intelligence in Python

#artificialintelligence

Have you ever thought about what would happen if you combined the power of machine learning and artificial intelligence with financial engineering? Today, you can stop imagining, and start doing. This course will teach you the core fundamentals of financial engineering, with a machine learning twist. We will cover must-know topics in financial engineering, such as: exploratory data analysis, significance testing, correlations, alpha and beta; time series analysis, simple moving average, exponentially-weighted moving average; Holt-Winters exponential smoothing model; Efficient Market Hypothesis; Random Walk Hypothesis; time series forecasting ("stock price prediction"); modern portfolio theory; efficient frontier / Markowitz bullet; mean-variance optimization; maximizing the Sharpe ratio; convex optimization with Linear Programming and Quadratic Programming; Capital Asset Pricing Model (CAPM); algorithmic trading (VIP only); statistical factor models (VIP only); regime detection with Hidden Markov Models (VIP only). In addition, we will look at various non-traditional techniques which stem purely from the field of machine learning and artificial intelligence, such as: classification models; unsupervised learning; reinforcement learning and Q-learning. You will learn exactly why their methodology is fundamentally flawed and why their results are complete nonsense.


Important AI and Machine Learning Trends for 2020

#artificialintelligence

Businesses ranging from high-tech startups to multinationals see artificial intelligence as a crucial competitive edge in an increasingly technical and competitive sector. However, the AI industry moves so fast that it is often difficult to keep up with the most recent research discoveries and accomplishments, and even more difficult to turn technological results into business results. To help you create a strong AI plan for your company in 2020, I have outlined the hottest trends across various research areas, such as natural language processing, conversational AI, computer vision, and reinforcement learning. I have also included external learning resources you can follow to build your expertise. In 2018, pre-trained language models pushed the limits of natural language understanding and generation.


Efficient falsification approach for autonomous vehicle validation using a parameter optimisation technique based on reinforcement learning

arXiv.org Artificial Intelligence

The widescale deployment of Autonomous Vehicles (AV) appears to be imminent despite many safety challenges that are yet to be resolved. It is well known that there are no universally agreed Verification and Validation (VV) methodologies that guarantee absolute safety, which is crucial for the acceptance of this technology. The uncertainties in the behaviour of the traffic participants and the dynamic world cause stochastic reactions in advanced autonomous systems. The addition of ML algorithms and probabilistic techniques adds significant complexity to the process for real-world testing when compared to traditional methods. Most research in this area focuses on generating challenging concrete scenarios or test cases to evaluate the system performance by looking at the frequency distribution of extracted parameters as collected from real-world data. These approaches generally employ Monte Carlo simulation and importance sampling to generate critical cases. This paper presents an efficient falsification method to evaluate the System Under Test. The approach is based on a parameter optimisation problem to search for challenging scenarios. The optimisation process aims at finding the challenging case that has maximum return. The method applies a policy-gradient reinforcement learning algorithm to enable the learning. The riskiness of the scenario is measured by the well-established RSS safety metric, Euclidean distance, and the occurrence of a collision. We demonstrate that by using the proposed method, we can more efficiently search for challenging scenarios which could cause the system to fail to satisfy the safety requirements.
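
To make the RSS-based risk term more concrete, here is a minimal sketch of the Responsibility-Sensitive Safety (RSS) minimum longitudinal gap for two cars travelling in the same direction. The abstract does not say how the paper combines RSS with the other terms, so the `longitudinal_risk` helper, all parameter names, and the numeric values in the usage note are assumptions made purely for illustration.

```python
def rss_min_longitudinal_gap(v_rear, v_front, response_time,
                             a_max_accel, b_min_brake, b_max_brake):
    """RSS minimum safe gap (metres) between a rear and a front car.

    Speeds are in m/s, accelerations in m/s^2, response_time in seconds.
    """
    # Worst case for the rear car: it accelerates for the whole response
    # time and only then brakes, and only with its weakest braking.
    v_after_response = v_rear + response_time * a_max_accel
    rear_travel = (v_rear * response_time
                   + 0.5 * a_max_accel * response_time ** 2
                   + v_after_response ** 2 / (2 * b_min_brake))
    # Worst case for the front car: it brakes as hard as possible at once.
    front_travel = v_front ** 2 / (2 * b_max_brake)
    return max(0.0, rear_travel - front_travel)


def longitudinal_risk(gap, v_rear, v_front, **rss_params):
    """Hypothetical risk score: how far the actual gap falls short of RSS."""
    d_min = rss_min_longitudinal_gap(v_rear, v_front, **rss_params)
    return max(0.0, d_min - gap)
```

A scenario search could reward the agent with a score of this kind, for example `longitudinal_risk(gap=10.0, v_rear=20.0, v_front=15.0, response_time=1.0, a_max_accel=2.0, b_min_brake=4.0, b_max_brake=8.0)`, so that scenarios violating the safe gap by a wider margin earn a higher return.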


DORB: Dynamically Optimizing Multiple Rewards with Bandits

arXiv.org Artificial Intelligence

Policy gradients-based reinforcement learning has proven to be a promising approach for directly optimizing non-differentiable evaluation metrics for language generation tasks. However, optimizing for a specific metric reward leads to improvements in mostly that metric only, suggesting that the model is gaming the formulation of that metric in a particular way without often achieving real qualitative improvements. Hence, it is more beneficial to make the model optimize multiple diverse metric rewards jointly. While appealing, this is challenging because one needs to manually decide the importance and scaling weights of these metric rewards. Further, it is important to consider using a dynamic combination and curriculum of metric rewards that flexibly changes over time. Considering the above aspects, in our work, we automate the optimization of multiple metric rewards simultaneously via a multi-armed bandit approach (DORB), where at each round, the bandit chooses which metric reward to optimize next, based on expected arm gains. We use the Exp3 algorithm for bandits and formulate two approaches for bandit rewards: (1) Single Multi-reward Bandit (SM-Bandit); (2) Hierarchical Multi-reward Bandit (HM-Bandit). We empirically show the effectiveness of our approaches via various automatic metrics and human evaluation on two important NLG tasks: question generation and data-to-text generation, including on an unseen-test transfer setup. Finally, we present interpretable analyses of the learned bandit curriculum over the optimized rewards.
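
The bandit step described above can be sketched with a plain Exp3 implementation, where each arm stands for one metric reward. This is only a generic Exp3 sketch: the `reward_fn` callback, the reward scaling to [0, 1], and the value of `gamma` are assumptions, and the abstract does not specify how the SM-Bandit and HM-Bandit variants compute their bandit rewards.

```python
import math
import random


def exp3_probabilities(weights, gamma):
    """Mix the normalized weights with uniform exploration."""
    total = sum(weights)
    k = len(weights)
    return [(1 - gamma) * w / total + gamma / k for w in weights]


def exp3_update(weights, probs, arm, reward, gamma):
    """Importance-weighted exponential update for the pulled arm only."""
    k = len(weights)
    estimated = reward / probs[arm]  # unbiased estimate of the arm's reward
    new_weights = list(weights)
    new_weights[arm] *= math.exp(gamma * estimated / k)
    return new_weights


def run_exp3(reward_fn, n_arms, n_rounds, gamma=0.1, seed=0):
    """Run Exp3 and return the final weights and per-arm pull counts."""
    rng = random.Random(seed)
    weights = [1.0] * n_arms
    pulls = [0] * n_arms
    for _ in range(n_rounds):
        probs = exp3_probabilities(weights, gamma)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(arm, rng)  # assumed to lie in [0, 1]
        weights = exp3_update(weights, probs, arm, reward, gamma)
        # Rescale so the weights stay bounded; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
        pulls[arm] += 1
    return weights, pulls
```

In a multi-reward setup, `reward_fn(arm, rng)` would train the generator for a while on the metric indexed by `arm` and return a scaled measure of the resulting gain, so arms whose metric currently helps most are chosen more often.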


Placement in Integrated Circuits using Cyclic Reinforcement Learning and Simulated Annealing

arXiv.org Artificial Intelligence

Physical design and production of Integrated Circuits (IC) is becoming increasingly challenging as the sophistication of IC technology steadily increases. Placement has been one of the most critical steps in IC physical design. Through decades of research, partition-based, analytical-based and annealing-based placers have been enriching the placement solution toolbox. However, open challenges including long run time and lack of ability to generalize continue to restrict wider applications of existing placement tools. We devise a learning-based placement tool based on cyclic application of Reinforcement Learning (RL) and Simulated Annealing (SA) by leveraging the advancement of RL. Results show that the RL module is able to provide a better initialization for SA and thus leads to a better final placement design. Compared to other recent learning-based placers, our method differs chiefly in its combination of RL and SA. It leverages the RL model's ability to quickly get a good rough solution after training and the heuristic's ability to realize greedy improvements in the solution.
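
The refinement half of that loop can be illustrated with a small simulated annealing sketch on a toy one-dimensional placement problem. Everything here is an assumption for illustration: the 1-D wirelength cost, the swap move, the cooling schedule, and the idea that `initial` would come from a trained RL module are not taken from the paper.

```python
import math
import random


def wirelength(order, nets):
    """Total 1-D wirelength: sum of slot distances between connected cells."""
    slot = {cell: i for i, cell in enumerate(order)}
    return sum(abs(slot[a] - slot[b]) for a, b in nets)


def anneal(initial, nets, steps=2000, t_start=2.0, t_end=0.01, seed=0):
    """Refine an initial placement by swapping cells under a cooling schedule."""
    rng = random.Random(seed)
    current = list(initial)
    cur_cost = wirelength(current, nets)
    best, best_cost = list(current), cur_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        new_cost = wirelength(current, nets)
        delta = new_cost - cur_cost
        # Always accept improvements; accept worse moves with Boltzmann odds.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur_cost = new_cost
            if cur_cost < best_cost:
                best, best_cost = list(current), cur_cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost
```

Because the best placement seen so far is tracked separately, the returned cost is never worse than that of `initial`, which is why a better starting point from the RL side can only help the final result in this sketch.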


Deep reinforcement learning for RAN optimization and control

#artificialintelligence

Due to the high variability of the traffic in the radio access network (RAN), fixed network configurations are not flexible enough to achieve optimal performance. Our vendors provide several eNodeB settings to optimize RAN performance, such as the media access control scheduler, load balancing, etc. But the detailed mechanisms of the eNodeB configurations are usually very complicated and not disclosed, not to mention the large KPI space that needs to be considered. We aim to build an intelligent controller that makes no strong assumptions and needs no domain knowledge about the RAN, and that can run 24/7 without supervision. To achieve this goal, we first build a closed-loop control testbed RAN in a lab environment with one eNodeB provided by one of the largest wireless vendors and four smartphones. Next, we build a double Q network agent that is trained with live feedback of the key performance indicators from the RAN.
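
Assuming "double Q network" refers to a Double DQN-style agent, the key difference from ordinary Q-learning is how the training target is formed: the online network picks the next action and a separate target network evaluates it. The sketch below shows only that target computation on plain lists of Q-values; the function name, arguments, and discount factor are illustrative assumptions rather than details from the article.

```python
def double_q_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double Q-learning target for one transition.

    next_q_online / next_q_target hold the Q-values of every action in the
    next state, as estimated by the online and the target network.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # The online network selects the action ...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ... and the target network evaluates it, which curbs overestimation.
    return reward + gamma * next_q_target[best_action]
```

For example, with `next_q_online = [0.2, 0.9]` and `next_q_target = [0.5, 0.3]`, the online network selects action 1, so the target uses 0.3 rather than the target network's own maximum of 0.5.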


Please human, can you teach me how to AI?

#artificialintelligence

In the exploding era of computing (ubiquitous, mobile, quantum or whatever suits you better) there's still a Holy Grail we struggle to reach, even if we seem to get closer with every step of Moore's law: Artificial General Intelligence (AGI). Back in 2010 or so, in my days as a Bioengineering MSc student at university, I had my ten-minute epiphany. I suddenly pictured that, some day, a reinforcement learning implementation general enough, running on hardware powerful and beautiful enough, might lead to a so-called strong artificial intelligence or artificial general intelligence. Indeed, for those who do not chew machine learning at breakfast, this may look like something really cool, but moving to a more concrete reality, my realization was much more pragmatic. In "traditional" Artificial Intelligence approaches, you pick a task (one you think is worth tackling) and put in place a supervised learning technique.


Cutting-Edge AI: Deep Reinforcement Learning in Python

#artificialintelligence

Created by Lazy Programmer Inc. This is technically Deep Learning in Python part 11 of my deep learning series, and my 3rd reinforcement learning course. Deep Reinforcement Learning is actually the combination of 2 topics: Reinforcement Learning and Deep Learning (Neural Networks). While both of these have been around for quite some time, it's only been recently that Deep Learning has really taken off, and along with it, Reinforcement Learning. The maturation of deep learning has propelled advances in reinforcement learning, which has been around since the 1980s, although some aspects of it, such as the Bellman equation, have been around for much longer.


A deep Q-Learning based Path Planning and Navigation System for Firefighting Environments

#artificialintelligence

Live fire creates a dynamic, rapidly changing environment that presents a worthy challenge for deep learning and artificial intelligence methodologies to assist firefighters with scene comprehension, helping them maintain situational awareness and track and relay the important features needed for key decisions as they tackle these catastrophic events. We propose a deep Q-learning based agent that is immune to stress-induced disorientation and anxiety and thus able to make clear navigation decisions based on the observed and stored facts in live fire environments. As a proof of concept, we imitate a structural fire in a gaming engine called Unreal Engine, which enables the interaction of the agent with the environment. The agent is trained with a deep Q-learning algorithm based on a set of rewards and penalties for its actions in the environment. We exploit experience replay to accelerate the learning process and augment the learning of the agent with human-derived experiences.
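
Experience replay is commonly implemented as a fixed-size buffer of past transitions from which training batches are drawn at random. The class below is a minimal generic sketch, not the authors' implementation; the class name, the transition layout, and the capacity are assumptions.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity, seed=0):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement breaks up the correlation
        # between consecutive steps of the same episode.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

One plausible way to mix in human-derived experiences, stated here only as an assumption, is to `push` recorded human transitions into the same buffer before or during training so that they are sampled alongside the agent's own.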


A Geometric Perspective on Self-Supervised Policy Adaptation

arXiv.org Artificial Intelligence

One of the most challenging aspects of real-world reinforcement learning (RL) is the multitude of unpredictable and ever-changing distractions that could divert an agent from what it was tasked to do in its training environment. While an agent could learn from reward signals to ignore them, the complexity of the real world can make rewards hard to acquire, or, at best, extremely sparse. A recent class of self-supervised methods has shown that reward-free adaptation under challenging distractions is possible. However, previous work focused on a short one-episode adaptation setting. In this paper, we consider a long-term adaptation setup that is more akin to the specifics of the real world and propose a geometric perspective on self-supervised adaptation. We empirically describe the processes that take place in the embedding space during this adaptation process, reveal some of its undesirable effects on performance and show how they can be eliminated. Moreover, we theoretically study how actor-based and actor-free agents can further generalise to the target environment by manipulating the geometry of the manifolds described by the actor and critic functions.