Zohren, Stefan
Same State, Different Task: Continual Reinforcement Learning without Interference
Kessler, Samuel, Parker-Holder, Jack, Ball, Philip, Zohren, Stefan, Roberts, Stephen J.
Continual Learning (CL) considers the problem of training an agent sequentially on a set of tasks while seeking to retain performance on all previous tasks. A key challenge in CL is catastrophic forgetting, which arises when performance on a previously mastered task is reduced when learning a new task. While a variety of methods exist to combat forgetting, in some cases tasks are fundamentally incompatible with each other and thus cannot be learnt by a single policy. This can occur in reinforcement learning (RL) when an agent is rewarded for achieving different goals from the same observation. In this paper we formalize this ``interference'' as distinct from the problem of forgetting. We show that existing CL methods based on single neural network predictors with shared replay buffers fail in the presence of interference. Instead, we propose a simple method, OWL, to address this challenge. OWL learns a factorized policy, using shared feature extraction layers but separate heads, each specializing on a new task. The separate heads in OWL are used to prevent interference. At test time, we formulate policy selection as a multi-armed bandit problem, and show it is possible to select the best policy for an unknown task using feedback from the environment. The use of bandit algorithms allows the OWL agent to constructively re-use different continually learnt policies at different times during an episode. We show in multiple RL environments that existing replay-based CL methods fail, while OWL is able to achieve close to optimal performance when training sequentially.
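To make the factorized-policy idea concrete, below is a minimal PyTorch sketch of a shared-torso, multi-head policy together with a UCB1 rule for selecting a head from environment feedback; the class names, network sizes and exploration constant are illustrative assumptions, not the paper's exact implementation.

```python
import math
import torch
import torch.nn as nn

class MultiHeadPolicy(nn.Module):
    """Shared feature torso with one linear head per task (illustrative sketch)."""
    def __init__(self, obs_dim, act_dim, n_tasks, hidden=64):
        super().__init__()
        self.torso = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden, act_dim) for _ in range(n_tasks)])

    def forward(self, obs, head_idx):
        # Shared features, task-specific output head (prevents interference).
        return self.heads[head_idx](self.torso(obs))

def ucb_select(counts, mean_rewards, t, c=2.0):
    """UCB1 head selection from per-head reward feedback during an episode."""
    scores = [m + c * math.sqrt(math.log(t + 1) / n) if n > 0 else float('inf')
              for m, n in zip(mean_rewards, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```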
Slow Momentum with Fast Reversion: A Trading Strategy Using Deep Learning and Changepoint Detection
Wood, Kieran, Roberts, Stephen, Zohren, Stefan
Momentum strategies are an important part of alternative investments and are at the heart of commodity trading advisors (CTAs). These strategies have, however, been found to have difficulty adjusting to rapid changes in market conditions, such as during the 2020 market crash. In particular, immediately after momentum turning points, where a trend reverses from an uptrend (downtrend) to a downtrend (uptrend), time-series momentum (TSMOM) strategies are prone to making bad bets. To improve the response to regime change, we introduce a novel approach, where we insert an online changepoint detection (CPD) module into a Deep Momentum Network (DMN) [1904.04912] pipeline, which uses an LSTM deep-learning architecture to simultaneously learn both trend estimation and position sizing. Furthermore, our model is able to optimise the way in which it balances 1) a slow momentum strategy, which exploits persisting trends but does not overreact to localised price moves, and 2) a fast mean-reversion strategy, which exploits localised price moves by quickly flipping its position and then swapping it back again. Our CPD module outputs a changepoint location and severity score, allowing our model to learn to respond to varying degrees of disequilibrium, or to smaller and more localised changepoints, in a data-driven manner. Using a portfolio of 50 liquid, continuous futures contracts over the period 1990-2020, the addition of the CPD module leads to an improvement in Sharpe ratio of $33\%$. Even more notably, this module is especially beneficial in periods of significant nonstationarity; in particular, over the most recent years tested (2015-2020) the performance boost is approximately $400\%$. This is especially interesting as traditional momentum strategies have been underperforming in this period.
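To illustrate how changepoint features can be fed into such a pipeline, the sketch below computes a per-day changepoint location and severity score, using a toy mean-shift statistic as a stand-in for the paper's GP-based CPD module; the function name, window length and the statistic itself are hypothetical.

```python
import numpy as np

def cpd_features(returns, lookback=21):
    """Stand-in for a CPD module: for each day, emit a normalised changepoint
    location and a severity score over a trailing window. Severity here is a
    crude mean-shift statistic, purely for illustration."""
    loc, sev = np.zeros(len(returns)), np.zeros(len(returns))
    for t in range(lookback, len(returns)):
        window = returns[t - lookback:t]
        # Best split point by between-segment mean difference (toy statistic).
        stats = [abs(window[:k].mean() - window[k:].mean())
                 for k in range(2, lookback - 1)]
        k = int(np.argmax(stats)) + 2
        loc[t] = k / lookback                          # location in [0, 1]
        sev[t] = stats[k - 2] / (window.std() + 1e-8)  # scale-normalised severity
    return loc, sev

# These two series would be concatenated with the usual DMN inputs
# (volatility-scaled returns, trend signals, ...) before the LSTM.
```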
Deep Learning for Market by Order Data
Zhang, Zihao, Lim, Bryan, Zohren, Stefan
Market by order (MBO) data - a detailed feed of individual trade instructions for a given stock on an exchange - is arguably one of the most granular sources of microstructure information. While limit order books (LOBs) are implicitly derived from it, MBO data is largely neglected by current academic literature which focuses primarily on LOB modelling. In this paper, we demonstrate the utility of MBO data for forecasting high-frequency price movements, providing an orthogonal source of information to LOB snapshots. We provide the first predictive analysis on MBO data by carefully introducing the data structure and presenting a specific normalisation scheme to consider level information in order books and to allow model training with multiple instruments. Through forecasting experiments using deep neural networks, we show that while MBO-driven and LOB-driven models individually provide similar performance, ensembles of the two can lead to improvements in forecasting accuracy -- indicating that MBO data is additive to LOB-based features.
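As a minimal illustration of the ensembling idea, one can blend the class probabilities of an LOB-driven and an MBO-driven model; the function below is a sketch with an assumed equal weighting, not the paper's ensemble scheme.

```python
import numpy as np

def ensemble_probs(p_lob, p_mbo, w=0.5):
    """Blend class probabilities from an LOB-driven and an MBO-driven model.
    p_lob, p_mbo: arrays of shape (n_samples, n_classes), e.g. down/flat/up."""
    p = w * p_lob + (1.0 - w) * p_mbo
    return p / p.sum(axis=1, keepdims=True)

# Predicted price move = argmax over the blended classes:
# preds = ensemble_probs(p_lob, p_mbo).argmax(axis=1)
```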
Large Non-Stationary Noisy Covariance Matrices: A Cross-Validation Approach
Tan, Vincent W. C., Zohren, Stefan
We introduce a novel covariance estimator that exploits the heteroscedastic nature of financial time series by employing exponentially weighted moving averages and shrinking the in-sample eigenvalues through cross-validation. Our estimator is model-agnostic in that we make no assumptions on the distribution of the random entries of the matrix or the structure of the covariance matrix. Additionally, we show how Random Matrix Theory can provide guidance for automatic tuning of the hyperparameter which characterizes the time scale for the dynamics of the estimator. By attenuating the noise from both the cross-sectional and time-series dimensions, we empirically demonstrate the superiority of our estimator over competing estimators that are based on exponentially-weighted and uniformly-weighted covariance matrices.
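A minimal sketch of the two ingredients, assuming a simple half-life parametrisation of the exponential weights and linear shrinkage of the eigenvalues towards their mean; the paper's estimator instead selects the eigenvalue shrinkage by cross-validation.

```python
import numpy as np

def ewma_cov(returns, halflife=60):
    """Exponentially weighted covariance of a (T, N) return matrix."""
    T, N = returns.shape
    decay = 0.5 ** (1.0 / halflife)
    w = decay ** np.arange(T - 1, -1, -1)  # newest observation weighted most
    w /= w.sum()
    X = returns - returns.mean(axis=0)
    return (X * w[:, None]).T @ X

def shrink_eigenvalues(cov, shrinkage):
    """Shrink in-sample eigenvalues towards their mean (illustrative; the
    paper's scheme tunes the shrinkage by cross-validation)."""
    vals, vecs = np.linalg.eigh(cov)
    vals = (1 - shrinkage) * vals + shrinkage * vals.mean()
    return (vecs * vals) @ vecs.T
```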
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Granziol, Diego, Zohren, Stefan, Roberts, Stephen
We study the effect of mini-batching on the loss landscape of deep neural networks using spiked, field-dependent random matrix theory. We demonstrate that the magnitudes of the extremal values of the batch Hessian are larger than those of the empirical Hessian. We also derive similar results for the Generalised Gauss-Newton matrix approximation of the Hessian. As a consequence of our theorems, we derive analytical expressions for the maximal learning rates as a function of batch size, informing practical optimisation schemes for both stochastic gradient descent (linear scaling) and adaptive algorithms such as Adam (square-root scaling). Whilst the linear scaling for stochastic gradient descent has been derived under more restrictive conditions, which we generalise, the square-root scaling rule for adaptive optimisers is, to our knowledge, completely novel. For stochastic second-order and adaptive methods, we derive that the minimal damping coefficient is proportional to the ratio of the learning rate to the batch size.
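The two scaling rules are easy to state in code; the helper below is illustrative, with the reference learning rate and batch size assumed to be known from a prior run.

```python
def scaled_lr(base_lr, base_batch, batch, optimiser="sgd"):
    """Scale a reference learning rate to a new batch size following the
    paper's rules: linear for SGD, square-root for adaptive methods."""
    ratio = batch / base_batch
    if optimiser == "sgd":
        return base_lr * ratio       # linear scaling
    return base_lr * ratio ** 0.5    # square-root scaling (e.g. Adam)

# e.g. a stable SGD rate of 0.1 at batch 256 suggests 0.4 at batch 1024,
# while Adam would scale from 0.001 to 0.002 over the same change.
```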
Time Series Forecasting With Deep Learning: A Survey
Lim, Bryan, Zohren, Stefan
While traditional methods have focused on parametric models informed by domain expertise - such as autoregressive (AR) [6], exponential smoothing [7, 8] or structural time series models [9] - modern machine learning methods provide a means to learn temporal dynamics in a purely data-driven manner [10]. With the increasing data availability and computing power of recent times, machine learning has become a vital part of the next generation of time series forecasting models. Deep learning in particular has gained popularity, inspired by notable achievements in image classification [11], natural language processing [12] and reinforcement learning [13]. By incorporating bespoke architectural assumptions - or inductive biases [14] - that reflect the nuances of the underlying datasets, deep neural networks are able to learn complex data representations [15], which alleviates the need for manual feature engineering and model design. The availability of open-source backpropagation frameworks [16, 17] has also simplified network training, allowing for the customisation of network components and loss functions.
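As one example of such customisation, a common bespoke objective in deep forecasting is the pinball (quantile) loss; a minimal PyTorch version is sketched below for illustration.

```python
import torch

def quantile_loss(y_pred, y_true, q=0.5):
    """Pinball loss for quantile q: a common custom objective when training
    deep forecasting models to output prediction intervals."""
    err = y_true - y_pred
    return torch.mean(torch.max(q * err, (q - 1) * err))
```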
A Maximum Entropy approach to Massive Graph Spectra
Granziol, Diego, Ru, Robin, Zohren, Stefan, Dong, Xiaowen, Osborne, Michael, Roberts, Stephen
Graph spectral techniques for measuring graph similarity, or for learning the cluster number, require kernel smoothing. The choice of kernel function and bandwidth is typically made in an ad-hoc manner and heavily affects the resulting output. We prove that kernel smoothing biases the moments of the spectral density. We propose an information-theoretically optimal approach to learn a smooth graph spectral density, which fully respects the moment information. Our method's computational cost is linear in the number of edges, and it can hence be applied to large networks with millions of nodes. We apply our method to the problems of graph similarity and cluster number learning, where we outperform comparable iterative spectral approaches on synthetic and real graphs.
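The linear-in-edges cost comes from estimating spectral moments with stochastic probes rather than full eigendecompositions; below is a minimal sketch using Hutchinson-style Rademacher probes, with all parameter values illustrative.

```python
import numpy as np

def spectral_moments(A, n_moments=10, n_probes=20, rng=None):
    """Estimate the spectral moments (1/n) tr(A^k) of a symmetric graph
    matrix A (e.g. a scipy.sparse normalised adjacency) with Hutchinson
    probes; each extra moment costs one sparse mat-vec, i.e. O(#edges)."""
    rng = rng or np.random.default_rng(0)
    n = A.shape[0]
    moments = np.zeros(n_moments)
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=n)  # Rademacher probe vector
        v = z.copy()
        for k in range(n_moments):
            v = A @ v                        # sparse mat-vec
            moments[k] += (z @ v) / n        # unbiased estimate of tr(A^{k+1}) / n
    return moments / n_probes
```

A maximum-entropy density matching these estimated moments would then be fitted by optimising its Lagrange multipliers; that fitting step, which the paper makes exact and efficient, is not shown here.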
MEMe: An Accurate Maximum Entropy Method for Efficient Approximations in Large-Scale Machine Learning
Granziol, Diego, Ru, Binxin, Zohren, Stefan, Dong, Xiaowen, Osborne, Michael, Roberts, Stephen
Making high-quality inferences on large, feature-rich datasets under a constrained computational budget is arguably the primary goal of the learning community. This, however, comes with significant challenges. On the one hand, the exact computation of linear algebraic quantities may be prohibitively expensive, such as that of the log determinant. On the other hand, an analytic expression for the quantity of interest may not exist at all, as is the case for the entropy of a Gaussian mixture model, and approximate methods are often both inefficient and inaccurate.
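For the Gaussian mixture entropy mentioned above, no closed form exists, but a simple Monte Carlo baseline does; the sketch below is such a baseline for reference, not the MEMe method itself.

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_entropy_mc(weights, means, covs, n_samples=10000, rng=None):
    """Monte Carlo estimate of the (analytically intractable) entropy of a
    Gaussian mixture: H = -E[log p(x)] with x sampled from the mixture."""
    rng = rng or np.random.default_rng(0)
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    xs = np.array([rng.multivariate_normal(means[c], covs[c]) for c in comps])
    # log p(x) = logsumexp over components of log w_j + log N(x; mu_j, S_j)
    log_p = np.logaddexp.reduce(
        [np.log(w) + multivariate_normal.logpdf(xs, m, S)
         for w, m, S in zip(weights, means, covs)], axis=0)
    return -log_p.mean()
```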
Population-based Global Optimisation Methods for Learning Long-term Dependencies with RNNs
Lim, Bryan, Zohren, Stefan, Roberts, Stephen
Despite recent innovations in network architectures and loss functions, training RNNs to learn long-term dependencies remains difficult due to challenges with gradient-based optimisation methods. Inspired by the success of Deep Neuroevolution in reinforcement learning (Such et al. 2017), we explore the use of gradient-free population-based global optimisation (PBO) techniques to train RNNs to capture long-term dependencies in time-series data. Testing evolution strategies (ES) and particle swarm optimisation (PSO) on an application in volatility forecasting, we demonstrate that PBO methods lead to performance improvements in general, with ES exhibiting the most consistent results across a variety of architectures.
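As an illustration of the PBO family, below is a minimal vanilla ES loop with antithetic sampling over a flattened weight vector; the hyperparameters and the loss_fn interface are assumptions made for the sketch.

```python
import numpy as np

def evolution_strategies(loss_fn, theta0, sigma=0.1, lr=0.01,
                         pop_size=50, n_iters=200, rng=None):
    """Vanilla ES: estimate the gradient of a Gaussian-smoothed loss from
    antithetic perturbations of the flattened (RNN) weight vector."""
    rng = rng or np.random.default_rng(0)
    theta = theta0.copy()
    for _ in range(n_iters):
        eps = rng.standard_normal((pop_size, theta.size))
        # Antithetic finite differences of the loss along each perturbation.
        diffs = np.array([loss_fn(theta + sigma * e) - loss_fn(theta - sigma * e)
                          for e in eps])
        grad = (eps * diffs[:, None]).mean(axis=0) / (2 * sigma)
        theta -= lr * grad  # descend the smoothed loss, no backprop needed
    return theta
```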
Enhancing Time Series Momentum Strategies Using Deep Neural Networks
Lim, Bryan, Zohren, Stefan, Roberts, Stephen
While time series momentum is a well-studied phenomenon in finance, common strategies require the explicit definition of both a trend estimator and a position sizing rule. In this paper, we introduce Deep Momentum Networks -- a hybrid approach which injects deep learning based trading rules into the volatility scaling framework of time series momentum. The model simultaneously learns both trend estimation and position sizing in a data-driven manner, with networks directly trained by optimising the Sharpe ratio of the signal. Backtesting on a portfolio of 88 continuous futures contracts, we demonstrate that the Sharpe-optimised LSTM improves on traditional methods by more than two times in the absence of transaction costs, and continues to outperform when considering transaction costs of up to 2-3 basis points. To account for more illiquid assets, we also propose a turnover regularisation term which trains the network to factor in costs at run-time.
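The Sharpe-ratio objective can be written as a short differentiable loss; a minimal PyTorch sketch follows, omitting the volatility scaling and annualisation details of the full framework.

```python
import torch

def negative_sharpe(positions, returns, eps=1e-8):
    """Loss for direct Sharpe-ratio optimisation: positions and returns are
    (batch, time) tensors of sized positions and next-period asset returns;
    the captured P&L is their elementwise product."""
    pnl = positions * returns
    sharpe = pnl.mean() / (pnl.std() + eps)
    return -sharpe  # minimising this maximises the Sharpe ratio
```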