Collaborating Authors

 Lindsay, David


Temporal distribution of clusters of investors and their application in prediction with expert advice

arXiv.org Artificial Intelligence

Financial organisations such as brokers face a significant challenge in servicing the investment needs of thousands of traders worldwide. The task is compounded by the fact that individual traders have their own risk appetites and investment goals. Traders may look to capture short-term market trends that last only seconds to minutes, or they may hold longer-term views that last several days to months. To reduce the complexity of this task, client trades can be clustered. By examining such clusters, we would likely observe many traders following common patterns of investment, but how do these patterns vary through time? Knowledge of the temporal distributions of such clusters may help financial institutions manage the overall portfolio of risk that accumulates from underlying trader positions. This study contributes to the field by demonstrating that the distribution of clusters derived from the real-world trades of 20k Foreign Exchange (FX) traders (from 2015 to 2017) is consistent with Ewens' Sampling Distribution. Further, we show that the Aggregating Algorithm (AA), an on-line prediction with expert advice algorithm, can be applied to this real-world data to improve the returns of portfolios of trader risk. However, we found that the AA 'struggles' when presented with too many trader "experts", especially when there are many trades with similar overall patterns. To help overcome this challenge, we applied Statistically Validated Networks (SVN) and a hierarchical clustering approach to a subset of the data and compared them, demonstrating that both approaches can significantly improve the results of the AA in terms of profitability and smoothness of returns.
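
For readers unfamiliar with Ewens' Sampling Distribution mentioned in this abstract, the following is a minimal sketch of the standard Ewens sampling formula, which gives the probability that n items split into clusters with a given multiset of sizes. The function name `ewens_probability` and the parameter name `theta` (the usual concentration parameter) are illustrative assumptions; the abstract does not say how the authors fitted or tested the distribution.

```python
from collections import Counter
from math import factorial, prod


def ewens_probability(cluster_sizes, theta):
    """Probability of observing clusters with these sizes under Ewens' sampling formula.

    cluster_sizes: iterable of positive ints, one per cluster (order is irrelevant).
    theta: positive concentration parameter (assumed name, not from the abstract).
    """
    sizes = list(cluster_sizes)
    n = sum(sizes)
    # counts[j] = number of clusters that contain exactly j items
    counts = Counter(sizes)
    # Rising factorial: theta * (theta + 1) * ... * (theta + n - 1)
    rising = prod(theta + i for i in range(n))
    # Product over cluster sizes j of theta^a_j / (j^a_j * a_j!)
    term = prod((theta / j) ** a_j / factorial(a_j) for j, a_j in counts.items())
    return factorial(n) / rising * term


if __name__ == "__main__":
    # The three possible cluster-size patterns for n = 3 items
    for pattern in ([3], [2, 1], [1, 1, 1]):
        print(pattern, ewens_probability(pattern, theta=2.0))
```

Larger values of `theta` put more probability on many small clusters, while smaller values favour a few large ones.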


Transductive Confidence Machine and its application to Medical Data Sets

arXiv.org Artificial Intelligence

The Transductive Confidence Machine Nearest Neighbours (TCMNN) algorithm and a simple supporting user interface were developed. Different settings of the TCMNN algorithm's parameters were tested on medical data sets, along with different Minkowski metrics and polynomial kernels. The effect of increasing the number of nearest neighbours and of marking results with significance was also investigated. An SVM implementation of the Transductive Confidence Machine was compared with the Nearest Neighbours implementation. The application of neural networks was investigated as a useful comparison to the transductive algorithms.
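
As an illustration of the nearest-neighbour transductive approach named in this abstract, here is a minimal sketch using the commonly described formulation: the strangeness of an example is the ratio of its summed distances to the k nearest same-label examples over those to the k nearest other-label examples, and each candidate label receives a p-value from the rank of the new example's strangeness. The function names, the Euclidean metric and the toy data are assumptions; the abstract does not give the authors' implementation details.

```python
import math


def nonconformity(examples, i, k=1):
    """Strangeness of example i: same-label distances over other-label distances.

    Assumes every label has at least one other example and no exact duplicates
    across labels (otherwise the denominator could be zero).
    """
    xi, yi = examples[i]
    same, other = [], []
    for j, (xj, yj) in enumerate(examples):
        if j == i:
            continue
        (same if yj == yi else other).append(math.dist(xi, xj))
    same.sort()
    other.sort()
    return sum(same[:k]) / sum(other[:k])


def tcm_nn_p_values(train, x_new, labels, k=1):
    """Return {label: p-value} for x_new, trying each candidate label in turn."""
    p_values = {}
    for label in labels:
        extended = train + [(x_new, label)]
        alphas = [nonconformity(extended, i, k) for i in range(len(extended))]
        # Fraction of examples at least as strange as the new one (itself included)
        p_values[label] = sum(a >= alphas[-1] for a in alphas) / len(alphas)
    return p_values


if __name__ == "__main__":
    train = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0),
             ((10, 10), 1), ((10, 11), 1), ((11, 10), 1)]
    print(tcm_nn_p_values(train, (0.5, 0.5), labels=[0, 1]))
```

The label with the largest p-value is the prediction, and the second-largest p-value indicates how confidently the alternatives can be ruled out.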


Effective Confidence Region Prediction Using Probability Forecasters

arXiv.org Artificial Intelligence

Confidence region prediction is a practically useful extension to the commonly studied pattern recognition problem. Instead of predicting a single label, the constraint is relaxed to allow prediction of a subset of labels given a desired confidence level 1-delta. Ideally, effective region predictions should be (1) well calibrated, meaning that predictive regions at confidence level 1-delta err with relative frequency at most delta, and (2) as narrow (or certain) as possible. We present a simple technique to generate confidence region predictions from conditional probability estimates (probability forecasts). We use this 'conversion' technique to generate confidence region predictions from probability forecasts output by standard machine learning algorithms when tested on 15 multi-class datasets. Our results show that approximately 44% of experiments demonstrate well-calibrated confidence region predictions, with the K-Nearest Neighbour algorithm tending to perform consistently well across all data. Our results illustrate the practical benefits of effective confidence region prediction with respect to medical diagnostics, where guarantees of capturing the true disease label can be given.
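
The abstract does not spell out the conversion technique, so the following is only a sketch of one natural way to turn a probability forecast into a region prediction: rank the labels by estimated probability and add them to the region until the accumulated probability reaches the confidence level 1-delta. The function name `confidence_region` and the example labels are hypothetical.

```python
def confidence_region(probs, delta):
    """Smallest set of top-ranked labels whose total probability is >= 1 - delta.

    probs: dict mapping label -> estimated conditional probability (summing to 1).
    delta: tolerated error rate, so the confidence level is 1 - delta.
    """
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    region, total = [], 0.0
    for label, p in ranked:
        region.append(label)
        total += p
        if total >= 1.0 - delta:
            break
    return region


if __name__ == "__main__":
    forecast = {"flu": 0.7, "cold": 0.2, "allergy": 0.1}
    for delta in (0.5, 0.25, 0.05):
        print(delta, confidence_region(forecast, delta))
```

Under this scheme a region is well calibrated only if the underlying probability forecasts are themselves reliable, which matches the abstract's framing of calibration as something to be checked empirically.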


Learning from String Sequences

arXiv.org Artificial Intelligence

The Universal Similarity Metric (USM) has been demonstrated to give practically useful measures of "similarity" between sequence data. Here we use the USM as an alternative distance metric in a K-Nearest Neighbours (K-NN) learner to allow effective pattern recognition of variable-length sequence data. We compare this USM approach with the commonly used string-to-word vector approach. Our experiments use two data sets from divergent domains: (1) spam email filtering and (2) protein subcellular localisation. Our results on these data reveal that the USM-based K-NN learner (1) gives predictions with higher classification accuracy than those output by techniques that use the string-to-word vector approach, and (2) can be used to generate reliable probability forecasts.
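
The USM is defined in terms of Kolmogorov complexity, which is not computable, so practical work typically approximates it with the normalised compression distance computed by an off-the-shelf compressor. The sketch below assumes that approximation and uses `zlib` as the compressor; the abstract does not say which compressor the authors used, and the helper names and toy training data are hypothetical.

```python
import zlib
from collections import Counter


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data, a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalised compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    return (compressed_size(x + y) - min(cx, cy)) / max(cx, cy)


def knn_predict(train, query: bytes, k: int = 3):
    """Majority label among the k training sequences closest to query under ncd.

    train: list of (sequence_bytes, label) pairs of arbitrary, differing lengths.
    """
    nearest = sorted(train, key=lambda example: ncd(example[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    repetitive = b"abcabcabc" * 50
    varied = bytes(range(256)) * 2
    train = [(repetitive, "repetitive"), (varied, "varied")]
    print(knn_predict(train, b"abcabc" * 60, k=1))
```

Because the distance works directly on raw bytes, no tokenisation or fixed-length feature vector is needed, which is the property that makes it attractive for variable-length sequences.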