Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning
Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in financial prediction problems based on large-scale datasets; deep learning offers a way to gain insights into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically constructed gradients to learn and exploit lagged correlations among S&P 500 stocks, comparing model behaviour in stable and volatile market environments and under the exclusion of target-stock information for predictions. To measure the effect of time horizons, we predict intraday and daily stock price movements over varying interval lengths and gauge the complexity of the problem at hand with a modification of our model architecture. Our findings show that accuracies decrease with shorter prediction horizons, while remaining significant and demonstrating that lagged correlations in stock markets are exploitable. We discuss implications for modern finance theory and our work's applicability as an investigative tool for portfolio managers. Lastly, we show that our model's performance is consistent in volatile markets by exposing it to the environment of the recent financial crisis of 2007/2008.
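To make "lagged correlation" concrete, here is a minimal Python sketch on synthetic returns in which a hypothetical stock B partly echoes stock A's return from the previous interval. It only measures the lag structure a predictive model could exploit; it is not the paper's gradient construction or deep learning model, and the series, the 0.5 coefficient and the helper names are assumptions for illustration.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_corr(leader, follower, lag):
    """Correlation between leader[t] and follower[t + lag] for lag >= 0."""
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


# Synthetic returns: stock B partly repeats stock A's return from one interval earlier.
rng = random.Random(0)
n = 5000
ret_a = [rng.gauss(0.0, 0.01) for _ in range(n)]
ret_b = [0.0] + [0.5 * ret_a[t - 1] + rng.gauss(0.0, 0.01) for t in range(1, n)]

for lag in range(3):
    print(lag, round(lagged_corr(ret_a, ret_b, lag), 3))
```

Only the lag-1 correlation should stand out here; in real data such structure is far weaker and varies with the interval length, which is what the horizon experiments probe.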
On the Sample Complexity of Adversarial Multi-Source PAC Learning
Konstantinov, Nikola, Frantar, Elias, Alistarh, Dan, Lampert, Christoph H.
We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability; that is, even in the limit of infinite training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, together with corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting, sharing data with other parties has provable benefits, even if some participants are malicious.
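Why several sources can help when only a fraction of them is corrupted can be illustrated, far more simply than in the PAC setting, with mean estimation: if fewer than half of the sources are bad, a median over per-source estimates cannot be dragged arbitrarily far, whereas pooling all data can. This Python sketch uses that standard median-of-sources trick as a stand-in; it is not the learning algorithm or the bound from the paper, and the data and the corruption model are invented for illustration.

```python
import random
import statistics


def source_means(sources):
    """Empirical mean of each data source."""
    return [statistics.fmean(s) for s in sources]


def pooled_mean(sources):
    """Naive estimate: treat all sources as one dataset."""
    return statistics.fmean(x for s in sources for x in s)


def median_of_sources(sources):
    """Robust estimate: median of per-source means (tolerates < 50% bad sources)."""
    return statistics.median(source_means(sources))


rng = random.Random(1)
true_mean = 3.0
clean = [[rng.gauss(true_mean, 1.0) for _ in range(200)] for _ in range(7)]
# Hypothetical adversary: 3 of the 10 sources are shifted arbitrarily far.
corrupted = [[rng.gauss(true_mean, 1.0) + 100.0 for _ in range(200)] for _ in range(3)]
sources = clean + corrupted

print(pooled_mean(sources))        # pulled towards the corrupted sources
print(median_of_sources(sources))  # stays close to true_mean
```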
Supervised Deep Similarity Matching
Qin, Shanshan, Mudur, Nayantara, Pehlevan, Cengiz
We propose a novel biologically-plausible solution to the credit assignment problem, motivated by observations in the ventral visual pathway and in trained deep neural networks. In both, representations of objects in the same category become progressively more similar, while representations of objects belonging to different categories become less similar. We use this observation to motivate a layer-specific learning goal in a deep network: each layer aims to learn a representational similarity matrix that interpolates between those of the previous and later layers. We formulate this idea using a supervised deep similarity matching cost function and derive from it deep neural networks with feedforward, lateral and feedback connections, and neurons that exhibit biologically-plausible Hebbian and anti-Hebbian plasticity. Supervised deep similarity matching can be interpreted as an energy-based learning algorithm, but it differs significantly from other such algorithms in how the contrastive function is constructed.
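The observation that same-category representations become more similar while different-category ones become less similar can be quantified from a representational similarity matrix. This Python sketch computes a cosine-similarity matrix and the within-minus-between category gap for synthetic representations, where shrinking noise plays the role of deeper layers. It measures the effect only; it is not the supervised similarity matching cost or the Hebbian/anti-Hebbian network, and the prototypes, noise levels and function names are assumptions.

```python
import math
import random


def cosine_similarity_matrix(reps):
    """Representational similarity matrix: cosine similarity between every pair of stimuli."""
    norms = [math.sqrt(sum(v * v for v in r)) for r in reps]
    return [[sum(a * b for a, b in zip(ri, rj)) / (ni * nj)
             for rj, nj in zip(reps, norms)]
            for ri, ni in zip(reps, norms)]


def within_minus_between(sim, labels):
    """Mean same-category similarity minus mean different-category similarity."""
    within, between = [], []
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            (within if labels[i] == labels[j] else between).append(sim[i][j])
    return sum(within) / len(within) - sum(between) / len(between)


# Hypothetical "layers": category prototypes plus noise that shrinks with depth.
rng = random.Random(2)
prototypes = {0: [3.0] + [0.0] * 9, 1: [0.0, 3.0] + [0.0] * 8}
labels = [0] * 50 + [1] * 50
gaps = []
for noise in (2.0, 0.5, 0.1):
    reps = [[p + rng.gauss(0.0, noise) for p in prototypes[c]] for c in labels]
    gaps.append(within_minus_between(cosine_similarity_matrix(reps), labels))
print(gaps)  # the gap grows as representations become more category-specific
```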
The Two Regimes of Deep Network Training
Leclerc, Guillaume, Madry, Aleksander
The learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristic. We aim to develop a precise understanding of the effects of different learning rate schedules and of the appropriate way to select them. To this end, we isolate two distinct phases of training. The first, which we refer to as the "large-step" regime, exhibits rather poor performance from an optimization point of view but is the primary contributor to model generalization; the second, the "small-step" regime, exhibits much more "convex-like" optimization behavior but, used in isolation, produces models that generalize poorly. We find that by treating these regimes separately, and specializing our training algorithm to each one of them, we can significantly simplify learning rate schedules.
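A minimal Python sketch of a schedule with two regimes, using assumed values (switch point, step sizes) rather than the paper's. On a noisy quadratic, SGD with a constant step $\eta$ and unit-variance gradient noise settles at a stationary $E[w^2] = \eta/(2-\eta)$, so switching to a small step gives the "convex-like" convergence described above. The toy says nothing about generalization, which the paper attributes to the large-step regime.

```python
import random


def two_regime_lr(step, switch_step, large_lr, small_lr):
    """Piecewise-constant schedule: a 'large-step' phase followed by a 'small-step' phase."""
    return large_lr if step < switch_step else small_lr


# Toy problem: minimise f(w) = w**2 / 2 with noisy gradients g = w + noise.
rng = random.Random(3)
w, sq_large, sq_small = 5.0, [], []
switch, total = 3000, 6000
for step in range(total):
    lr = two_regime_lr(step, switch, large_lr=0.5, small_lr=0.01)
    w -= lr * (w + rng.gauss(0.0, 1.0))
    if switch - 1000 <= step < switch:
        sq_large.append(w * w)  # tail of the large-step phase
    elif step >= total - 1000:
        sq_small.append(w * w)  # tail of the small-step phase

# Roughly 0.5 / 1.5 and 0.01 / 1.99 respectively, per the stationary-variance formula.
print(sum(sq_large) / 1000, sum(sq_small) / 1000)
```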
A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA
Ughi, Giuseppe, Abrol, Vinayak, Tanner, Jared
We demonstrate that model-based derivative-free optimisation algorithms can generate adversarial targeted misclassification of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of network queries is less impacted by making the task more challenging, either through reducing the allowed $\ell^{\infty}$ perturbation energy or through training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm with the state-of-the-art model-free adversarial targeted misclassification approaches based on genetic, combinatorial, and direct-search algorithms. We observe that for high $\ell^{\infty}$ energy perturbations on networks, the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA-based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.
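The black-box setting can be made concrete with a small Python harness: the attacker sees only class scores, every call is counted, and perturbations are confined to an $\ell^{\infty}$ ball of radius `eps`. BOBYQA itself (Powell's trust-region method that fits quadratic models to function values within bounds) is not reproduced here; the search below is a deliberately simple model-free random-vertex baseline, and the linear model, margin objective and names are assumptions for illustration.

```python
import random


class CountingBlackBox:
    """Wraps a classifier we can only query; counts every query."""

    def __init__(self, scores_fn):
        self.scores_fn = scores_fn
        self.queries = 0

    def __call__(self, x):
        self.queries += 1
        return self.scores_fn(x)


def targeted_margin(scores, target):
    """Best non-target score minus target score; negative means the target class wins."""
    return max(s for k, s in enumerate(scores) if k != target) - scores[target]


def random_vertex_search(model, x, target, eps, budget, rng):
    """Model-free baseline: sample vertices of the l_inf ball of radius eps around x."""
    best_delta = [0.0] * len(x)
    best = targeted_margin(model(x), target)
    while best >= 0 and model.queries < budget:
        delta = [eps * rng.choice((-1.0, 1.0)) for _ in x]
        value = targeted_margin(model([xi + di for xi, di in zip(x, delta)]), target)
        if value < best:
            best, best_delta = value, delta
    return best_delta, best


# Hypothetical two-class linear "network": class 0 scores sum(x), class 1 scores 0.
def linear_scores(x):
    return [sum(x), 0.0]


model = CountingBlackBox(linear_scores)
x = [0.05, 0.05, 0.05, 0.05]  # originally classified as class 0
delta, margin = random_vertex_search(model, x, target=1, eps=0.2, budget=500,
                                     rng=random.Random(4))
print(delta, margin, model.queries)
```

The query counter is the quantity compared across methods; shrinking `eps` makes successful vertices rarer, which is where a model-based search is expected to pay off.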
FSinR: an exhaustive package for feature selection
Aragón-Royón, F., Jiménez-Vílchez, A., Arauzo-Azofra, A., Benítez, J. M.
Feature Selection (FS) is a key task in Machine Learning. It consists of selecting a number of relevant variables for model construction or data analysis. We present the R package FSinR, which implements a variety of widely known filter and wrapper methods, as well as search algorithms. The package thus allows users to perform the complete feature selection process, which combines a guided search over subsets of features with filter or wrapper methods that return an evaluation measure for those subsets. In this article, we also present some examples of the package's usage and a comparison with other R packages that contain feature selection methods.
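FSinR is an R package, and its functions are not reproduced here. The Python sketch below only illustrates the combination described above: a search algorithm (here sequential forward selection, one common choice) driven by an evaluation measure that scores feature subsets. The function names and the toy measure are hypothetical.

```python
def sequential_forward_selection(n_features, evaluate, max_features=None):
    """Greedy search over feature subsets.

    evaluate(subset) returns a score (higher is better); it can be a filter measure
    or a wrapper measure such as a model's cross-validated accuracy.
    Stops when adding any remaining feature no longer improves the score."""
    limit = n_features if max_features is None else max_features
    selected = []
    best_score = evaluate(frozenset())
    while len(selected) < limit:
        candidates = [f for f in range(n_features) if f not in selected]
        score, feature = max((evaluate(frozenset(selected + [f])), f) for f in candidates)
        if score <= best_score:
            break
        selected.append(feature)
        best_score = score
    return selected, best_score


# Hypothetical evaluation measure: features 0 and 2 are relevant, others cost 0.1 each.
RELEVANT = {0, 2}


def toy_measure(subset):
    return len(subset & RELEVANT) - 0.1 * len(subset - RELEVANT)


print(sequential_forward_selection(5, toy_measure))  # → ([2, 0], 2.0)
```

Swapping `toy_measure` for a wrapper measure changes only the evaluation step; the search loop stays the same, which is the separation between search algorithms and evaluation measures that the paragraph describes.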
Self-Adaptive Training: beyond Empirical Risk Minimization
Huang, Lang, Zhang, Chao, Zhang, Hongyang
We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels using model predictions without incurring extra computational cost, to improve the generalization of deep learning on potentially corrupted training data. This problem is crucial for learning robustly from data corrupted by, e.g., label noise and out-of-distribution samples. However, standard empirical risk minimization (ERM) on such data can easily overfit the noise and thus suffer from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error decreases monotonically with model capacity. This is in sharp contrast to the recently discovered double-descent phenomenon in ERM, which might result from overfitting to noise. Experiments on the CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at \url{https://github.com/LayneH/self-adaptive-training}.
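The abstract does not spell out the update rule. As we read the method, each sample keeps a soft target that is moved towards the model's prediction by an exponential moving average, and samples are weighted by how confident their target is. The Python sketch below encodes that reading; `alpha`, the max-entry weighting and all function names are assumptions, not the authors' released implementation.

```python
import math


def update_targets(targets, probs, alpha):
    """Exponential moving average of soft targets towards model predictions:
    t <- alpha * t + (1 - alpha) * p."""
    return [[alpha * t + (1.0 - alpha) * p for t, p in zip(ts, ps)]
            for ts, ps in zip(targets, probs)]


def sample_weights(targets):
    """Weight each sample by its most confident target entry (assumed weighting)."""
    return [max(ts) for ts in targets]


def weighted_soft_cross_entropy(targets, probs, weights):
    """Weighted cross-entropy against soft targets, normalised by total weight."""
    total = 0.0
    for ts, ps, w in zip(targets, probs, weights):
        total += -w * sum(t * math.log(p) for t, p in zip(ts, ps))
    return total / sum(weights)


# Two samples with one-hot labels; the model disagrees with the second label.
targets = [[1.0, 0.0], [0.0, 1.0]]
probs = [[0.9, 0.1], [0.8, 0.2]]
targets = update_targets(targets, probs, alpha=0.9)
weights = sample_weights(targets)
print(targets, weights, weighted_soft_cross_entropy(targets, probs, weights))
```

Repeated updates keep moving a suspicious label's target towards the model's prediction, while its lower weight limits how much it drives the loss in the meantime.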
Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
Devraj, Adithya M., Meyn, Sean P.
A trend in the Reinforcement Learning literature is to derive sample complexity bounds, that is, bounds on how many interactions with the environment are required to obtain an $\varepsilon$-optimal policy. In the discounted-cost, infinite-horizon setting, all of the known bounds contain a factor that is polynomial in $1/(1-\beta)$, where $\beta < 1$ is the discount factor. For a large discount factor, these bounds seem to imply that a very large number of samples is required to achieve an $\varepsilon$-optimal policy. The objective of the present work is to introduce a new class of algorithms whose sample complexity is uniformly bounded for all $\beta < 1$. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that this previous lower bound is for a specific problem, which we modify without compromising the ultimate objective of obtaining an $\varepsilon$-optimal policy. Specifically, we show that the asymptotic variance of the Q-learning algorithm, with an optimized step-size sequence, is a quadratic function of $1/(1-\beta)$, an expected and essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic variance that is quadratic in $1/(1-\rho\beta)$, where $1-\rho > 0$ is the spectral gap of an optimal transition matrix.
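Why the second result is uniform in $\beta$: for $0 \le \rho < 1$ and $0 \le \beta < 1$ we have $1-\rho\beta \ge 1-\rho > 0$, so $(1/(1-\rho\beta))^2 \le (1/(1-\rho))^2$ for every $\beta$, whereas $(1/(1-\beta))^2$ grows without limit as $\beta \to 1$. The Python lines below evaluate only these leading factors (constants omitted) for an arbitrarily chosen $\rho = 0.5$.

```python
def q_learning_scale(beta):
    """Leading scaling of Q-learning's asymptotic variance: (1 / (1 - beta))**2."""
    return (1.0 / (1.0 - beta)) ** 2


def relative_q_scale(beta, rho):
    """Leading scaling for relative Q-learning: (1 / (1 - rho * beta))**2."""
    return (1.0 / (1.0 - rho * beta)) ** 2


rho = 0.5  # hypothetical value: spectral gap 1 - rho = 0.5
for beta in (0.9, 0.99, 0.999):
    print(beta, q_learning_scale(beta), relative_q_scale(beta, rho))
# relative_q_scale never exceeds (1 / (1 - rho))**2 = 4 for any beta < 1
```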
Prediction with Corrupted Expert Advice
Amir, Idan, Attias, Idan, Koren, Tomer, Livni, Roi, Mansour, Yishay
Prediction with expert advice is perhaps the single most fundamental problem in online learning and sequential decision making. In this problem, the goal of a learner is to aggregate decisions from multiple experts and achieve performance that approaches that of the best individual expert in hindsight. The standard performance criterion is the regret: the difference between the loss of the learner and that of the best single expert. The experts problem is often considered in the so-called adversarial setting, where the losses of the individual experts may be virtually arbitrary and may even be chosen by an adversary so as to maximize the learner's regret. The canonical algorithm in this setup is the Multiplicative Weights algorithm (Littlestone and Warmuth, 1989; Freund and Schapire, 1995), which guarantees an optimal regret of $\Theta(\sqrt{T \log N})$ in any problem with $N$ experts and $T$ decision rounds. A long line of research in online learning has focused on obtaining better regret guarantees, often referred to as "fast rates," on benign problem instances in which the loss generation process behaves more favourably than in a fully adversarial setup. A prototypical example of such an instance is the stochastic setting of the experts problem, where the losses of the experts are drawn i.i.d.
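A compact Python sketch of the exponential-weights form of Multiplicative Weights (often called Hedge). With losses in $[0, 1]$ and step size $\eta = \sqrt{8 \ln N / T}$, the standard analysis bounds the regret by $\ln N/\eta + \eta T/8 = \sqrt{T \ln N / 2}$ for any loss sequence, which is the $\Theta(\sqrt{T \log N})$ rate mentioned above. The loss data, and which expert is better, are invented for illustration.

```python
import math
import random


def hedge(loss_rows, eta):
    """Exponential-weights (Hedge) forecaster for losses in [0, 1].

    loss_rows[t][i] is expert i's loss in round t. Returns the learner's total
    expected loss and each expert's cumulative loss."""
    n = len(loss_rows[0])
    cumulative = [0.0] * n
    learner_loss = 0.0
    for losses in loss_rows:
        best = min(cumulative)
        # Shifting by the best cumulative loss avoids underflow; it cancels after normalising.
        weights = [math.exp(-eta * (c - best)) for c in cumulative]
        z = sum(weights)
        learner_loss += sum(w / z * l for w, l in zip(weights, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
    return learner_loss, cumulative


rng = random.Random(5)
T, N = 2000, 10
# Expert 0 is slightly better on average; the rest are uniform noise.
rows = [[rng.random() * (0.8 if i == 0 else 1.0) for i in range(N)] for _ in range(T)]
eta = math.sqrt(8 * math.log(N) / T)  # standard tuning for a known horizon T
learner, experts = hedge(rows, eta)
regret = learner - min(experts)
print(regret, math.sqrt(T * math.log(N) / 2))  # regret vs. the worst-case guarantee
```

On i.i.d. losses like these the observed regret is typically well below the worst-case guarantee, which is the gap that "fast rates" research tries to capture in the bound itself.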
Testing Goodness of Fit of Conditional Density Models with Kernels
Jitkrittum, Wittawat, Kanagawa, Heishiro, Schölkopf, Bernhard
Conditional distributions provide a versatile tool for capturing the relationship between a target variable and a conditioning variable (or covariate). The last few decades have seen a broad range of modeling applications across multiple disciplines, including econometrics [30, 42] and machine learning [14, 40], among others. In many cases, estimating a conditional density function from the observed data is one of the first crucial steps in the data analysis pipeline. While the task of conditional density estimation has received considerable attention in the literature, fewer works have investigated the equally important task of evaluating the goodness of fit of a given conditional density model. Several approaches to conditional model evaluation take the form of a hypothesis test: given a conditional model and a joint sample containing realizations of both target variables and covariates, test the null hypothesis that the model is correctly specified against the alternative that it is not. The model does not specify the marginal distribution of the covariates. We refer to this task as conditional goodness-of-fit testing. One of the early nonparametric tests is [1], which extended the classic Kolmogorov test to the conditional case.
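For intuition about what such a test checks, here is a classical probability-integral-transform (PIT) check, a different and much simpler technique than the kernel-based test developed here: under a correctly specified conditional model, $u_i = F(y_i \mid x_i)$ is uniform regardless of how the covariates are distributed, so the covariate marginal never has to be modelled. The Python sketch uses a hypothetical Gaussian linear model and synthetic data, and compares the Kolmogorov-Smirnov distance with the asymptotic 5% critical value $1.358/\sqrt{n}$.

```python
import math
import random
from statistics import NormalDist


def gaussian_linear_cdf(slope, sigma=1.0):
    """Conditional CDF F(y | x) of the hypothetical model y | x ~ N(slope * x, sigma^2)."""
    def cdf(y, x):
        return NormalDist(mu=slope * x, sigma=sigma).cdf(y)
    return cdf


def ks_statistic_uniform(u):
    """Kolmogorov-Smirnov distance between the empirical CDF of u and Uniform(0, 1)."""
    u = sorted(u)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))


def pit_ks(xs, ys, cond_cdf):
    """If the conditional model is correct, u_i = F(y_i | x_i) are i.i.d. Uniform(0, 1)
    whatever the covariate distribution, so a large KS distance is evidence against it."""
    return ks_statistic_uniform([cond_cdf(y, x) for x, y in zip(xs, ys)])


rng = random.Random(6)
n = 2000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [x + rng.gauss(0.0, 1.0) for x in xs]  # data truly follow slope 1

critical = 1.358 / math.sqrt(n)  # asymptotic 5% KS critical value
d_correct = pit_ks(xs, ys, gaussian_linear_cdf(slope=1.0))
d_wrong = pit_ks(xs, ys, gaussian_linear_cdf(slope=2.0))
print(d_correct, d_wrong, critical)
```

Because the PIT values are pooled over all covariates, this check can miss misspecifications that happen to leave the pooled values uniform; tests that use the covariates jointly, such as the kernel approach studied here, are designed to avoid that blind spot.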