arow
Volume Regularization for Binary Classification
We introduce large-volume box classification for binary prediction, which maintains a subset of weight vectors, specifically an axis-aligned box. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and it is shown how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound and it is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
Improving Adversarial Robustness by Putting More Regularizations on Less Robust Samples
Yang, Dongyoon, Kong, Insung, Kim, Yongdai
Adversarial training, which aims to enhance robustness against adversarial attacks, has received much attention because it is easy to generate human-imperceptible perturbations of data that deceive a given deep neural network. In this paper, we propose a new adversarial training algorithm that is theoretically well motivated and empirically superior to other existing algorithms. A novel feature of the proposed algorithm is that it applies more regularization to data vulnerable to adversarial attacks than other existing regularization algorithms do. Theoretically, we show that our algorithm can be understood as minimizing a regularized empirical risk motivated by a newly derived upper bound of the robust risk. Numerical experiments illustrate that our proposed algorithm improves generalization (accuracy on examples) and robustness (accuracy on adversarial attacks) simultaneously, achieving state-of-the-art performance.
Predicting Stock Returns with Batched AROW
Hassani, Rachid Guennouni, Gilles, Alexis, Lassalle, Emmanuel, Dénouveaux, Arthur
Financial markets exhibit highly non-stationary behaviors, making it difficult to build predictive signals that do not decay too rapidly (see [SCSG13, Con01] for empirical studies of return time series). A standard method for capturing these changes in time series data consists of using a rolling regression, that is, a linear regression model trained on a rolling window and kept as a static model during a prediction period. However, the size of the historical training data as well as the duration of the prediction period have a direct impact on the performance of the resulting model: using too much training data would result in a model that does not react quickly enough to sudden changes, while short training and prediction windows would make the model unstable (see for instance [IJR17]). Online learning algorithms are suited to situations where data arrives sequentially. New information is taken into account by updating the model parameters in a supervised fashion. More precisely, an online learning algorithm repeats the following steps indefinitely: receive a new instance x_t, make a prediction ŷ_t, receive the correct label y_t for the instance, and update the model accordingly. In the particular case of regression, online models are also good candidates to handle the non-stationarity inherent in financial time series while keeping a certain memory of what has been learnt from the beginning. The recursive least squares (RLS) algorithm is a well known approach to online linear regression problems (e.g.
OASIS: Online Active Semi-Supervised Learning
Goldberg, Andrew B. (Arcode Corporation) | Zhu, Xiaojin (University of Wisconsin-Madison) | Furger, Alex (University of Wisconsin-Madison) | Xu, Jun-Ming (University of Wisconsin-Madison)
We consider a learning setting of importance to large scale machine learning: potentially unlimited data arrives sequentially, but only a small fraction of it is labeled. The learner cannot store the data; it should learn from both labeled and unlabeled data, and it may also request labels for some of the unlabeled items. This setting is frequently encountered in real-world applications and has the characteristics of online, semi-supervised, and active learning. Yet previous learning models fail to consider these characteristics jointly. We present OASIS, a Bayesian model for this learning setting. The main contributions of the model include the novel integration of a semi-supervised likelihood function, a sequential Monte Carlo scheme for efficient online Bayesian updating, and a posterior-reduction criterion for active learning. Encouraging results on both synthetic and real-world optical character recognition data demonstrate the synergy of these characteristics in OASIS.
Learning via Gaussian Herding
We introduce a new family of online learning algorithms based upon constraining the velocity flow over a distribution of weight vectors. In particular, we show how to effectively herd a Gaussian weight vector distribution by trading off velocity constraints with a loss function. By uniformly bounding this loss function, we demonstrate how to solve the resulting optimization analytically. We compare the resulting algorithms on a variety of real world datasets, and demonstrate how these algorithms achieve state-of-the-art robust performance, especially with high label noise in the training data.
Adaptive Regularization of Weight Vectors
Crammer, Koby, Kulesza, Alex, Dredze, Mark
We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, which does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and empirically show that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data.
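The abstract above does not spell out the update itself. As a rough illustration, the following Python sketch shows one common way to write the full-covariance AROW step, assuming labels in {-1, +1}, a Gaussian over weight vectors with mean `w` and covariance `sigma`, and a regularization hyperparameter `r`. The function names `arow_update` and `train_arow` and the toy data are hypothetical, chosen for this example only.

```python
import numpy as np


def arow_update(w, sigma, x, y, r=1.0):
    """One AROW-style step for a label y in {-1, +1}.

    w     : mean weight vector, shape (d,)
    sigma : covariance (confidence) matrix, shape (d, d)
    r     : regularization hyperparameter (assumed > 0)
    Returns the updated (w, sigma); inputs are not modified in place.
    """
    margin = float(w @ x)              # signed prediction score
    confidence = float(x @ sigma @ x)  # variance of the score under sigma
    loss = max(0.0, 1.0 - y * margin)  # hinge loss on the mean prediction
    if loss > 0.0:
        beta = 1.0 / (confidence + r)
        alpha = loss * beta
        sigma_x = sigma @ x
        # Move the mean toward the correct side, scaled by current uncertainty.
        w = w + alpha * y * sigma_x
        # Shrink the covariance along the direction of the observed instance.
        sigma = sigma - beta * np.outer(sigma_x, sigma_x)
    return w, sigma


def train_arow(X, Y, r=1.0, epochs=1):
    """Run the update over the rows of X in order, starting from w=0, sigma=I."""
    d = X.shape[1]
    w = np.zeros(d)
    sigma = np.eye(d)
    for _ in range(epochs):
        for x, y in zip(X, Y):
            w, sigma = arow_update(w, sigma, x, y, r)
    return w, sigma


# Toy usage on two hypothetical, linearly separable points.
X_toy = np.array([[1.0, 0.0], [-1.0, 0.0]])
Y_toy = np.array([1.0, -1.0])
w_toy, sigma_toy = train_arow(X_toy, Y_toy, r=1.0)
predictions = np.sign(X_toy @ w_toy)
```

Because the update fires whenever the hinge loss is positive, and not only on outright mistakes, the covariance keeps shrinking along frequently seen directions. This is one way to read the "adaptive regularization" described in the abstract. A diagonal approximation of `sigma` is a common choice for high-dimensional data, though it is not shown here.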