Seldin, Yevgeny, Laviolette, François, Cesa-Bianchi, Nicolò, Shawe-Taylor, John, Auer, Peter

We present a set of high-probability inequalities that control the concentration of weighted averages of multiple (possibly uncountably many) simultaneously evolving and interdependent martingales. Our results extend the PAC-Bayesian analysis in learning theory from the i.i.d. setting to martingales opening the way for its application to importance weighted sampling, reinforcement learning, and other interactive learning domains, as well as many other domains in probability theory and statistics, where martingales are encountered. We also present a comparison inequality that bounds the expectation of a convex function of a martingale difference sequence shifted to the [0,1] interval by the expectation of the same function of independent Bernoulli variables. This inequality is applied to derive a tighter analog of Hoeffding-Azuma's inequality.

The majority of theoretical work in machine learning is done under the assumption of exchangeability: essentially, it is assumed that the examples are generated from the same probability distribution independently. This paper is concerned with the problem of testing the exchangeability assumption in the online mode: examples are observed one by one and the goal is to monitor online the strength of evidence against the hypothesis of exchangeability. We introduce the notion of exchangeability martingales, which are online procedures for detecting deviations from exchangeability; in essence, they are betting schemes that never risk bankruptcy and are fair under the hypothesis of exchangeability. Some specific exchangeability martingales are constructed using Transductive Confidence Machine.

In recent work Long and Servedio LS05short presented a martingale boosting'' algorithm that works by constructing a branching program over weak classifiers and has a simple analysis based on elementary properties of random walks. LS05short showed that this martingale booster can tolerate random classification noise when it is run with a noise-tolerant weak learner; however, a drawback of the algorithm is that it is not adaptive, i.e. it cannot effectively take advantage of variation in the quality of the weak classifiers it receives. In this paper we present a variant of the original martingale boosting algorithm and prove that it is adaptive. This adaptiveness is achieved by modifying the original algorithm so that the random walks that arise in its analysis have different step size depending on the quality of the weak learner at each stage. The new algorithm inherits the desirable properties of the original LS05short algorithm, such as random classification noise tolerance, and has several other advantages besides adaptiveness: it requires polynomially fewer calls to the weak learner than the original algorithm, and it can be used with confidence-rated weak hypotheses that output real values rather than Boolean predictions.

A standard assumption in machine learning has been the assumption that the data are generated in the IID fashion, i.e., independently from the same distribution. This assumption is also known as the assumption of randomness (see, e.g., [11, Section 7.1] and [27]). In this paper we are interested in testing this assumption. Conformal martingales are constructed on top of conventional machinelearning algorithms and have been used as a means of detecting deviations from randomness both in theoretical work (see, e.g., [27, Section 7.1], [4], [3]) and in practice (in the framework of the Microsoft Azure module on time series anomaly detection [28]). They provide an online measure of the amount of evidence found against the hypothesis of randomness and can be said to perform conformal change detection: if the assumption of randomness is satisfied, a fixed nonnegative conformal martingale with a positive initial value is not expected to increase its initial value manyfold; on the other hand, if the hypothesis of randomness is violated, a properly designed nonnegative conformal martingale with a positive initial value can be expected to increase its value substantially. Correspondingly, we have two desiderata for such a martingale S: - Validity is satisfied automatically: S is not expected to ever increase its initial value by much, under the hypothesis of randomness.

Rakhlin, Alexander, Sridharan, Karthik

We study an equivalence of (i) deterministic pathwise statements appearing in the online learning literature (termed \emph{regret bounds}), (ii) high-probability tail bounds for the supremum of a collection of martingales (of a specific form arising from uniform laws of large numbers for martingales), and (iii) in-expectation bounds for the supremum. By virtue of the equivalence, we prove exponential tail bounds for norms of Banach space valued martingales via deterministic regret bounds for the online mirror descent algorithm with an adaptive step size. We extend these results beyond the linear structure of the Banach space: we define a notion of \emph{martingale type} for general classes of real-valued functions and show its equivalence (up to a logarithmic factor) to various sequential complexities of the class (in particular, the sequential Rademacher complexity and its offset version). For classes with the general martingale type 2, we exhibit a finer notion of variation that allows partial adaptation to the function indexing the martingale. Our proof technique rests on sequential symmetrization and on certifying the \emph{existence} of regret minimization strategies for certain online prediction problems.