Regression
Linear Hinge Loss and Average Margin
Gentile, Claudio, Warmuth, Manfred K.
We describe a unifying method for proving relative loss bounds for online linearthreshold classification algorithms, such as the Perceptron and the Winnow algorithms. For classification problems the discrete loss is used, i.e., the total number of prediction mistakes. We introduce a continuous lossfunction, called the "linear hinge loss", that can be employed to derive the updates of the algorithms. We first prove bounds w.r.t. the linear hinge loss and then convert them to the discrete loss. We introduce anotion of "average margin" of a set of examples . We show how relative loss bounds based on the linear hinge loss can be converted to relative loss bounds i.t.o. the discrete loss using the average margin.
Finite-Dimensional Approximation of Gaussian Processes
Ferrari-Trecate, Giancarlo, Williams, Christopher K. I., Opper, Manfred
Gaussian process (GP) prediction suffers from O(n3) scaling with the data set size n. By using a finite-dimensional basis to approximate the GP predictor, the computational complexity can be reduced. We derive optimalfinite-dimensional predictors under a number of assumptions, andshow the superiority of these predictors over the Projected Bayes Regression method (which is asymptotically optimal). We also show how to calculate the minimal model size for a given n. The calculations are backed up by numerical experiments.
Regression with Input-dependent Noise: A Gaussian Process Treatment
Goldberg, Paul W., Williams, Christopher K. I., Bishop, Christopher M.
Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Competitive On-line Linear Regression
We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithm resembles, but is slightly different from, the wellknown ridge estimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference between our algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attained by the ridge regression procedure in general can be 4 times worse. 1 INTRODUCTION The usual approach to regression problems is to assume that the data are generated by some stochastic mechanism and make some, typically very restrictive, assumptions about that stochastic mechanism. In recent years, however, a different approach to this kind of problems was developed (see, e.g., DeSantis et al. [2], Littlestone and Warmuth [7]): in our context, that approach sets the goal of finding an online algorithm that performs not much worse than the best regression function found off-line; in other words, it replaces the usual statistical analyses by the competitive analysis of online algorithms. DeSantis et al. [2] performed a competitive analysis of the Bayesian merging scheme for the log-loss prediction game; later Littlestone and Warmuth [7] and Vovk [10] introduced an online algorithm (called the Weighted Majority Algorithm by the Competitive Online Linear Regression 365 former authors) for the simple binary prediction game. These two algorithms (the Bayesian merging scheme and the Weighted Majority Algorithm) are special cases of the Aggregating Algorithm (AA) proposed in [9, 11]. The AA is a member of a wide family of algorithms called "multiplicative weight" or "exponential weight" algorithms. Closer to the topic of this paper, Cesa-Bianchi et al. [1) performed a competitive analysis, under the square loss, of the standard Gradient Descent Algorithm and Kivinen and Warmuth [6] complemented it by a competitive analysis of a modification of the Gradient Descent, which they call the Exponentiated Gradient Algorithm.
Regression with Input-dependent Noise: A Gaussian Process Treatment
Goldberg, Paul W., Williams, Christopher K. I., Bishop, Christopher M.
Gaussian processes provide natural nonparametric prior distributions over regression functions. In this paper we consider regression problems where there is noise on the output, and the variance of the noise depends on the inputs. If we assume that the noise is a smooth function of the inputs, then it is natural to model the noise variance using a second Gaussian process, in addition to the Gaussian process governing the noise-free output value. We show that prior uncertainty about the parameters controlling both processes can be handled and that the posterior distribution of the noise rate can be sampled from using Markov chain Monte Carlo methods. Our results on a synthetic data set give a posterior noise variance that well-approximates the true variance.
Competitive On-line Linear Regression
We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithm resembles, but is slightly different from, the wellknown ridge estimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference between our algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attained by the ridge regression procedure in general can be 4 times worse. 1 INTRODUCTION The usual approach to regression problems is to assume that the data are generated by some stochastic mechanism and make some, typically very restrictive, assumptions about that stochastic mechanism. In recent years, however, a different approach to this kind of problems was developed (see, e.g., DeSantis et al. [2], Littlestone and Warmuth [7]): in our context, that approach sets the goal of finding an online algorithm that performs not much worse than the best regression function found off-line; in other words, it replaces the usual statistical analyses by the competitive analysis of online algorithms. DeSantis et al. [2] performed a competitive analysis of the Bayesian merging scheme for the log-loss prediction game; later Littlestone and Warmuth [7] and Vovk [10] introduced an online algorithm (called the Weighted Majority Algorithm by the Competitive Online Linear Regression 365 former authors) for the simple binary prediction game. These two algorithms (the Bayesian merging scheme and the Weighted Majority Algorithm) are special cases of the Aggregating Algorithm (AA) proposed in [9, 11]. The AA is a member of a wide family of algorithms called "multiplicative weight" or "exponential weight" algorithms. Closer to the topic of this paper, Cesa-Bianchi et al. [1) performed a competitive analysis, under the square loss, of the standard Gradient Descent Algorithm and Kivinen and Warmuth [6] complemented it by a competitive analysis of a modification of the Gradient Descent, which they call the Exponentiated Gradient Algorithm.
Competitive On-line Linear Regression
We apply a general algorithm for merging prediction strategies (the Aggregating Algorithm) to the problem of linear regression with the square loss; our main assumption is that the response variable is bounded. It turns out that for this particular problem the Aggregating Algorithmresembles, but is slightly different from, the wellknown ridgeestimation procedure. From general results about the Aggregating Algorithm we deduce a guaranteed bound on the difference betweenour algorithm's performance and the best, in some sense, linear regression function's performance. We show that the AA attains the optimal constant in our bound, whereas the constant attainedby the ridge regression procedure in general can be 4 times worse. 1 INTRODUCTION The usual approach to regression problems is to assume that the data are generated bysome stochastic mechanism and make some, typically very restrictive, assumptions about that stochastic mechanism. In recent years, however, a different approach to this kind of problems was developed (see, e.g., DeSantis et al. [2], Littlestone andWarmuth [7]): in our context, that approach sets the goal of finding an online algorithm that performs not much worse than the best regression function foundoff-line; in other words, it replaces the usual statistical analyses by the competitive analysis of online algorithms. DeSantis et al. [2] performed a competitive analysis of the Bayesian merging scheme for the log-loss prediction game; later Littlestone and Warmuth [7] and Vovk [10] introduced an online algorithm (called the Weighted Majority Algorithm by the Competitive Online Linear Regression 365 former authors) for the simple binary prediction game. These two algorithms (the Bayesian merging scheme and the Weighted Majority Algorithm) are special cases of the Aggregating Algorithm (AA) proposed in [9, 11]. The AA is a member of a wide family of algorithms called "multiplicative weight" or "exponential weight" algorithms. Closerto the topic of this paper, Cesa-Bianchi et al. [1) performed a competitive analysis, under the square loss, of the standard Gradient Descent Algorithm and Kivinen and Warmuth [6] complemented it by a competitive analysis of a modification ofthe Gradient Descent, which they call the Exponentiated Gradient Algorithm.