Sugiyama, Masashi
Lanczos Approximations for the Speedup of Kernel Partial Least Squares Regression
Kraemer, Nicole, Sugiyama, Masashi, Braun, Mikio
The runtime for Kernel Partial Least Squares (KPLS) to compute the fit is quadratic in the number of examples. However, obtaining sensitivity measures such as degrees of freedom for model selection, or confidence intervals for more detailed analysis, requires cubic runtime and thus constitutes a computational bottleneck in real-world data analysis. We propose a novel algorithm for KPLS which computes not only (a) the fit, but also (b) its approximate degrees of freedom and (c) error bars, all in quadratic runtime. The algorithm exploits a close connection between Kernel PLS and the Lanczos algorithm for approximating the eigenvalues of symmetric matrices, and uses this approximation to compute the trace of powers of the kernel matrix in quadratic runtime.
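The abstract gives no implementation details, so the following is only a minimal sketch of the general Lanczos-quadrature idea for estimating traces of kernel-matrix powers in quadratic time; it is not the paper's algorithm, and the kernel matrix K, the power p, and the probe/iteration counts are illustrative assumptions.

import numpy as np

def lanczos(matvec, v0, m):
    # m steps of the Lanczos iteration for a symmetric operator; each step
    # costs one matrix-vector product, i.e. O(n^2) for a dense n x n kernel.
    n = v0.shape[0]
    q = v0 / np.linalg.norm(v0)
    q_prev = np.zeros(n)
    alphas, betas = [], []
    beta = 0.0
    for _ in range(m):
        w = matvec(q) - beta * q_prev
        alpha = q @ w
        w = w - alpha * q
        beta = np.linalg.norm(w)
        alphas.append(alpha)
        betas.append(beta)
        if beta < 1e-12:
            break
        q_prev, q = q, w / beta
    return np.array(alphas), np.array(betas[:-1])

def trace_power_estimate(K, p, m=30, n_probes=10, seed=0):
    # Stochastic Lanczos quadrature estimate of tr(K^p): for each random
    # probe vector, the Ritz values (theta) and weights (tau) of the small
    # tridiagonal matrix give a Gauss-quadrature approximation of v^T K^p v.
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    total = 0.0
    for _ in range(n_probes):
        v = rng.standard_normal(n)
        a, b = lanczos(lambda x: K @ x, v, m)
        T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
        theta, U = np.linalg.eigh(T)
        tau = U[0, :] ** 2
        total += np.sum(tau * theta ** p)
    return n * total / n_probes

Each probe adds one Lanczos run of m matrix-vector products, so the overall cost is O(n_probes * m * n^2): quadratic in the number of examples, in contrast to the cubic cost of an exact eigendecomposition.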
Direct Importance Estimation with Model Selection and Its Application to Covariate Shift Adaptation
Sugiyama, Masashi, Nakajima, Shinichi, Kashima, Hisashi, Buenau, Paul V., Kawanabe, Motoaki
When training and test samples follow different input distributions (i.e., the situation called \emph{covariate shift}), the maximum likelihood estimator is known to lose its consistency. For regaining consistency, the log-likelihood terms need to be weighted according to the \emph{importance} (i.e., the ratio of test and training input densities). Thus, accurately estimating the importance is one of the key tasks in covariate shift adaptation. A naive approach is to first estimate training and test input densities and then estimate the importance by the ratio of the density estimates. However, since density estimation is a hard problem, this approach tends to perform poorly, especially in high-dimensional cases. In this paper, we propose a direct importance estimation method that does not require the input density estimates. Our method is equipped with a natural model selection procedure so that tuning parameters such as the kernel width can be objectively optimized. This is an advantage over a recently developed method of direct importance estimation. Simulations illustrate the usefulness of our approach.
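In symbols (generic notation, not taken from the paper), the weighting described above is the importance-weighted maximum likelihood estimator $\hat{\theta} = \arg\max_{\theta} \sum_{i=1}^{n} w(x_i) \log p(y_i \mid x_i; \theta)$, with importance $w(x) = p_{\mathrm{test}}(x) / p_{\mathrm{train}}(x)$; the contribution here is estimating $w$ directly from samples rather than by plugging in two separately estimated densities.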
Multi-Task Learning via Conic Programming
Kato, Tsuyoshi, Kashima, Hisashi, Sugiyama, Masashi, Asai, Kiyoshi
When we have several related tasks, solving them simultaneously has been shown to be more effective than solving them individually. This approach is called multi-task learning (MTL) and has been studied extensively. Existing approaches to MTL often treat all the tasks as \emph{uniformly related} to each other, and the relatedness of the tasks is controlled globally. For this reason, the existing methods can lead to undesired solutions when some tasks are not highly related to each other, and some pairs of related tasks can have significantly different solutions. In this paper, we propose a novel MTL algorithm that can overcome these problems. Our method makes use of a task network, which describes the relation structure among tasks. This allows us to deal with intricate relation structures in a systematic way. Furthermore, we control the relatedness of the tasks locally, so all pairs of related tasks are guaranteed to have similar solutions. We apply the above idea to support vector machines (SVMs) and show that the optimization problem can be cast as a second-order cone program, which is convex and can be solved efficiently. The usefulness of our approach is demonstrated through simulations with protein super-family classification and ordinal regression problems.
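As a rough illustration of the local-relatedness idea (our notation, not the paper's exact formulation): given a task network $G = (V, E)$ and per-task SVM weight vectors $w_t$, one can couple each related pair of tasks directly, e.g. $\min_{\{w_t\},\{\xi\}} \sum_{t \in V} \big( \tfrac{1}{2}\|w_t\|^2 + C \sum_i \xi_{t,i} \big)$ subject to the usual hinge-loss constraints and $\|w_s - w_t\| \le \rho$ for every edge $(s,t) \in E$. Norm constraints of this form are second-order cone constraints, which is why such a formulation remains a convex second-order cone program.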
Mixture Regression for Covariate Shift
Sugiyama, Masashi, Storkey, Amos J.
In supervised learning there is a typical presumption that the training and test points are taken from the same distribution. In practice this assumption is commonly violated. The situation where the training and test data come from different distributions is called covariate shift. Recent work has examined techniques for dealing with covariate shift in terms of minimisation of generalisation error. As yet the literature lacks a Bayesian generative perspective on this problem. This paper tackles this issue for regression models. Recent work on covariate shift can be understood in terms of mixture regression. Using this view, we obtain a general approach to regression under covariate shift, which reproduces previous work as a special case. The main advantages of this new formulation over previous models for covariate shift are that we no longer need to presume the test and training densities are known, the regression and density estimation are combined into a single procedure, and previous methods are reproduced as special cases of this procedure, shedding light on the implicit assumptions the methods are making.
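For concreteness, a generic mixture-of-regressions likelihood (our notation; the paper's exact generative model may differ) takes the form $p(y \mid x) = \sum_k \pi_k(x)\, p_k(y \mid x)$, where the $p_k$ are component regression models and the input-dependent mixing weights $\pi_k(x)$ allow training and test points to be explained by different mixtures of the same components, without the two input densities having to be known in advance.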
Active Learning for Misspecified Models
Sugiyama, Masashi
Active learning is the problem in supervised learning of designing the locations of training input points so that the generalization error is minimized. Existing active learning methods often assume that the model used for learning is correctly specified, i.e., the learning target function can be expressed by the model at hand. In many practical situations, however, this assumption may not be fulfilled. In this paper, we first show that the existing active learning method can be theoretically justified under a slightly weaker condition: the model does not have to be correctly specified, but slightly misspecified models are also allowed. However, it turns out that the weakened condition is still restrictive in practice.
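Written generically (standard notation, not taken from the paper), the design problem above is to choose the training inputs $x_1, \dots, x_n$ so as to minimize the expected generalization error of the learned function $\hat{f}$ with respect to the unknown target $f$, e.g. $\min_{x_1,\dots,x_n} \mathbb{E}\big[ \int (\hat{f}(x) - f(x))^2\, p(x)\, dx \big]$, where $p(x)$ is the test input density and the expectation is over the noise in the training outputs; "correctly specified" means that $f$ itself lies in the model class being fitted.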
Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
Blanchard, Gilles, Sugiyama, Masashi, Kawanabe, Motoaki, Spokoiny, Vladimir, Müller, Klaus-Robert
We propose a new linear method for dimension reduction to identify non-Gaussian components in high dimensional data. Our method, NGCA (non-Gaussian component analysis), uses a very general semi-parametric framework. In contrast to existing projection methods, we define what is uninteresting (Gaussian): by projecting out uninterestingness, we can estimate the relevant non-Gaussian subspace. We show that the estimation error of finding the non-Gaussian components tends to zero at a parametric rate. Once NGCA components are identified and extracted, various tasks can be applied in the data analysis process, like data visualization, clustering, denoising or classification. A numerical study demonstrates the usefulness of our method.
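One common way to formalize this semi-parametric setting (stated here as background; the notation is not taken verbatim from the abstract) is to assume the data density factors as $p(x) = g(Tx)\, \phi_{\Gamma}(x)$, where $\phi_{\Gamma}$ is a centered Gaussian density, $T$ is a linear map onto a low-dimensional index space, and $g$ is an unknown, unspecified function; the non-Gaussian subspace to be estimated is the one associated with $T$, while everything orthogonal to it is "uninteresting" Gaussian structure.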
Training Data Selection for Optimal Generalization in Trigonometric Polynomial Networks
Sugiyama, Masashi, Ogawa, Hidemitsu
In this paper, we consider the problem of active learning in trigonometric polynomial networks and give a necessary and sufficient condition of sample points to provide the optimal generalization capability. By analyzing the condition from the functional analytic point of view, we clarify the mechanism of achieving the optimal generalization capability. We also show that a set of training examples satisfying the condition does not only provide the optimal generalization but also reduces the computational complexity and memory required for the calculation of learning results. Finally, examples of sample points satisfying the condition are given and computer simulations are performed to demonstrate the effectiveness of the proposed active learning method.
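For context, a trigonometric polynomial model of order $N$ (a standard definition, not quoted from the paper) is $f(x) = a_0 + \sum_{k=1}^{N} (a_k \cos kx + b_k \sin kx)$; the active learning question above is then where to place the sample points $x_1, \dots, x_M$ so that estimating the $2N+1$ coefficients from noisy observations yields the optimal generalization capability.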