Karasuyama, Masayuki, Takeuchi, Ichiro

We propose a multiple incremental decremental algorithm for the Support Vector Machine (SVM). The conventional single incremental decremental SVM can update a trained model efficiently when a single data point is added to or removed from the training set. When multiple data points are added and/or removed, however, this algorithm is time-consuming because it must be applied repeatedly, once for each data point. The proposed algorithm is computationally more efficient when multiple data points are added and/or removed simultaneously. The single incremental decremental algorithm is built on an optimization technique called parametric programming.

Karasuyama, Masayuki, Mamitsuka, Hiroshi

Label propagation is one of the state-of-the-art methods for semi-supervised learning; it estimates labels by propagating label information through a graph. Label propagation assumes that data points (nodes) connected in a graph should have similar labels. Consequently, label estimation depends heavily on the edge weights of the graph, which represent the similarity of each node pair. We propose a method for constructing a graph that captures the manifold structure of input features, using edge weights parameterized by a similarity function. In this approach, edge weights represent both similarity and local reconstruction weights simultaneously, and both properties are reasonable for label propagation.
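Basic label propagation can be illustrated with a minimal sketch. It assumes a fully connected graph with Gaussian edge weights and uses the harmonic-function form of label propagation for binary labels; the similarity-parameterized, reconstruction-based edge weights proposed above are not implemented, and the names `gaussian_weights` and `label_propagation` are hypothetical.

```python
import numpy as np


def gaussian_weights(X, sigma):
    """Fully connected graph with w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W


def label_propagation(W, y, labeled):
    """Harmonic-function label propagation for binary labels.

    W: (n, n) symmetric edge weights; y: (n,) array of labels in {0, 1}
    (only entries at labeled nodes are used); labeled: boolean mask.
    Returns a score in [0, 1] for every node; labeled nodes keep their labels.
    """
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian L = D - W
    u = ~labeled
    f = y.astype(float).copy()
    # Harmonic solution: (D_uu - W_uu) f_u = W_ul f_l
    f[u] = np.linalg.solve(L[np.ix_(u, u)], W[np.ix_(u, labeled)] @ f[labeled])
    return f
```

Each unlabeled score is a weighted average of its neighbors' scores, so the quality of the result hinges on the edge weights, which is exactly the quantity the paper proposes to learn.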

Shibagaki, Atsushi, Suzuki, Yoshiki, Karasuyama, Masayuki, Takeuchi, Ichiro

Careful tuning of a regularization parameter is indispensable in many machine learning tasks because it has a significant impact on generalization performance. Nevertheless, current practice of regularization parameter tuning is more of an art than a science; for example, it is hard to tell how many grid points would be needed in cross-validation (CV) to obtain a solution with a sufficiently small CV error. In this paper we propose a novel framework for computing a lower bound of the CV error as a function of the regularization parameter, which we call the regularization path of CV error lower bounds. The proposed framework can be used to provide a theoretical approximation guarantee on a set of solutions, in the sense of how far the CV error of the current best solution could be from the best possible CV error over the entire range of the regularization parameter. We demonstrate through numerical experiments that a theoretically guaranteed choice of the regularization parameter in this sense is possible with reasonable computational cost. Papers published at the Neural Information Processing Systems Conference.

Yoshida, Tomoki, Takeuchi, Ichiro, Karasuyama, Masayuki

Graphs are versatile tools for representing structured data. Therefore, a variety of machine learning methods have been studied for graph data analysis. Although many of those learning methods depend on measuring differences between input graphs, defining an appropriate distance metric for graphs remains a controversial issue. Hence, we propose a supervised distance metric learning method for the graph classification problem. Our method, named interpretable graph metric learning (IGML), learns discriminative metrics in a subgraph-based feature space, which has a strong graph representation capability. By introducing a sparsity-inducing penalty on the weight of each subgraph, IGML can identify a small number of important subgraphs that provide insight into the given classification task. Since our formulation has a large number of optimization variables, we also propose an efficient algorithm that uses pruning techniques based on safe screening and working set selection methods. An important property of IGML is that the optimality of the solution is guaranteed, because the problem is formulated as a convex problem and our pruning strategies discard only unnecessary subgraphs. Further, we show that IGML is also applicable to other structured data, such as item-set and sequence data, and that it can incorporate vertex-label similarity by using a transportation-based subgraph feature. We empirically evaluate the computational efficiency and classification performance on several benchmark datasets and show illustrative examples demonstrating that IGML identifies important subgraphs in a given graph dataset.

Inatsu, Yu, Karasuyama, Masayuki, Inoue, Keiichi, Takeuchi, Ichiro

As part of a quality control process in manufacturing, it is often necessary to test whether all parts of a product satisfy a required property with as few inspections as possible. When multiple inspection apparatuses with different costs and precision exist, it is desirable to carry out testing cost-effectively by properly controlling the trade-off between cost and precision. In this paper, we formulate this as a level set estimation (LSE) problem under cost-dependent input uncertainty. LSE is a type of active learning for estimating the level set, i.e., the subset of the input space in which an unknown function value is greater or smaller than a predetermined threshold. We then propose a new algorithm for LSE under cost-dependent input uncertainty with a theoretical convergence guarantee. We demonstrate the effectiveness of the proposed algorithm by applying it to synthetic and real datasets.
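The basic LSE idea can be shown with a minimal sketch of the standard confidence-interval classification rule, assuming a Gaussian process posterior mean `mu` and standard deviation `sigma` are already available at a finite set of candidate inputs. It does not model input uncertainty, inspection costs, or multiple apparatuses; the function name `lse_step` and the default `beta=2.0` are hypothetical.

```python
import numpy as np


def lse_step(mu, sigma, h, beta=2.0):
    """One classification/query step of a basic confidence-interval LSE rule.

    mu, sigma: posterior mean and standard deviation of f at candidate inputs.
    h: threshold; beta: confidence-interval width (hypothetical default).
    Returns labels (+1: f > h, -1: f < h, 0: still ambiguous) and the index of
    the next point to query (None if every candidate is already classified).
    """
    lower, upper = mu - beta * sigma, mu + beta * sigma
    labels = np.zeros(len(mu), dtype=int)
    labels[lower > h] = 1    # confidently above the threshold
    labels[upper < h] = -1   # confidently below the threshold
    ambiguous = np.flatnonzero(labels == 0)
    if ambiguous.size == 0:
        return labels, None
    # Ambiguity: how far the interval extends on both sides of h
    ambiguity = np.minimum(upper[ambiguous] - h, h - lower[ambiguous])
    return labels, int(ambiguous[np.argmax(ambiguity)])
```

Querying the most ambiguous point spends inspections where the classification is least certain; a cost-aware variant would additionally weigh that ambiguity against the cost and precision of each apparatus.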

Suzuki, Shinya, Takeno, Shion, Tamura, Tomoyuki, Shitara, Kazuki, Karasuyama, Masayuki

We propose Pareto-frontier entropy search (PFES) for multi-objective Bayesian optimization (MBO). Unlike the existing entropy search for MBO, which considers the entropy in the input space, we define the entropy of the Pareto-frontier in the output space. By using a Pareto-frontier sampled from the current model, PFES provides a simple formula for directly evaluating the entropy. Besides the usual MBO setting, in which all the objectives are observed simultaneously, we also consider the "decoupled" setting, in which the objective functions can be observed separately. PFES can easily derive an acquisition function for the decoupled setting through the entropy of the marginal density of each output variable. In both settings, conditioning on the sampled Pareto-frontier introduces dependence among the objectives in the entropy evaluation. PFES can incorporate this dependency into the acquisition function, whereas the existing information-based MBO employs an independent Gaussian approximation. Our numerical experiments show the effectiveness of PFES on synthetic functions and real-world datasets from materials science.
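A Pareto-frontier is the set of non-dominated output vectors. The minimal sketch below extracts it from a finite set of objective vectors, assuming all objectives are maximized. In PFES the frontier would come from functions sampled from the current model; here the input is simply a hypothetical array `Y`, and the entropy computation itself is not shown.

```python
import numpy as np


def pareto_frontier(Y):
    """Return the non-dominated rows of Y (n points x m objectives), maximization.

    A point y is dominated if some other point is >= y in every objective
    and strictly > y in at least one objective.
    """
    Y = np.asarray(Y, dtype=float)
    keep = np.ones(len(Y), dtype=bool)
    for i in range(len(Y)):
        dominated_by = np.all(Y >= Y[i], axis=1) & np.any(Y > Y[i], axis=1)
        keep[i] = not dominated_by.any()
    return Y[keep]
```

This brute-force check is quadratic in the number of points, which is adequate for the small candidate sets typical of a single acquisition step.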

Duy, Vo Nguyen Le, Sakuma, Takuto, Ishiyama, Taiju, Toda, Hiroki, Nishi, Kazuya, Karasuyama, Masayuki, Okubo, Yuta, Sunaga, Masayuki, Tabei, Yasuo, Takeuchi, Ichiro

We study the problem of discriminative sub-trajectory mining. Given two groups of trajectories, the goal is to extract moving patterns, in the form of sub-trajectories, that are more similar to sub-trajectories of one group and less similar to those of the other. We propose a new method called Statistically Discriminative Sub-trajectory Mining (SDSM) for this problem. An advantage of SDSM is that the statistical significance of the extracted sub-trajectories is properly controlled, in the sense that the probability of finding a false positive sub-trajectory is smaller than a specified significance threshold alpha (e.g., 0.05). This property is indispensable when the method is used in scientific or social studies in noisy environments. Finding such statistically discriminative sub-trajectories in a massive trajectory dataset is both computationally and statistically challenging. In SDSM, we resolve these difficulties by introducing a tree representation among sub-trajectories and running an efficient permutation-based statistical inference method on the tree. To the best of our knowledge, SDSM is the first method that can efficiently extract statistically discriminative sub-trajectories from a massive trajectory dataset. We illustrate the effectiveness and scalability of SDSM by applying it to a real-world dataset with 1,000,000 trajectories, which contains 16,723,602,505 sub-trajectories.
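Permutation-based control of the probability of any false positive can be illustrated with a generic max-statistic permutation test, sketched below. It assumes each candidate pattern has already been turned into one numeric score per object (for example, a similarity between each trajectory and a candidate sub-trajectory) and uses the absolute difference of group means as the test statistic. The function name and these choices are hypothetical, and the tree-based enumeration that makes SDSM scale is not shown.

```python
import numpy as np


def maxstat_permutation_test(X, groups, alpha=0.05, n_perm=1000, seed=0):
    """Family-wise error control via the permutation distribution of the max statistic.

    X: (n, p) array; column k holds a score of candidate pattern k for each object.
    groups: (n,) boolean array, True for one group and False for the other.
    Returns a boolean mask of candidates declared significant at level alpha.
    """
    rng = np.random.default_rng(seed)

    def stats(g):
        # Absolute difference of group means, one value per candidate
        return np.abs(X[g].mean(axis=0) - X[~g].mean(axis=0))

    observed = stats(groups)
    max_null = np.empty(n_perm)
    for b in range(n_perm):
        # Shuffle group labels and keep only the largest statistic over all candidates
        max_null[b] = stats(rng.permutation(groups)).max()
    # Adjusted p-value: share of permutations whose max statistic reaches the observed one
    p_adj = (1 + (max_null[None, :] >= observed[:, None]).sum(axis=1)) / (n_perm + 1)
    return p_adj <= alpha
```

Comparing every candidate against the distribution of the maximum over all candidates is what bounds the probability of even one false positive; the difficulty the abstract describes is computing that maximum when there are billions of candidates.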

Takeno, Shion, Fukuoka, Hitoshi, Tsukada, Yuhki, Koyama, Toshiyuki, Shiga, Motoki, Takeuchi, Ichiro, Karasuyama, Masayuki

Bayesian optimization (BO) is an effective tool for black-box optimization in which evaluating the objective function is usually quite expensive. In practice, lower-fidelity approximations of the objective function are often available. Recently, multi-fidelity Bayesian optimization (MFBO) has attracted considerable attention because it can dramatically accelerate the optimization process by using those cheaper observations. We propose a novel information-theoretic approach to MFBO. Information-based approaches are popular and empirically successful in BO, but existing studies of information-based MFBO are plagued by the difficulty of accurately estimating the information gain. Our approach is based on a variant of information-based BO called max-value entropy search (MES), which greatly facilitates evaluation of the information gain in MFBO. In fact, our acquisition function can be computed analytically except for a one-dimensional integral and sampling, both of which can be calculated efficiently and accurately. We demonstrate the effectiveness of our approach on synthetic and benchmark datasets, and we further show a real-world application to materials science data.
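The single-fidelity MES acquisition that the approach builds on can be written in closed form given sampled maximum values. The sketch below assumes a maximization problem, a Gaussian process posterior (`mu`, `sigma`) at candidate points, and samples `y_star_samples` of the global maximum obtained elsewhere; it omits the multi-fidelity extension with its one-dimensional integral, and the function name is hypothetical.

```python
import math

import numpy as np


def mes_acquisition(mu, sigma, y_star_samples):
    """Single-fidelity max-value entropy search (MES) acquisition, maximization.

    mu, sigma: GP posterior mean/std at candidate points, arrays of shape (n,).
    y_star_samples: sampled values of the global maximum f*, shape (K,).
    alpha(x) = mean_k [ g psi(g) / (2 Psi(g)) - log Psi(g) ],
    where g = (y*_k - mu(x)) / sigma(x) and psi/Psi are the standard normal pdf/cdf.
    """
    g = (np.asarray(y_star_samples, dtype=float)[None, :] - mu[:, None]) / sigma[:, None]
    pdf = np.exp(-0.5 * g ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(g / math.sqrt(2.0)))
    cdf = np.maximum(cdf, 1e-12)  # numerical guard against log(0) for very negative g
    return np.mean(g * pdf / (2.0 * cdf) - np.log(cdf), axis=1)
```

Only the sampled maxima require Monte Carlo; given them, each term is analytic, which is why MES-style acquisitions are cheap to evaluate.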

Yoshida, Tomoki, Takeuchi, Ichiro, Karasuyama, Masayuki

We study safe screening for metric learning. Distance metric learning can optimize a metric over a set of triplets, each of which consists of a pair of instances from the same class and an instance from a different class. However, the number of possible triplets is extremely large even for a small dataset. Our safe triplet screening identifies triplets that can be safely removed from the optimization problem without losing optimality. Compared with existing safe screening studies, triplet screening is particularly significant because of (1) the huge number of possible triplets, and (2) the positive semidefinite constraint in the optimization. We derive several variants of screening rules and analyze their relationships. Numerical experiments on benchmark datasets demonstrate the effectiveness of safe triplet screening.
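How quickly the number of triplets grows can be checked with a short counting function. This sketch assumes triplets are ordered (anchor `i`, same-class partner `j != i`, different-class instance `l`); conventions differ between papers, and the function name is hypothetical.

```python
from collections import Counter


def count_triplets(labels):
    """Number of triplets (i, j, l): i and j share a class with i != j, l is in another class.

    Closed form: sum over classes c of n_c * (n_c - 1) * (n - n_c),
    where n_c is the size of class c and n the total number of instances.
    """
    n = len(labels)
    return sum(nc * (nc - 1) * (n - nc) for nc in Counter(labels).values())
```

For example, 10 balanced classes of 100 instances each (1,000 points in total) already give `count_triplets([c for c in range(10) for _ in range(100)])`, which equals 89,100,000 triplets, so discarding triplets before optimization matters even at modest dataset sizes.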

Nakagawa, Kazuya, Suzumura, Shinya, Karasuyama, Masayuki, Tsuda, Koji, Takeuchi, Ichiro

In this paper we study predictive pattern mining problems, where the goal is to construct a predictive model based on a subset of predictive patterns in the database. Our main contribution is a novel method called safe pattern pruning (SPP) for a class of predictive pattern mining problems. The SPP method allows us to efficiently find a superset of all the predictive patterns in the database that are needed for the optimal predictive model. The advantage of the SPP method over existing boosting-type methods is that the former can find this superset with a single search over the database, while the latter require multiple searches. The SPP method is inspired by recent developments in safe feature screening. To extend the idea of safe feature screening to predictive pattern mining, we derive a novel pruning rule, called the safe pattern pruning (SPP) rule, that can be used for searching over the tree defined among patterns in the database. The SPP rule has the property that, if a node corresponding to a pattern in the database is pruned by the SPP rule, then all the patterns corresponding to its descendant nodes are guaranteed never to be needed for the optimal predictive model. We apply the SPP method to graph mining and item-set mining problems and demonstrate its computational advantage.
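The mechanics of pruning a whole subtree with one test can be sketched on item-set mining. The sketch below performs a depth-first search over the item-set enumeration tree and uses minimum support as a stand-in pruning criterion: support is anti-monotone, so a pruned node cannot have a qualifying descendant. This is not the SPP rule itself, which relies on a different, optimization-derived bound; `mine_itemsets` and `min_support` are hypothetical names.

```python
def mine_itemsets(transactions, min_support):
    """Depth-first search over the item-set enumeration tree with subtree pruning.

    Each node is an item set; its children add one item ordered after its last item.
    Support never increases from a node to its descendants, so a node below
    min_support is pruned together with its whole subtree (stand-in criterion).
    Returns {frozenset(itemset): support} for every item set that is kept.
    """
    items = sorted({i for t in transactions for i in t})
    tsets = [set(t) for t in transactions]
    found = {}

    def support(itemset):
        return sum(1 for t in tsets if itemset <= t)

    def visit(itemset, start):
        for k in range(start, len(items)):
            child = itemset | {items[k]}
            s = support(child)
            if s < min_support:
                continue  # prune: no descendant of child can reach min_support
            found[frozenset(child)] = s
            visit(child, k + 1)

    visit(frozenset(), 0)
    return found
```

Because every test discards an entire subtree, a single traversal suffices, which mirrors the single-search advantage the abstract claims for SPP over boosting-type methods.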