Supervised Learning
Batch kernel SOM and related Laplacian methods for social network analysis
Boulet, Romain, Jouve, Bertrand, Rossi, Fabrice, Villa, Nathalie
Institut de Mathématiques, Université de Toulouse et CNRS (UMR 5219), 118 route de Narbonne, 31062 Toulouse cedex 9, France
Abstract
Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
This work was partially supported by ANR Project "Graph-Comp".
Preprint submitted to Neurocomputing, 19 March 2018
1 Introduction
Complex networks are large graphs with a non-trivial organization. They arise naturally in numerous contexts [7], such as, to name a few, the World Wide Web (which gives a perfect example of how large and complex such a network may grow), metabolic pathways, citation networks between scientific articles, or more general social networks that model interactions between individuals and/or organizations. Complex networks share common properties that have allowed the emergence of mathematical descriptions such as small world graphs or power law graphs. The structure of these graphs often provides keys to understanding the underlying complex network.
To study such a structure, one often begins with a metrology process applied to the graph that describes the degree distribution, the number of components, the density, etc. However, it should be noted that dealing with very large graphs (millions of vertices) is still an open question (see [9] for an example of an efficient algorithm to explore that kind of data sets). Several ways have been explored to cluster the vertices of the graph into communities [43] and some of them have in common the use of the Laplacian matrix. Indeed, there are important relationships between the spectrum of the Laplacian and the graph invariants that characterize its structure (see, e.g. These properties can be used for building, from the eigen-decomposition of the Laplacian, a similarity measure or a metric space such that the induced dissimilarities between vertices of the graph are related to its community structure (see [13], among others).
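As an illustration of how a Laplacian can induce dissimilarities between vertices, here is a minimal sketch in Python with NumPy. It assumes the combinatorial Laplacian L = D - W and uses the heat (diffusion) kernel exp(-beta L) as one common choice of Laplacian-derived kernel; the toy graph, the function names and the value of `beta` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def laplacian(weights):
    """Combinatorial Laplacian L = D - W of a weighted undirected graph."""
    weights = np.asarray(weights, dtype=float)
    return np.diag(weights.sum(axis=1)) - weights

def heat_kernel(weights, beta=1.0):
    """Heat kernel K = exp(-beta * L), computed from the eigendecomposition of L."""
    eigvals, eigvecs = np.linalg.eigh(laplacian(weights))
    # Scale each eigenvector (column) by exp(-beta * eigenvalue), then recombine.
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

def kernel_distances(kernel):
    """Squared distances induced by a kernel: d2(i, j) = K_ii + K_jj - 2 K_ij."""
    diag = np.diag(kernel)
    return diag[:, None] + diag[None, :] - 2.0 * kernel

# Hypothetical toy graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0

K = heat_kernel(W, beta=0.5)
D2 = kernel_distances(K)
print("same triangle:", D2[0, 1], "different triangles:", D2[0, 4])
```

In this toy graph, vertices of the same triangle end up closer to each other than vertices of different triangles, which is the kind of community-related dissimilarity the introduction refers to.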
Multi-Instance Multi-Label Learning with Application to Scene Classification
Zhou, Zhi-Hua, Zhang, Min-Ling
In this paper, we formalize multi-instance multi-label learning, where each training example is associated with not only multiple instances but also multiple class labels. Such a problem can occur in many real-world tasks, e.g., an image usually contains multiple patches, each of which can be described by a feature vector, and the image can belong to multiple categories since its semantics can be recognized in different ways. We analyze the relationship between multi-instance multi-label learning and the learning frameworks of traditional supervised learning, multi-instance learning and multi-label learning.
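The data layout described in this abstract can be sketched as follows. This is only an illustrative Python representation: the `Example` class, the `framework` helper and the sample values are hypothetical names chosen for this sketch, not part of the paper. It shows a training example as a bag of instance feature vectors paired with a set of labels, and how the three other frameworks appear as degenerate cases.

```python
from dataclasses import dataclass
from typing import FrozenSet

import numpy as np

@dataclass
class Example:
    """One training example: a bag of instances plus a set of class labels."""
    instances: np.ndarray    # shape (n_instances, n_features), e.g. one row per image patch
    labels: FrozenSet[str]   # labels attached to the example as a whole

def framework(example):
    """Name the learning framework that an example of this shape corresponds to."""
    many_instances = example.instances.shape[0] > 1
    many_labels = len(example.labels) > 1
    if many_instances and many_labels:
        return "multi-instance multi-label"
    if many_instances:
        return "multi-instance"
    if many_labels:
        return "multi-label"
    return "traditional supervised"

# A scene image described by three patches and carrying two categories.
scene = Example(np.zeros((3, 4)), frozenset({"beach", "sunset"}))
# A single feature vector with a single label.
plain = Example(np.zeros((1, 4)), frozenset({"beach"}))

print(framework(scene))
print(framework(plain))
```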
Boosting Structured Prediction for Imitation Learning
Bagnell, J. A., Chestnutt, Joel, Bradley, David M., Ratliff, Nathan D.
The Maximum Margin Planning (MMP) (Ratliff et al., 2006) algorithm solves imitation learning problems by learning linear mappings from features to cost functions in a planning domain. The learned policy is the result of minimum-cost planning using these cost functions. These mappings are chosen so that example policies (or trajectories) given by a teacher appear to be lower cost (with a loss-scaled margin) than any other policy for a given planning domain. We provide a novel approach, MMPBOOST, based on the functional gradient descent view of boosting (Mason et al., 1999; Friedman, 1999a) that extends MMP by "boosting" in new features. This approach uses simple binary classification or regression to improve performance of MMP imitation learning, and naturally extends to the class of structured maximum margin prediction problems.
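The margin idea behind linear MMP can be illustrated with a small subgradient sketch in Python. It is a simplification under stated assumptions: a hand-enumerated list of candidate paths stands in for a real planner, the feature vectors, losses, step size and regularization weight are invented for the example, and the boosting step of MMPBOOST is not shown.

```python
import numpy as np

def mmp_step(w, features, losses, expert, lr=0.1, reg=0.01):
    """One subgradient step on a regularized structured hinge loss.

    features[p] is the feature count vector of candidate path p, so its cost is
    features[p] @ w. losses[p] measures how far path p is from the expert path.
    """
    costs = features @ w
    # Loss-augmented "planning": the path that is cheapest once its loss is subtracted.
    rival = int(np.argmin(costs - losses))
    # Lower the expert's cost and raise the rival's cost, plus weight decay.
    grad = reg * w + features[expert] - features[rival]
    return w - lr * grad

# Hypothetical toy domain: three candidate paths described by two features.
features = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
losses = np.array([0.0, 1.0, 1.0])   # the expert path (index 0) has zero loss
expert = 0

w_init = np.array([1.0, 0.5])        # under these weights, path 1 is the cheapest
w = w_init.copy()
for _ in range(200):
    w = mmp_step(w, features, losses, expert)

best_before = int(np.argmin(features @ w_init))
best_after = int(np.argmin(features @ w))
print(best_before, best_after)
```

After training, minimum-cost selection over the candidate paths recovers the expert path, which is the behavior the margin constraints are meant to enforce.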
Structured Prediction via the Extragradient Method
Taskar, Ben, Lacoste-Julien, Simon, Jordan, Michael I.
We present a simple and scalable algorithm for large-margin estimation of structured models, including an important class of Markov networks and combinatorial models. We formulate the estimation problem as a convex-concave saddle-point problem and apply the extragradient method, yielding an algorithm with linear convergence using simple gradient and projection calculations. The projection step can be solved using combinatorial algorithms for min-cost quadratic flow. This makes the approach an efficient alternative to formulations based on reductions to a quadratic program (QP). We present experiments on two very different structured prediction tasks: 3D image segmentation and word alignment, illustrating the favorable scaling properties of our algorithm.
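The extragradient update itself can be sketched on a toy problem. The Python example below is an assumption-laden illustration and not the structured estimation problem from the paper: it uses the bilinear saddle point min over x, max over y of x*y on the box [-1, 1], with clipping to the box standing in for the projection step. The function names, step size and iteration count are chosen for the example.

```python
import numpy as np

def extragradient(grad_x, grad_y, project, x, y, step=0.5, iters=200):
    """Projected extragradient for min over x, max over y of f(x, y)."""
    for _ in range(iters):
        # Prediction step: a tentative projected gradient step from the current point.
        x_pred = project(x - step * grad_x(x, y))
        y_pred = project(y + step * grad_y(x, y))
        # Correction step: restart from the current point, using gradients
        # evaluated at the predicted point.
        x_next = project(x - step * grad_x(x_pred, y_pred))
        y_next = project(y + step * grad_y(x_pred, y_pred))
        x, y = x_next, y_next
    return x, y

# Toy saddle-point problem f(x, y) = x * y, whose saddle point is (0, 0).
grad_x = lambda x, y: y                    # partial derivative of f with respect to x
grad_y = lambda x, y: x                    # partial derivative of f with respect to y
project = lambda v: np.clip(v, -1.0, 1.0)  # Euclidean projection onto [-1, 1]

x_star, y_star = extragradient(grad_x, grad_y, project, 0.5, 0.5)
print(x_star, y_star)
```

On this bilinear problem a plain simultaneous gradient step moves away from the saddle point, whereas the prediction and correction pair contracts toward it, which is why the method suits convex-concave formulations.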
From Lasso regression to Feature vector machine
Li, Fan, Yang, Yiming, Xing, Eric P.
Lasso regression tends to assign zero weights to most irrelevant or redundant features, and hence is a promising technique for feature selection. Its limitation, however, is that it only offers solutions to linear models. Kernel machines with feature scaling techniques have been studied for feature selection with nonlinear models. However, such approaches require solving hard non-convex optimization problems. This paper proposes a new approach named the Feature Vector Machine (FVM). It reformulates the standard Lasso regression into a form isomorphic to SVM, and this form can be easily extended for feature selection with nonlinear models by introducing kernels defined on feature vectors. FVM generates sparse solutions in the nonlinear feature space and it is much more tractable than feature scaling kernel machines. Our experiments with FVM on simulated data show encouraging results in identifying the small number of dominating features that are non-linearly correlated to the response, a task the standard Lasso fails to complete.
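The sparsity property of the standard linear Lasso mentioned at the start of this abstract can be seen in a short coordinate descent sketch in Python. It does not implement FVM. The objective is assumed to be (1/(2n)) * ||y - Xw||^2 + lam * ||w||_1, and the synthetic data, the value of `lam` and the function names are assumptions made for the example.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; return exactly zero inside the band."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic coordinate descent."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_scale = (X ** 2).sum(axis=0) / n_samples
    for _ in range(sweeps):
        for j in range(n_features):
            # Residual with the contribution of feature j added back in.
            partial_residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial_residual / n_samples
            w[j] = soft_threshold(rho, lam) / col_scale[j]
    return w

# Synthetic data: only the first two of five features influence the response.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)

w = lasso_coordinate_descent(X, y, lam=0.1)
print(w)
```

The soft-thresholding step is what sets the weights of weakly correlated features to exactly zero, rather than merely making them small.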