Goto

Collaborating Authors

 dissimilarity representation


One-Class LP Classifiers for Dissimilarity Representations

Neural Information Processing Systems

Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical exam- ples. These applications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any prob- lem where little knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimi- larities and (2) cheaper to compute by using a reduced representation set.


Random Forest for Dissimilarity-based Multi-view Learning

arXiv.org Machine Learning

Many classification problems are naturally multi-view in the sense their data are described through multiple heterogeneous descriptions. For such tasks, dissimilarity strategies are effective ways to make the different descriptions comparable and to easily merge them, by (i) building intermediate dissimilarity representations for each view and (ii) fusing these representations by averaging the dissimilarities over the views. In this work, we show that the Random Forest proximity measure can be used to build the dissimilarity representations, since this measure reflects similarities between features but also class membership. We then propose a Dynamic View Selection method to better combine the view-specific dissimilarity representations. This allows to take a decision, on each instance to predict, with only the most relevant views for that instance. Experiments are conducted on several real-world multi-view datasets, and show that the Dynamic View Selection offers a significant improvement in performance compared to the simple average combination and two state-of-the-art static view combinations.


A Novel Random Forest Dissimilarity Measure for Multi-View Learning

arXiv.org Machine Learning

Multi-view learning is a learning task in which data is described by several concurrent representations. Its main challenge is most often to exploit the complementarities between these representations to help solve a classification/regression task. This is a challenge that can be met nowadays if there is a large amount of data available for learning. However, this is not necessarily true for all real-world problems, where data are sometimes scarce (e.g. problems related to the medical environment). In these situations, an effective strategy is to use intermediate representations based on the dissimilarities between instances. This work presents new ways of constructing these dissimilarity representations, learning them from data with Random Forest classifiers. More precisely, two methods are proposed, which modify the Random Forest proximity measure, to adapt it to the context of High Dimension Low Sample Size (HDLSS) multi-view classification problems. The second method, based on an Instance Hardness measurement, is significantly more accurate than other state-of-the-art measurements including the original RF Proximity measurement and the Large Margin Nearest Neighbor (LMNN) metric learning measurement.


Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

arXiv.org Machine Learning

Mapping complex input data into suitable lower dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data representation possibly characterised by convenient geometric properties. Euclidean spaces are by far the most widely used embedding spaces, thanks to their well-understood structure and large availability of consolidated inference methods. However, recent research demonstrated that many types of complex data (e.g., those represented as graphs) are actually better described by non-Euclidean geometries. Here, we investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) impacts on the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and perform change detection there by means of conventional methods for numerical streams. The curvature of the space is a parameter that we learn to reproduce the geometry of the original application-dependent graph space. Preliminary experimental results show the potential capability of representing graphs by means of curved manifold, in particular for change and anomaly detection problems.


The Approximation of the Dissimilarity Projection

arXiv.org Machine Learning

Diffusion magnetic resonance imaging (dMRI) data allow to reconstruct the 3D pathways of axons within the white matter of the brain as a tractography. The analysis of tractographies has drawn attention from the machine learning and pattern recognition communities providing novel challenges such as finding an appropriate representation space for the data. Many of the current learning algorithms require the input to be from a vectorial space. This requirement contrasts with the intrinsic nature of the tractography because its basic elements, called streamlines or tracks, have different lengths and different number of points and for this reason they cannot be directly represented in a common vectorial space. In this work we propose the adoption of the dissimilarity representation which is an Euclidean embedding technique defined by selecting a set of streamlines called prototypes and then mapping any new streamline to the vector of distances from prototypes. We investigate the degree of approximation of this projection under different prototype selection policies and prototype set sizes in order to characterise its use on tractography data. Additionally we propose the use of a scalable approximation of the most effective prototype selection policy that provides fast and accurate dissimilarity approximations of complete tractographies.


Entropic one-class classifiers

arXiv.org Machine Learning

The one-class classification problem is a well-known research endeavor in pattern recognition. The problem is also known under different names, such as outlier and novelty/anomaly detection. The core of the problem consists in modeling and recognizing patterns belonging only to a so-called target class. All other patterns are termed non-target, and therefore they should be recognized as such. In this paper, we propose a novel one-class classification system that is based on an interplay of different techniques. Primarily, we follow a dissimilarity representation based approach; we embed the input data into the dissimilarity space by means of an appropriate parametric dissimilarity measure. This step allows us to process virtually any type of data. The dissimilarity vectors are then represented through a weighted Euclidean graphs, which we use to (i) determine the entropy of the data distribution in the dissimilarity space, and at the same time (ii) derive effective decision regions that are modeled as clusters of vertices. Since the dissimilarity measure for the input data is parametric, we optimize its parameters by means of a global optimization scheme, which considers both mesoscopic and structural characteristics of the data represented through the graphs. The proposed one-class classifier is designed to provide both hard (Boolean) and soft decisions about the recognition of test patterns, allowing an accurate description of the classification process. We evaluate the performance of the system on different benchmarking datasets, containing either feature-based or structured patterns. Experimental results demonstrate the effectiveness of the proposed technique.


Analysis of Multibeam SONAR Data using Dissimilarity Representations

arXiv.org Machine Learning

This paper considers the problem of low-dimensional visualisation of very high dimensional information sources for the purpose of situation awareness in the maritime environment. In response to the requirement for human decision support aids to reduce information overload (and specifically, data amenable to inter-point relative similarity measures) appropriate to the below-water maritime domain, we are investigating a preliminary prototype topographic visualisation model. The focus of the current paper is on the mathematical problem of exploiting a relative dissimilarity representation of signals in a visual informatics mapping model, driven by real-world sonar systems. An independent source model is used to analyse the sonar beams from which a simple probabilistic input model to represent uncertainty is mapped to a latent visualisation space where data uncertainty can be accommodated. The use of euclidean and non-euclidean measures are used and the motivation for future use of non-euclidean measures is made. Concepts are illustrated using a simulated 64 beam weak SNR dataset with realistic sonar targets.


Dissimilarity-based Ensembles for Multiple Instance Learning

arXiv.org Machine Learning

In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation determined by the number of training bags, while the second approach results in a relatively high-dimensional representation, determined by the total number of instances in the training set. In this paper a third, intermediate approach is proposed, which links the two approaches and combines their strengths. Our classifier is inspired by a random subspace ensemble, and considers subspaces of the dissimilarity space, defined by subsets of instances, as prototypes. We provide guidelines for using such an ensemble, and show state-of-the-art performances on a range of multiple instance learning problems.


One-Class LP Classifiers for Dissimilarity Representations

Neural Information Processing Systems

Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical examples. These applications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any problem where little knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimilarities and (2) cheaper to compute by using a reduced representation set. Finally, a comparison to a comparable one-class classifier by Campbell and Bennett is given.


One-Class LP Classifiers for Dissimilarity Representations

Neural Information Processing Systems

Problems in which abnormal or novel situations should be detected can be approached by describing the domain of the class of typical examples. Theseapplications come from the areas of machine diagnostics, fault detection, illness identification or, in principle, refer to any problem wherelittle knowledge is available outside the typical class. In this paper we explain why proximities are natural representations for domain descriptors and we propose a simple one-class classifier for dissimilarity representations. By the use of linear programming an efficient one-class description can be found, based on a small number of prototype objects. This classifier can be made (1) more robust by transforming the dissimilarities and(2) cheaper to compute by using a reduced representation set. Finally, a comparison to a comparable one-class classifier by Campbell and Bennett is given.