Goto

Collaborating Authors

 Representation Of Examples


On Approximate Reasoning Capabilities of Low-Rank Vector Spaces

AAAI Conferences

In relational databases, relations between objects, represented by binary matrices or tensors, may be arbitrarily complex. In practice however, there are recurring relational patterns such as transitive, permutation and sequential relationships, that seem to have a regular structure not captured by the classical notion of matrix rank or tensor rank. In this paper, we show that factorizing the relational tensor using a logistic or hinge loss instead of the more standard squared loss is more appropriate because it can accurately model many common relations with a fixed-size embedding that depends sub-linearly on the number of entities in the knowledge base. We illustrate this fact empirically by being able to efficiently predict missing links in several synthetic and real-world experiments. Further, we provide theoretical justification for logistic loss by studying its connection to a complexity measure from the field of information complexity called the sign rank. Sign rank is a more appropriate complexity measure as it has a low value for transitive, permutation, or sequential relationships, while being large for uniformly sampled binary matrices/tensors with a high probability.


AffectiveSpace 2: Enabling Affective Intuition for Concept-Level Sentiment Analysis

AAAI Conferences

Predicting the affective valence of unknown multi-word expressions is key for concept-level sentiment analysis. AffectiveSpace 2 is a vector space model, built by means of random projection, that allows for reasoning by analogy on natural language con- cepts. By reducing the dimensionality of affec- tive common-sense knowledge, the model allows semantic features associated with concepts to be generalized and, hence, allows concepts to be intu- itively clustered according to their semantic and affective relatedness. Such an affective intuition (so called because it does not rely on explicit fea- tures, but rather on implicit analogies) enables the inference of emotions and polarity conveyed by multi-word expressions, thus achieving efficient concept-level sentiment analysis.


Analysis of the Limitations of an Experience Metric Space when Used in a Mobile Domestic Robot

AAAI Conferences

This paper introduces the concept and use of an Interaction History Architecture for use on a mobile domestic robot and analyses the limitations of this configuration. The interaction history architecture builds upon Shannon information theory and has been previously used in a humanoid robot to learn basic children’s games. Previous work has shown that experience spaces can be highly flexible when used for learning. In this paper we outline and experiment designed to test the abilities of the architecture and how it can be used with classic clicker style training to teach domestic robots simple tasks. It then presents results from an experiment exploring these capabilities as well as the limitation found therein.


On Classification with Bags, Groups and Sets

arXiv.org Machine Learning

In recent years, the field of pattern recognition has seen many problems that are difficult to formulate as regular supervised classification problems where (feature vector, label) pairs are available to train a classifier that, in turn, can predict labels for previously unseen feature vectors. A subset of these problems contains learning scenarios where (part of) the objects are represented by sets or bags of feature vectors or instances. Such learning scenarios include multiple instance learning [11], set classification [42], group-based classification [47] and many others. In this paper we review these learning scenarios. There are several reasons why a bag representation might be chosen in a pattern recognition problem. The first reason is that a single feature vector is often too restrictive to describe an object. For example, in drug activity prediction, we are interested in classifying molecules as having the desired effect (active) or not. However, a molecule is not just a list of its elements: most molecules can fold into different shapes or conformations, which can influence the activity of that molecule.


A New Space for Comparing Graphs

arXiv.org Machine Learning

Finding a new mathematical representations for graph, which allows direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms like classification, clustering, etc., over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the $(i,j)$-{th} entry equal to the covariance between normalized vectors $A^ie$ and $A^je$ ($e$ being vector of all ones) as a representation for graph with adjacency matrix $A$. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix and it also contains information about the counts of small sub-structures present in the graph such as triangles and small paths. In addition, we show that this matrix is a \emph{"graph invariant"}. All these properties make the proposed matrix a suitable object for representing graphs. The representation, being a covariance matrix in a fixed dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define similarity between two given graphs as a Bhattacharya similarity measure between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient. The computation of both the matrix representation and the similarity value can be performed in operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations, of the adjacency matrix, to characterize social networks.


A Survey on Metric Learning for Feature Vectors and Structured Data

arXiv.org Machine Learning

The need for appropriate ways to measure the distance or similarity between data is ubiquitous in machine learning, pattern recognition and data mining, but handcrafting such good metrics for specific problems is generally difficult. This has led to the emergence of metric learning, which aims at automatically learning a metric from data and has attracted a lot of interest in machine learning and related fields for the past ten years. This survey paper proposes a systematic review of the metric learning literature, highlighting the pros and cons of each approach. We pay particular attention to Mahalanobis distance metric learning, a well-studied and successful framework, but additionally present a wide range of methods that have recently emerged as powerful alternatives, including nonlinear metric learning, similarity learning and local metric learning. Recent trends and extensions, such as semi-supervised metric learning, metric learning for histogram data and the derivation of generalization guarantees, are also covered. Finally, this survey addresses metric learning for structured data, in particular edit distance learning, and attempts to give an overview of the remaining challenges in metric learning for the years to come.


Dissimilarity-based Ensembles for Multiple Instance Learning

arXiv.org Machine Learning

In multiple instance learning, objects are sets (bags) of feature vectors (instances) rather than individual feature vectors. In this paper we address the problem of how these bags can best be represented. Two standard approaches are to use (dis)similarities between bags and prototype bags, or between bags and prototype instances. The first approach results in a relatively low-dimensional representation determined by the number of training bags, while the second approach results in a relatively high-dimensional representation, determined by the total number of instances in the training set. In this paper a third, intermediate approach is proposed, which links the two approaches and combines their strengths. Our classifier is inspired by a random subspace ensemble, and considers subspaces of the dissimilarity space, defined by subsets of instances, as prototypes. We provide guidelines for using such an ensemble, and show state-of-the-art performances on a range of multiple instance learning problems.


Learning to Prune in Metric and Non-Metric Spaces

Neural Information Processing Systems

Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-to prune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality.


Feature vector regularization in machine learning

arXiv.org Machine Learning

Problems in machine learning (ML) can involve noisy input data, and ML classification methods have reached limiting accuracies when based on standard ML data sets consisting of feature vectors and their classes. Greater accuracy will require incorporation of prior structural information on data into learning. We study methods to regularize feature vectors (unsupervised regularization methods), analogous to supervised regularization for estimating functions in ML. We study regularization (denoising) of ML feature vectors using Tikhonov and other regularization methods for functions on ${\bf R}^n$. A feature vector ${\bf x}=(x_1,\ldots,x_n)=\{x_q\}_{q=1}^n$ is viewed as a function of its index $q$, and smoothed using prior information on its structure. This can involve a penalty functional on feature vectors analogous to those in statistical learning, or use of proximity (e.g. graph) structure on the set of indices. Such feature vector regularization inherits a property from function denoising on ${\bf R}^n$, in that accuracy is non-monotonic in the denoising (regularization) parameter $\alpha$. Under some assumptions about the noise level and the data structure, we show that the best reconstruction accuracy also occurs at a finite positive $\alpha$ in index spaces with graph structures. We adapt two standard function denoising methods used on ${\bf R}^n$, local averaging and kernel regression. In general the index space can be any discrete set with a notion of proximity, e.g. a metric space, a subset of ${\bf R}^n$, or a graph/network, with feature vectors as functions with some notion of continuity. We show this improves feature vector recovery, and thus the subsequent classification or regression done on them. We give an example in gene expression analysis for cancer classification with the genome as an index space and network structure based protein-protein interactions.


Supervised Metric Learning with Generalization Guarantees

arXiv.org Machine Learning

The crucial importance of metrics in machine learning algorithms has led to an increasing interest in optimizing distance and similarity functions, an area of research known as metric learning. When data consist of feature vectors, a large body of work has focused on learning a Mahalanobis distance. Less work has been devoted to metric learning from structured objects (such as strings or trees), most of it focusing on optimizing a notion of edit distance. We identify two important limitations of current metric learning approaches. First, they allow to improve the performance of local algorithms such as k-nearest neighbors, but metric learning for global algorithms (such as linear classifiers) has not been studied so far. Second, the question of the generalization ability of metric learning methods has been largely ignored. In this thesis, we propose theoretical and algorithmic contributions that address these limitations. Our first contribution is the derivation of a new kernel function built from learned edit probabilities. Our second contribution is a novel framework for learning string and tree edit similarities inspired by the recent theory of (e,g,t)-good similarity functions. Using uniform stability arguments, we establish theoretical guarantees for the learned similarity that give a bound on the generalization error of a linear classifier built from that similarity. In our third contribution, we extend these ideas to metric learning from feature vectors by proposing a bilinear similarity learning method that efficiently optimizes the (e,g,t)-goodness. Generalization guarantees are derived for our approach, highlighting that our method minimizes a tighter bound on the generalization error of the classifier. Our last contribution is a framework for establishing generalization bounds for a large class of existing metric learning algorithms based on a notion of algorithmic robustness.