Representation Of Examples
A review on distance based time series classification
Abanda, Amaia, Mori, Usue, Lozano, Jose A.
Time series classification is an increasing research topic due to the vast amount of time series data that are being created over a wide variety of fields. The particularity of the data makes it a challenging task and different approaches have been taken, including the distance based approach. 1-NN has been a widely used method within distance based time series classification due to it simplicity but still good performance. However, its supremacy may be attributed to being able to use specific distances for time series within the classification process and not to the classifier itself. With the aim of exploiting these distances within more complex classifiers, new approaches have arisen in the past few years that are competitive or which outperform the 1-NN based approaches. In some cases, these new methods use the distance measure to transform the series into feature vectors, bridging the gap between time series and traditional classifiers. In other cases, the distances are employed to obtain a time series kernel and enable the use of kernel methods for time series classification. One of the main challenges is that a kernel function must be positive semi-definite, a matter that is also addressed within this review. The presented review includes a taxonomy of all those methods that aim to classify time series using a distance based approach, as well as a discussion of the strengths and weaknesses of each method.
Adversarial Auto-encoders for Speech Based Emotion Recognition
Sahu, Saurabh, Gupta, Rahul, Sivaraman, Ganesh, AbdAlmageed, Wael, Espy-Wilson, Carol
Recently, generative adversarial networks and adversarial autoencoders have gained a lot of attention in machine learning community due to their exceptional performance in tasks such as digit classification and face recognition. They map the autoencoder's bottleneck layer output (termed as code vectors) to different noise Probability Distribution Functions (PDFs), that can be further regularized to cluster based on class information. In addition, they also allow a generation of synthetic samples by sampling the code vectors from the mapped PDFs. Inspired by these properties, we investigate the application of adversarial autoencoders to the domain of emotion recognition. Specifically, we conduct experiments on the following two aspects: (i) their ability to encode high dimensional feature vector representations for emotional utterances into a compressed space (with a minimal loss of emotion class discriminability in the compressed space), and (ii) their ability to regenerate synthetic samples in the original feature space, to be later used for purposes such as training emotion recognition classifiers. We demonstrate the promise of adversarial autoencoders with regards to these aspects on the Interactive Emotional Dyadic Motion Capture (IEMOCAP) corpus and present our analysis.
Learning from Exemplars and Prototypes in Machine Learning and Psychology
Zubek, Julian, Kuncheva, Ludmila
This paper draws a parallel between similarity-based categorisation models developed in cognitive psychology and the nearest neighbour classifier (1-NN) in machine learning. Conceived as a result of the historical rivalry between prototype theories (abstraction) and exemplar theories (memorisation), recent models of human categorisation seek a compromise in-between. Regarding the stimuli (entities to be categorised) as points in a metric space, machine learning offers a large collection of methods to select a small, representative and discriminative point set. These methods are known under various names: instance selection, data editing, prototype selection, prototype generation or prototype replacement. The nearest neighbour classifier is used with the selected reference set. Such a set can be interpreted as a data-driven categorisation model. We juxtapose the models from the two fields to enable cross-referencing. We believe that both machine learning and cognitive psychology can draw inspiration from the comparison and enrich their repertoire of similarity-based models.
Querying Complex Networks in Vector Space
Hamilton, William L., Bajaj, Payal, Zitnik, Marinka, Jurafsky, Dan, Leskovec, Jure
Learning vector embeddings of complex networks is a powerful approach used to predict missing or unobserved edges in network data. However, an open challenge in this area is developing techniques that can reason about $\textit{subgraphs}$ in network data, which can involve the logical conjunction of several edge relationships. Here we introduce a framework to make predictions about conjunctive logical queries---i.e., subgraph relationships---on heterogeneous network data. In our approach, we embed network nodes in a low-dimensional space and represent logical operators as learned geometric operations (e.g., translation, rotation) in this embedding space. We prove that a small set of geometric operations are sufficient to represent conjunctive logical queries on a network, and we introduce a series of increasingly strong implementations of these operators. We demonstrate the utility of this framework in two application studies on networks with millions of edges: predicting unobserved subgraphs in a network of drug-gene-disease interactions and in a network of social interactions derived from a popular web forum. These experiments demonstrate how our framework can efficiently make logical predictions such as "what drugs are likely to target proteins involved with both diseases X and Y?" Together our results highlight how imposing logical structure can make network embeddings more useful for large-scale knowledge discovery.
Similarity encoding for learning with dirty categorical variables
Cerda, Patricio, Varoquaux, Gaël, Kégl, Balázs
For statistical learning, categorical variables in a table are usually considered as discrete entities and encoded separately to feature vectors, e.g., with one-hot encoding. "Dirty" non-curated data gives rise to categorical variables with a very high cardinality but redundancy: several categories reflect the same entity. In databases, this issue is typically solved with a deduplication step. We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains. We study a generalization of one-hot encoding, similarity encoding, that builds feature vectors from similarities across categories. We perform a thorough empirical validation on non-curated tables, a problem seldom studied in machine learning. Results on seven real-world datasets show that similarity encoding brings significant gains in prediction in comparison with known encoding methods for categories or strings, notably one-hot encoding and bag of character n-grams. We draw practical recommendations for encoding dirty categories: 3-gram similarity appears to be a good choice to capture morphological resemblance. For very high-cardinality, dimensionality reduction significantly reduces the computational cost with little loss in performance: random projections or choosing a subset of prototype categories still outperforms classic encoding approaches.
Persistence paths and signature features in topological data analysis
Chevyrev, Ilya, Nanda, Vidit, Oberhauser, Harald
We introduce a new feature map for barcodes that arise in persistent homology computation. The main idea is to first realize each barcode as a path in a convenient vector space, and to then compute its path signature which takes values in the tensor algebra of that vector space. The composition of these two operations - barcode to path, path to tensor series - results in a feature map that has several desirable properties for statistical learning, such as universality and characteristicness, and achieves state-of-the-art results on common classification benchmarks.
On representation power of neural network-based graph embedding and beyond
Okuno, Akifumi, Shimodaira, Hidetoshi
The representation power of similarity functions used in neural network-based graph embedding is considered. The inner product similarity (IPS) with feature vectors computed via neural networks is commonly used for representing the strength of association between two nodes. However, only a little work has been done on the representation capability of IPS. A very recent work shed light on the nature of IPS and reveals that IPS has the capability of approximating any positive definite (PD) similarities. However, a simple example demonstrates the fundamental limitation of IPS to approximate non-PD similarities. We then propose a novel model named Shifted IPS (SIPS) that approximates any Conditionally PD (CPD) similarities arbitrary well. CPD is a generalization of PD with many examples such as negative Poincare distance and negative Wasserstein distance, thus SIPS has a potential impact to significantly improve the applicability of graph embedding without taking great care in configuring the similarity function. Our numerical experiments demonstrate the SIPS's superiority over IPS. In theory, we further extend SIPS beyond CPD by considering the inner product in Minkowski space so that it approximates more general similarities.
Learning to Play General Video-Games via an Object Embedding Network
Deep reinforcement learning (DRL) has proven to be an effective tool for creating general video-game AI. However most current DRL video-game agents learn end-to-end from the video-output of the game, which is superfluous for many applications and creates a number of additional problems. More importantly, directly working on pixel-based raw video data is substantially distinct from what a human player does.In this paper, we present a novel method which enables DRL agents to learn directly from object information. This is obtained via use of an object embedding network (OEN) that compresses a set of object feature vectors of different lengths into a single fixed-length unified feature vector representing the current game-state and fulfills the DRL simultaneously. We evaluate our OEN-based DRL agent by comparing to several state-of-the-art approaches on a selection of games from the GVG-AI Competition. Experimental results suggest that our object-based DRL agent yields performance comparable to that of those approaches used in our comparative study.
Optimal Transport for structured data
Vayer, Titouan, Chapel, Laetitia, Flamary, Rémi, Tavenard, Romain, Courty, Nicolas
Rennes, CNRS, LETG F-35000 Rennes Optimal transport has recently gained a lot of interest in the machine learning community thanks to its ability to compare probability distributions while respecting the underlying space's geometry. Wasserstein distance deals with feature information through its metric or cost function, but fails in exploiting the structural information, i.e. the specific relations existing among the components of the distribution. Recently adapted to a machine learning context, the Gromov-Wasserstein distance defines a metric well suited for comparing distributions that live in different metric spaces by exploiting their inner structural information. In this paper we propose a new optimal transport distance, called the Fused Gromov-Wasserstein distance, capable of leveraging both structural and feature information by combining both views and prove its metric properties over very general manifolds. We also define the barycenter of structured objects as their Fréchet mean, leveraging both feature and structural information. We illustrate the versatility of the method for problems where structured objects are involved, computing barycenters in graph and time series contexts. We also use this new distance for graph classification where we obtain comparable or superior results than state-of-the-art graph kernel methods and end-to-end graph CNN approach.
BourGAN: Generative Networks with Metric Embeddings
Xiao, Chang, Zhong, Peilin, Zheng, Changxi
This paper addresses the mode collapse for generative adversarial networks (GANs). We view modes as a geometric structure of data distribution in a metric space. Under this geometric lens, we embed subsamples of the dataset from an arbitrary metric space into the l2 space, while preserving their pairwise distance distribution. Not only does this metric embedding determine the dimensionality of the latent space automatically, it also enables us to construct a mixture of Gaussians to draw latent space random vectors. We use the Gaussian mixture model in tandem with a simple augmentation of the objective function to train GANs. Every major step of our method is supported by theoretical analysis, and our experiments on real and synthetic data confirm that the generator is able to produce samples spreading over most of the modes while avoiding unwanted samples, outperforming several recent GAN variants on a number of metrics and offering new features.