Goto

Collaborating Authors

 Representation Of Examples


Graph Analysis and Graph Pooling in the Spatial Domain

arXiv.org Machine Learning

The spatial convolution layer which is widely used in the Graph Neural Networks (GNNs) aggregates the feature vector of each node with the feature vectors of its neighboring nodes. The GNN is not aware of the locations of the nodes in the global structure of the graph and when the local structures corresponding to different nodes are similar to each other, the convolution layer maps all those nodes to similar or same feature vectors in the continuous feature space. Therefore, the GNN cannot distinguish two graphs if their difference is not in their local structures. In addition, when the nodes are not labeled/attributed the convolution layers can fail to distinguish even different local structures. In this paper, we propose an effective solution to address this problem of the GNNs. The proposed approach leverages a spatial representation of the graph which makes the neural network aware of the differences between the nodes and also their locations in the graph. The spatial representation which is equivalent to a point-cloud representation of the graph is obtained by a graph embedding method. Using the proposed approach, the local feature extractor of the GNN distinguishes similar local structures in different locations of the graph and the GNN infers the topological structure of the graph from the spatial distribution of the locally extracted feature vectors. Moreover, the spatial representation is utilized to simplify the graph down-sampling problem. A new graph pooling method is proposed and it is shown that the proposed pooling method achieves competitive or better results in comparison with the state-of-the-art methods.


Manifold Fitting in Ambient Space

arXiv.org Machine Learning

Modern data sets in many applications no longer comprise samples of real vectors in a real vector space but samples of much more complex structures which may be represented as points in a space with certain underlying geometric structure, namely a manifold. Manifold learning is an emerging field for learning the underlying structure. The study of manifold learning can be split into two main branches, namely dimension reduction and manifold fitting. With the aim of interacting statistics and geometry, we tackle the problem of manifold fitting in the ambient space. Inspired by the relation between the eigenvalues of the Laplace-Beltrami operator and the geometry of a manifold, we aim to find a small set of points that preserve the geometry of the underlying manifold. Based on this relationship, we extend the idea of subsampling to noisy datasets in high dimensional space and utilize the Moving Least Squares (MLS) approach to approximate the underlying manifold. We analyze the two core steps in our proposed method theoretically and also provide the bounds for the MLS approach. Our simulation results and real data analysis demonstrate the superiority of our method in estimating the underlying manifold from noisy data.


Deep Metric Learning using Similarities from Nonlinear Rank Approximations

arXiv.org Machine Learning

--In recent years, deep metric learning has achieved promising results in learning high dimensional semantic feature embeddings where the spatial relationships of the feature vectors match the visual similarities of the images. Similarity search for images is performed by determining the vectors with the smallest distances to a query vector . However, high retrieval quality does not depend on the actual distances of the feature vectors, but rather on the ranking order of the feature vectors from similar images. In this paper, we introduce a metric learning algorithm that focuses on identifying and modifying those feature vectors that most strongly affect the retrieval quality. We compute normalized approximated ranks and convert them to similarities by applying a nonlinear transfer function. These similarities are used in a newly proposed loss function that better contracts similar and disperses dissimilar samples. Experiments demonstrate significant improvement over existing deep feature embedding methods on the CUB-200-2011, Cars196, and Stanford Online Products data sets for all embedding sizes.


A General Framework for Implicit and Explicit Debiasing of Distributional Word Vector Spaces

arXiv.org Artificial Intelligence

Distributional word vectors have recently been shown to encode many of the human biases, most notably gender and racial biases, and models for attenuating such biases have consequently been proposed. However, existing models and studies (1) operate on under-specified and mutually differing bias definitions, (2) are tailored for a particular bias (e.g., gender bias) and (3) have been evaluated inconsistently and non-rigorously. In this work, we introduce a general framework for debiasing word embeddings. We operationalize the definition of a bias by discerning two types of bias specification: explicit and implicit. We then propose three debiasing models that operate on explicit or implicit bias specifications, and that can be composed towards more robust debiasing. Finally, we devise a full-fledged evaluation framework in which we couple existing bias metrics with newly proposed ones. Experimental findings across three embedding methods suggest that the proposed debiasing models are robust and widely applicable: they often completely remove the bias both implicitly and explicitly, without degradation of semantic information encoded in any of the input distributional spaces. Moreover, we successfully transfer debiasing models, by means of crosslingual embedding spaces, and remove or attenuate biases in distributional word vector spaces of languages that lack readily available bias specifications.


Estimation of Personalized Heterogeneous Treatment Effects Using Concatenation and Augmentation of Feature Vectors

arXiv.org Machine Learning

A new meta-algorithm for estimating the conditional average treatment effects is proposed in the paper. The main idea underlying the algorithm is to consider a new dataset consisting of feature vectors produced by means of concatenation of examples from control and treatment groups, which are close to each other. Outcomes of new data are defined as the difference between outcomes of the corresponding examples comprising new feature vectors. The second idea is based on the assumption that the number of controls is rather large and the control outcome function is precisely determined. This assumption allows us to augment treatments by generating feature vectors which are closed to available treatments. The outcome regression function constructed on the augmented set of concatenated feature vectors can be viewed as an estimator of the conditional average treatment effects. A simple modification of the Co-learner based on the random subspace method or the feature bagging is also proposed. Various numerical simulation experiments illustrate the proposed algorithm and show its outperformance in comparison with the well-known T-learner and X-learner for several types of the control and treatment outcome functions. Keywords: treatment effect, meta-learner, regression, treatment, control, simulation 1 Introduction One the most important problems in medicine is to choose the most appropriate treatment for a certain patient which may differ from other patients in her/his clinical or other characteristics [25]. With the increase of the amount of data and with the developing the electronic health record concept in medicine, there is a growing interest to apply machine learning methods to solve the problem of the most appropriate treatment by estimating treatment effects directly from observational data. The main peculiarity of observational data is that it contains past actions, their outcomes, but without direct access to the mechanism which gave rise to the action. Shalit at al. [34] give a clear example of observational data, when we have patient characteristics, medications (action), and outcomes, 1 arXiv:1909.03894v1


Nonparametric Contextual Bandits in an Unknown Metric Space

arXiv.org Machine Learning

Consider a nonparametric contextual multi-arm bandit problem where each arm $a \in [K]$ is associated to a nonparametric reward function $f_a: [0,1] \to \mathbb{R}$ mapping from contexts to the expected reward. Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions, e.g. finite types or smooth with respect to an unknown metric space. We present a novel algorithm which learns data-driven similarities amongst the arms, in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.


Soccer Team Vectors

arXiv.org Artificial Intelligence

In this work we present STEVE - Soccer TEam VEctors, a principled approach for learning real valued vectors for soccer teams where similar teams are close to each other in the resulting vector space. STEVE only relies on freely available information about the matches teams played in the past. These vectors can serve as input to various machine learning tasks. Evaluating on the task of team market value estimation, STEVE outperforms all its competitors. Moreover, we use STEVE for similarity search and to rank soccer teams.


Orometric Methods in Bounded Metric Data

arXiv.org Artificial Intelligence

A large amount of data accommodated in knowledge graphs (KG) is actually metric. For example, the Wikidata KG contains a plenitude of metric facts about geographic entities like cities, chemical compounds or celestial objects. In this paper, we propose a novel approach that transfers orometric (topographic) measures to bounded metric spaces. While these methods were originally designed to identify relevant mountain peaks on the surface of the earth, we demonstrate a notion to use them for metric data sets in general. Notably, metric sets of items inclosed in knowledge graphs. Based on this we present a method for identifying outstanding items using the transferred valuations functions 'isolation' and 'prominence'. Building up on this we imagine an item recommendation process. To demonstrate the relevance of the novel valuations for such processes we use item sets from the Wikidata knowledge graph. We then evaluate the usefulness of 'isolation' and 'prominence' empirically in a supervised machine learning setting. In particular, we find structurally relevant items in the geographic population distributions of Germany and France.


A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

#artificialintelligence

This paper introduces a la carte embed-ding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transfor-mation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in con-text to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.


Encoding high-cardinality string categorical variables

arXiv.org Machine Learning

Statistical models usually require vector representations of categorical variables, using for instance one-hot encoding. This strategy breaks down when the number of categories grows, as it creates high-dimensional feature vectors. Additionally, for string entries, one-hot encoding does not capture information in their representation.Here, we seek low-dimensional encoding of high-cardinality string categorical variables. Ideally, these should be: scalable to many categories; interpretable to end users; and facilitate statistical analysis. We introduce two encoding approaches for string categories: Gamma-Poisson matrix factorization on substring counts, and the min-hash encoder, for fast approximation of string similarities. We show that min-hash turns set inclusions into inequality relations that are easier to learn. Both approaches are scalable and streamable. Experiments on real and simulated data show that these methods improve supervised learning with high-cardinality categorical variables. We recommend the following: if scalability is central, the min-hash encoder is the best option as it does not require any data fit; if interpretability is important, the Gamma-Poisson factorization is the best alternative, as it can be interpreted as one-hot encoding on inferred categories with informative feature names. Both models enable autoML on the original string entries as they remove the need for feature engineering or data cleaning.