Representation Of Examples

More is not Always Better: The Negative Impact of A-box Materialization on RDF2vec Knowledge Graph Embeddings Artificial Intelligence

RDF2vec is an embedding technique for representing knowledge graph entities in a continuous vector space. In this paper, we investigate the effect of materializing implicit A-box axioms induced by subproperties, as well as symmetric and transitive properties. While it might seem reasonable to assume that such a materialization before computing embeddings leads to better embeddings, we conduct a set of experiments on DBpedia which demonstrate that the materialization actually has a negative effect on the performance of RDF2vec. In our analysis, we argue that, despite the huge body of work devoted to completing missing information in knowledge graphs, such missing implicit information is actually a signal, not a defect, and we show examples illustrating that assumption.
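
To make the setting concrete, the sketch below materializes the symmetric and transitive closure of two toy properties with rdflib. The entities and property IRIs are invented for illustration and are not taken from the paper.

```python
# A minimal sketch of A-box materialization for a symmetric and a transitive
# property, using rdflib. The graph and property IRIs are toy examples,
# not DBpedia data.
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
g = Graph()
g.add((EX.berlin, EX.nearTo, EX.potsdam))   # nearTo: symmetric
g.add((EX.mitte, EX.partOf, EX.berlin))     # partOf: transitive
g.add((EX.berlin, EX.partOf, EX.germany))

# Symmetric closure: for every (s, p, o), also assert (o, p, s).
for s, p, o in list(g.triples((None, EX.nearTo, None))):
    g.add((o, p, s))

# Transitive closure: join (s, p, o) with (o, p, o2) until a fixpoint.
changed = True
while changed:
    changed = False
    for s, _, o in list(g.triples((None, EX.partOf, None))):
        for _, _, o2 in list(g.triples((o, EX.partOf, None))):
            if (s, EX.partOf, o2) not in g:
                g.add((s, EX.partOf, o2))
                changed = True

print(len(g))  # 5: three asserted triples plus two materialized ones
```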

A learning problem whose consistency is equivalent to the non-existence of real-valued measurable cardinals Machine Learning

We show that the $k$-nearest neighbour learning rule is universally consistent in a metric space $X$ if and only if it is universally consistent in every separable subspace of $X$ and the density of $X$ is less than every real-valued measurable cardinal. In particular, the $k$-NN classifier is universally consistent in every metric space whose separable subspaces are sigma-finite dimensional in the sense of Nagata and Preiss if and only if there are no real-valued measurable cardinals. The latter assumption is relatively consistent with ZFC; however, the consistency of the existence of such cardinals cannot be proved within ZFC. Our results were inspired by an example sketched by C\'erou and Guyader in 2006 at an intuitive level of rigour.
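
Restated compactly (same content as the first sentence above, with $\operatorname{dens}(X)$ for the density of $X$):

```latex
% Restatement of the main equivalence; dens(X) is the density of X.
\[
  k\text{-NN is universally consistent in } X
  \;\iff\;
  \left\{
  \begin{aligned}
    & k\text{-NN is universally consistent in every separable } Y \subseteq X,\\
    & \operatorname{dens}(X) < \kappa \text{ for every real-valued measurable cardinal } \kappa.
  \end{aligned}
  \right.
\]
```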

Feature extraction and similar image search with OpenCV for newbies


Image features

For this task, first of all, we need to understand what an image feature is and how we can use it. An image feature is a simple image pattern from which we can describe what we see in an image. For example, a cat's eye would be a feature in an image of a cat. The main role of features in computer vision (and beyond) is to transform visual information into a vector space. OK, but how do we get these features from an image?
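
One common way, sketched below with OpenCV's ORB detector and a brute-force matcher (the image paths are placeholders, and the original post may use a different detector):

```python
# Sketch: extract ORB features from two images and compare them with a
# brute-force Hamming matcher. File paths are placeholders.
import cv2

img1 = cv2.imread("cat1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("cat2.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)   # keypoints + binary descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Fewer / more distant matches -> less similar images.
print(f"{len(matches)} matches, best distance {matches[0].distance:.0f}")
```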

Towards Exploiting Implicit Human Feedback for Improving RDF2vec Embeddings Artificial Intelligence

RDF2vec is a technique for creating vector space embeddings from an RDF knowledge graph, i.e., for representing each entity in the graph as a vector. It first creates sequences of nodes by performing random walks on the graph. In a second step, those sequences are processed by the word2vec algorithm to create the actual embeddings. In this paper, we explore the use of external edge weights for guiding the random walks. As edge weights, transition probabilities between pages in Wikipedia are used as a proxy for human feedback on the importance of an edge. We show that in some scenarios, RDF2vec utilizing those transition probabilities can outperform both RDF2vec based on uniform random walks and approaches using graph-internal edge weights.
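
The core pipeline can be sketched as weight-biased random walks over a graph, fed to gensim's word2vec. The graph and weights below are invented stand-ins for the Wikipedia transition probabilities used in the paper.

```python
# Sketch: edge-weight-biased random walks over a toy graph, embedded with
# word2vec (gensim). Weights stand in for Wikipedia transition probabilities.
import random
from gensim.models import Word2Vec

# adjacency: node -> list of (neighbor, weight)
graph = {
    "Berlin":  [("Germany", 0.8), ("Hamburg", 0.2)],
    "Germany": [("Berlin", 0.5), ("Hamburg", 0.5)],
    "Hamburg": [("Germany", 1.0)],
}

def weighted_walk(start, length):
    walk = [start]
    for _ in range(length - 1):
        neighbors = graph.get(walk[-1])
        if not neighbors:
            break
        nodes, weights = zip(*neighbors)
        walk.append(random.choices(nodes, weights=weights, k=1)[0])
    return walk

walks = [weighted_walk(n, 8) for n in graph for _ in range(50)]
model = Word2Vec(walks, vector_size=32, window=4, sg=1, min_count=1)
print(model.wv["Berlin"][:5])
```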

Locality Preserving Loss to Align Vector Spaces Machine Learning

We present a locality preserving loss (LPL) that improves the alignment between vector space representations (i.e., word or sentence embeddings) while separating (increasing the distance between) uncorrelated representations, compared to the standard method that minimizes only the mean squared error (MSE). The locality preserving loss optimizes the projection by maintaining, in the target domain, the local neighborhoods of embeddings found in the source domain. This reduces the overall size of the dataset required to train the model. We argue that vector space alignment (with MSE and LPL losses) acts as a regularizer in certain language-based classification tasks, leading to better accuracy than the baseline, especially when the size of the training set is small. We validate the effectiveness of LPL on a cross-lingual word alignment task, a natural language inference task, and a multi-lingual inference task.
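
A rough sketch of the idea in PyTorch is given below; the exact formulation of LPL in the paper may differ, and the neighbor term here (pulling each projected point towards the mean of its source-space neighbors) is one plausible instantiation.

```python
# Sketch: MSE alignment plus a locality-preserving term (PyTorch). The exact
# loss in the paper may differ; k-NN indices are precomputed in source space.
import torch

def lpl_loss(proj_src, tgt, knn_idx, alpha=0.5):
    """proj_src: projected source embeddings (n, d); tgt: target embeddings
    (n, d); knn_idx: (n, k) indices of each point's source-space neighbors."""
    mse = ((proj_src - tgt) ** 2).mean()
    # Locality term: a projected point should stay near the projections of
    # its source-space neighbors (mean of neighbor embeddings here).
    neighbor_mean = proj_src[knn_idx].mean(dim=1)          # (n, d)
    locality = ((proj_src - neighbor_mean) ** 2).mean()
    return mse + alpha * locality

n, d, k = 100, 64, 5
W = torch.randn(d, d, requires_grad=True)                  # linear projection
src, tgt = torch.randn(n, d), torch.randn(n, d)
knn_idx = torch.randint(0, n, (n, k))                      # placeholder neighbors
loss = lpl_loss(src @ W, tgt, knn_idx)
loss.backward()
```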

Nonparametric Contextual Bandits in Metric Spaces with Unknown Metric

Neural Information Processing Systems

Suppose that there is a large set of arms, yet there is a simple but unknown structure amongst the arm reward functions. We present a novel algorithm which learns data-driven similarities amongst the arms in order to implement adaptive partitioning of the context-arm space for more efficient learning. We provide regret bounds along with simulations that highlight the algorithm's dependence on the local geometry of the reward functions.

Bipartite Link Prediction based on Topological Features via 2-hop Path Machine Learning

A variety of real-world systems can be modeled as bipartite networks. One of the most powerful and simple link prediction methods is the Linear Graph Autoencoder (LGAE), which has promising performance on challenging tasks such as link prediction and node clustering. LGAE relies on a simple linear model w.r.t. the adjacency matrix of the graph to learn vector space representations of nodes. In this paper, we consider the case of bipartite link prediction where node attributes are unavailable. When using LGAE, we propose to multiply the reconstructed adjacency matrix with a symmetrically normalized training adjacency matrix. As a result, 2-hop paths are formed, which we use as the predicted adjacency matrix to evaluate the performance of our model. Experimental results on both synthetic and real-world datasets show that our approach consistently outperforms the Graph Autoencoder and Linear Graph Autoencoder models on 10 out of 12 bipartite datasets and reaches competitive performance on the 2 other bipartite datasets.
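
The proposed scoring step reduces to a single matrix product, sketched below with NumPy on random toy matrices (in practice the reconstruction would come from LGAE):

```python
# Sketch: 2-hop prediction scores as the product of a reconstructed adjacency
# matrix and the symmetrically normalized training adjacency. All matrices
# here are random toy data, not LGAE outputs.
import numpy as np

rng = np.random.default_rng(0)
n = 6
A_train = (rng.random((n, n)) < 0.3).astype(float)
A_train = np.maximum(A_train, A_train.T)        # symmetrize
np.fill_diagonal(A_train, 0)

# Symmetric normalization: D^{-1/2} A D^{-1/2}
deg = A_train.sum(axis=1)
d_inv_sqrt = np.zeros_like(deg)
d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
A_norm = d_inv_sqrt[:, None] * A_train * d_inv_sqrt[None, :]

A_rec = rng.random((n, n))                      # stand-in for the LGAE reconstruction
scores = A_rec @ A_norm                         # entries score 2-hop paths
print(scores.round(2))
```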

Beyond Vector Spaces: Compact Data Representation as Differentiable Weighted Graphs

Neural Information Processing Systems

Learning useful representations is a key ingredient to the success of modern machine learning. Currently, representation learning mostly relies on embedding data into Euclidean space. However, recent work has shown that data in some domains is better modeled by non-Euclidean metric spaces, and inappropriate geometry can result in inferior performance. In this paper, we aim to eliminate the inductive bias imposed by the embedding space geometry. Namely, we propose to map data into a more general non-vector metric space: a weighted graph with a shortest-path distance.
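
The distance side of this representation is easy to illustrate: given edge weights (fixed here, learned differentiably in the paper), dissimilarity is the shortest-path distance, e.g. with SciPy:

```python
# Sketch: shortest-path distances on a weighted graph as a non-vector metric.
# The paper learns the edge weights; here they are fixed toy values.
import numpy as np
from scipy.sparse.csgraph import shortest_path

# Symmetric non-negative weight matrix; 0 = no edge.
W = np.array([
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 2.0],
    [0.0, 0.0, 2.0, 0.0],
])
D = shortest_path(W, method="D", directed=False)  # Dijkstra
print(D[0, 3])  # path 0-1-2-3: 1.0 + 0.5 + 2.0 = 3.5
```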

Zooming for Efficient Model-Free Reinforcement Learning in Metric Spaces Machine Learning

Despite the wealth of research into provably efficient reinforcement learning algorithms, most works focus on tabular representations and thus struggle to handle exponentially or infinitely large state-action spaces. In this paper, we consider episodic reinforcement learning with a continuous state-action space which is assumed to be equipped with a natural metric that characterizes the proximity between different states and actions. We propose ZoomRL, an online algorithm that leverages ideas from continuous bandits to learn an adaptive discretization of the joint space by zooming in on more promising and frequently visited regions while carefully balancing the exploration-exploitation trade-off. We show that ZoomRL achieves a worst-case regret of $\tilde{O}(H^{\frac{5}{2}} K^{\frac{d+1}{d+2}})$, where $H$ is the planning horizon, $K$ is the number of episodes, and $d$ is the covering dimension of the space with respect to the metric. Moreover, our algorithm enjoys improved metric-dependent guarantees that reflect the geometry of the underlying space. Finally, we show that our algorithm is robust to small misspecification errors.

Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach Artificial Intelligence

We design a new technique for distributional semantic modeling with a neural-network-based approach to learning distributed term representations (or term embeddings), yielding term vector space models. It is inspired by the recent ontology-related approach to the identification of terms (term extraction) and the relations between them (relation extraction), called semantic pre-processing technology (SPT), which uses different types of contextual knowledge such as syntactic, terminological, and semantic knowledge. Our method relies on automatic term extraction from natural language texts and the subsequent formation of problem-oriented or application-oriented (and deeply annotated) text corpora in which the fundamental entity is the term (including both non-compositional and compositional terms). This gives us the opportunity to change over from distributed word representations (word embeddings) to distributed term representations (term embeddings). This transition allows us to generate more accurate semantic maps of different subject domains, as well as of the relations between input terms, which is useful for exploring clusters and oppositions or for testing hypotheses about them. The semantic map can be represented as a graph using Vec2graph, a Python library for visualizing word embeddings (term embeddings in our case) as dynamic and interactive graphs. The Vec2graph library coupled with term embeddings will not only improve accuracy in solving standard NLP tasks, but also update the conventional concept of automated ontology development. The main practical result of our work is a development kit (a set of toolkits exposed as web service APIs and a web application) that provides all the necessary routines for basic linguistic pre-processing and semantic pre-processing of natural language texts in Ukrainian, for the future training of term vector space models.
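
For a feel of the graph representation, the sketch below builds a nearest-neighbor graph of terms with gensim and networkx; it does not use Vec2graph's actual API (which we have not verified), and the model file and seed terms are hypothetical.

```python
# Sketch of the idea behind visualizing term embeddings as a graph: connect
# each term to its nearest neighbors. This approximates the concept with
# gensim + networkx; the model path and seed terms are placeholders.
import networkx as nx
from gensim.models import KeyedVectors

kv = KeyedVectors.load("term_embeddings.kv")     # hypothetical model file
G = nx.Graph()
for term in ["machine_learning", "ontology"]:    # hypothetical seed terms
    for neighbor, sim in kv.most_similar(term, topn=5):
        G.add_edge(term, neighbor, weight=sim)

nx.write_gexf(G, "term_graph.gexf")              # inspect in Gephi etc.
```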