Goto

Collaborating Authors

 Representation Of Examples


An Exploration of State-of-the-art Methods for Offensive Language Detection

arXiv.org Machine Learning

We provide a comprehensive investigation of different custom and off-the-shelf architectures as well as different approaches to generating feature vectors for offensive language detection. We also show that these approaches work well on small and noisy datasets such as on the Offensive Language Identification Dataset (OLID), so it should be possible to use them for other applications.


Sentiment Analysis on IMDB Movie Comments and Twitter Data by Machine Learning and Vector Space Techniques

arXiv.org Machine Learning

This study's goal is to create a model of sentiment analysis on a 2000 rows IMDB movie comments and 3200 Twitter data by using machine learning and vector space techniques; positive or negative preliminary information about the text is to provide. In the study, a vector space was created in the KNIME Analytics platform, and a classification study was performed on this vector space by Decision Trees, Na\"ive Bayes and Support Vector Machines classification algorithms. The conclusions obtained were compared in terms of each algorithms. The classification results for IMDB movie comments are obtained as 94,00%, 73,20%, and 85,50% by Decision Tree, Naive Bayes and SVM algorithms. The classification results for Twitter data set are presented as 82,76%, 75,44% and 72,50% by Decision Tree, Naive Bayes SVM algorithms as well. It is seen that the best classification results presented in both data sets are which calculated by SVM algorithm.


Sentence Similarity in Python using Doc2Vec – Kanoki

#artificialintelligence

Numeric representation of Text documents is challenging task in machine learning and there are different ways there to create the numerical features for texts such as vector representation using Bag of Words, Tf-IDF etc.I am not going in detail what are the advantages of one over the other or which is the best one to use in which case. There are lot of good reads available to explain this. It's a Model to create the word embeddings, where it takes input as a large corpus of text and produces a vector space typically of several hundred dimesions. The underlying assumption of Word2Vec is that two words sharing similar contexts also share a similar meaning and consequently a similar vector representation from the model. For instance: "Bank", "money" and "accounts" are often used in similar situations, with similar surrounding words like "dollar", "loan" or "credit", and according to Word2Vec they will therefore share a similar vector representation.


Approximating Continuous Functions on Persistence Diagrams Using Template Functions

arXiv.org Machine Learning

The persistence diagram is an increasingly useful tool arising from the field of Topological Data Analysis. However, using these diagrams in conjunction with machine learning techniques requires some mathematical finesse. The most success to date has come from finding methods for turning persistence diagrams into vectors in $\mathbb{R}^n$ in a way which preserves as much of the space of persistence diagrams as possible, commonly referred to as featurization. In this paper, we describe a mathematical framework for featurizing the persistence diagram space using template functions. These functions are general as they are only required to be continuous, have a compact support, and separate points. We discuss two example realizations of these functions: tent functions and Chybeyshev interpolating polynomials. Both of these functions are defined on a grid superposed on the birth-lifetime plane. We then combine the resulting features with machine learning algorithms to perform supervised classification and regression on several example data sets, including manifold data, shape data, and an embedded time series from a Rossler system. Our results show that the template function approach yields high accuracy rates that match and often exceed the results of existing methods for featurizing persistence diagrams. One counter-intuitive observation is that in most cases using interpolating polynomials, where each point contributes globally to the feature vector, yields significantly better results than using tent functions, where the contribution of each point is localized to its grid cell. Along the way, we also provide a complete characterization of compact sets in persistence diagram space endowed with the bottleneck distance.


Classification with unknown class conditional label noise on non-compact feature spaces

arXiv.org Machine Learning

We investigate the problem of classification in the presence of unknown class conditional label noise in which the labels observed by the learner have been corrupted with some unknown class dependent probability. In order to obtain finite sample rates, previous approaches to classification with unknown class conditional label noise have required that the regression function attains its extrema uniformly on sets of positive measure. We shall consider this problem in the setting of non-compact metric spaces, where the regression function need not attain its extrema. In this setting we determine the minimax optimal learning rates (up to logarithmic factors). The rate displays interesting threshold behaviour: When the regression function approaches its extrema at a sufficient rate, the optimal learning rates are of the same order as those obtained in the label-noise free setting. If the regression function approaches its extrema more gradually then classification performance necessarily degrades. In addition, we present an algorithm which attains these rates without prior knowledge of either the distributional parameters or the local density. This identifies for the first time a scenario in which finite sample rates are achievable in the label noise setting, but they differ from the optimal rates without label noise.


Hyperbolic Disk Embeddings for Directed Acyclic Graphs

arXiv.org Machine Learning

Obtaining continuous representations of structural data such as directed acyclic graphs (DAGs) has gained attention in machine learning and artificial intelligence. However, embedding complex DAGs in which both ancestors and descendants of nodes are exponentially increasing is difficult. Tackling in this problem, we develop Disk Embeddings, which is a framework for embedding DAGs into quasi-metric spaces. Existing state-of-the-art methods, Order Embeddings and Hyperbolic Entailment Cones, are instances of Disk Embedding in Euclidean space and spheres respectively. Furthermore, we propose a novel method Hyperbolic Disk Embeddings to handle exponential growth of relations. The results of our experiments show that our Disk Embedding models outperform existing methods especially in complex DAGs other than trees.


When Collaborative Filtering Meets Reinforcement Learning

arXiv.org Machine Learning

In this paper, we study a multi-step interactive recommendation problem, where the item recommended at current step may affect the quality of future recommendations. To address the problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of both collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, which is mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP by encoding the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.


An Exact Reformulation of Feature-Vector-based Radial-Basis-Function Networks for Graph-based Observations

arXiv.org Machine Learning

Radial-basis-function networks are traditionally defined for sets of vector-based observations. In this short paper, we reformulate such networks so that they can be applied to adjacency-matrix representations of weighted, directed graphs that represent the relationships between object pairs. We re-state the sum-of-squares objective function so that it is purely dependent on entries from the adjacency matrix. From this objective function, we derive a gradient descent update for the network weights. We also derive a gradient update that simulates the repositioning of the radial basis prototypes and changes in the radial basis prototype parameters. An important property of our radial basis function networks is that they are guaranteed to yield the same responses as conventional radial-basis networks trained on a corresponding vector realization of the relationships encoded by the adjacency-matrix. Such a vector realization only needs to provably exist for this property to hold, which occurs whenever the relationships correspond to distances from some arbitrary metric applied to a latent set of vectors. We therefore completely avoid needing to actually construct vectorial realizations via multi-dimensional scaling, which ensures that the underlying relationships are totally preserved.


The statistical Minkowski distances: Closed-form formula for Gaussian Mixture Models

arXiv.org Machine Learning

The traditional Minkowski distances are induced by the corresponding Minkowski norms in real-valued vector spaces. In this work, we propose novel statistical symmetric distances based on the Minkowski's inequality for probability densities belonging to Lebesgue spaces. These statistical Minkowski distances admit closed-form formula for Gaussian mixture models when parameterized by integer exponents. This result extends to arbitrary mixtures of exponential families with natural parameter spaces being cones: This includes the binomial, the multinomial, the zero-centered Laplacian, the Gaussian and the Wishart mixtures, among others. We also derive a Minkowski's diversity index of a normalized weighted set of probability distributions from Minkowski's inequality.


Semi-supervised Learning with Graphs: Covariance Based Superpixels For Hyperspectral Image Classification

arXiv.org Machine Learning

ABSTRACT In this paper, we present a graph-based semi-supervised framework for hyperspectral image classification. We first introduce a novel superpixel algorithm based on the spectral covariance matrix representation of pixels to provide a better representation of our data. We then construct a superpixel graph, based on carefully considered feature vectors, before performing classification. We demonstrate, through a set of experimental results using two benchmarking datasets, that our approach outperforms three state-of-the-art classification frameworks, especially when a extremely small amount of labelled data is used. Index Terms-- Hyperspectral Imaging, Superpixels, Covariance, Graphs,Semi-Supervised Learning, Classification 1. INTRODUCTION Hyperspectral image (HSI) classification is an active area of research and poses unique challenges.