 Supervised Learning


Leveraging Knowledge Graph Embedding Techniques for Industry 4.0 Use Cases

arXiv.org Artificial Intelligence

Industry is evolving towards Industry 4.0, which holds the promise of increased flexibility in manufacturing, better quality and improved productivity. A core driver of this growth is the use of sensors, which must capture data that can be used in unforeseen ways to achieve a performance not achievable without them. However, the complexity of this improved setting is much greater than what is currently used in practice. Hence, it is imperative that the management is not performed by a human labor force alone; part of it must be done by automated algorithms instead. A natural way to represent the data generated by this large number of sensors, which are not measuring independent variables, and the interaction of the different devices is by using a graph data model. Then, machine learning could be used to aid the Industry 4.0 system to, for example, perform predictive maintenance. However, machine learning directly on graphs needs feature engineering and has scalability issues. In this paper we discuss methods to convert (embed) the graph in a vector space, such that it becomes feasible to use traditional machine learning methods for Industry 4.0 settings.


Making Classifier Chains Resilient to Class Imbalance

arXiv.org Machine Learning

Class imbalance is an intrinsic characteristic of multi-label data. Most of the labels in multi-label data sets are associated with a small number of training examples, much smaller compared to the size of the data set. Class imbalance poses a key challenge that plagues most multi-label learning methods. Ensemble of Classifier Chains (ECC), one of the most prominent multi-label learning methods, is no exception to this rule, as each of the binary models it builds is trained from all positive and negative examples of a label. To make ECC resilient to class imbalance, we first couple it with random undersampling. We then present two extensions of this basic approach, where we build a varying number of binary models per label and construct chains of different sizes, in order to improve the exploitation of majority examples with approximately the same computational budget. Experimental results on 16 multi-label datasets demonstrate the effectiveness of the proposed approaches in a variety of evaluation metrics.
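
The random undersampling step mentioned above can be illustrated with a minimal sketch. It covers only the resampling of one label's training set, not the chain construction or the two proposed extensions. The function name `undersample_indices` and the `ratio` parameter are hypothetical and not taken from the paper.

```python
import random


def undersample_indices(y, rng, ratio=1.0):
    """Pick training indices for one binary label.

    Keeps every minority-class example and a random subset of the
    majority class. `ratio` (an assumed knob) is the number of majority
    examples kept per minority example.
    """
    pos = [i for i, v in enumerate(y) if v == 1]
    neg = [i for i, v in enumerate(y) if v == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Never ask for more majority examples than exist.
    keep = min(len(majority), int(round(ratio * len(minority))))
    return sorted(minority + rng.sample(majority, keep))
```

With `ratio=1.0` the binary model for a label would see as many sampled negatives as positives. An ensemble could call this once per model with a different random generator state so that different majority examples are used across models.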


One-Class Kernel Spectral Regression for Outlier Detection

arXiv.org Machine Learning

The paper introduces a new efficient nonlinear one-class classifier formulated as the Rayleigh quotient criterion. The method, operating in a reproducing kernel Hilbert subspace, minimises the scatter of the target distribution along an optimal projection direction while at the same time keeping projections of positive observations as distant as possible from the mean of the negative class. We provide a graph embedding view of the problem which can then be solved efficiently using the spectral regression approach. In this sense, unlike previous similar methods which often require costly eigen-computations of dense matrices, the proposed approach casts the problem under consideration into a regression framework which avoids eigen-decomposition computations. In particular, it is shown that the dominant complexity of the proposed method is the complexity of computing the kernel matrix. Additional appealing characteristics of the proposed one-class classifier are: (1) the ability to be trained in an incremental fashion (allowing for application in streaming data scenarios while also reducing computational complexity in a non-streaming operation mode); (2) being unsupervised while also providing the functionality for refining the solution using negative training examples, if available; and (3) the deployment of the kernel trick, allowing for nonlinearly mapping the data into a high-dimensional feature space. Extensive experiments conducted on several datasets verify the merits of the proposed approach in comparison with some other alternatives.


Towards Non-Parametric Learning to Rank

arXiv.org Machine Learning

This paper studies a stylized, yet natural, learning-to-rank problem and points out the critical incorrectness of a widely used nearest neighbor algorithm. We consider a model with $n$ agents (users) $\{x_i\}_{i \in [n]}$ and $m$ alternatives (items) $\{y_j\}_{j \in [m]}$, each of which is associated with a latent feature vector. Agents rank items nondeterministically according to the Plackett-Luce model, where the higher the utility of an item to the agent, the more likely this item will be ranked high by the agent. Our goal is to find neighbors of an arbitrary agent or alternative in the latent space. We first show that the Kendall-tau distance based kNN produces incorrect results in our model. Next, we fix the problem by introducing a new algorithm with features constructed from "global information" of the data matrix. Our approach is in sharp contrast to most existing feature engineering methods. Finally, we design another new algorithm identifying similar alternatives. The construction of alternative features can be done using "local information," highlighting the algorithmic difference between finding similar agents and similar alternatives.
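
As a rough illustration of the setting, the sketch below samples rankings from a Plackett-Luce model and computes the Kendall-tau distance that the nearest neighbor baseline relies on. It does not implement the paper's new algorithms. The function names are hypothetical, and the sketch assumes each item comes with a positive weight (for example an exponentiated utility).

```python
import random
from itertools import combinations


def sample_plackett_luce(weights, rng):
    """Draw one ranking (best item first) from a Plackett-Luce model.

    At each step an item is chosen from those not yet ranked, with
    probability proportional to its positive weight.
    """
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        w = [weights[i] for i in remaining]
        pick = rng.choices(remaining, weights=w, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


def kendall_tau_distance(rank_a, rank_b):
    """Count the item pairs that the two rankings order differently."""
    pos_a = {item: p for p, item in enumerate(rank_a)}
    pos_b = {item: p for p, item in enumerate(rank_b)}
    return sum(
        1
        for i, j in combinations(rank_a, 2)
        if (pos_a[i] - pos_a[j]) * (pos_b[i] - pos_b[j]) < 0
    )
```

For example, `kendall_tau_distance([0, 1, 2], [2, 1, 0])` returns 3, since all three pairs are reversed. A Kendall-tau kNN would compare agents by this distance between their sampled rankings, which is the approach the abstract reports as producing incorrect results in this model.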


Case Set for Review After Man Dies 10 Months After Shooting

U.S. News

The Vanderburgh County Coroner's office says Austin Smith died Friday. He was shot on Aug. 31, 2017. Twenty-two-year-old Travis Phelps is accused of firing several shots into Smith's car, causing him to crash.


Manifold Structured Prediction

arXiv.org Machine Learning

Regression and classification are probably the most classical machine learning problems and correspond to estimating a function with scalar and binary values, respectively. In practice, it is often interesting to estimate functions with more structured outputs. When the output space can be assumed to be a vector space, many ideas from regression can be extended; think, for example, of multivariate [14] or functional regression [23]. However, a lack of a natural vector structure is a feature of many practically interesting problems, such as ranking [11], quantile estimation [19] or graph prediction [28]. In this latter case, the outputs are typically provided only with some distance or similarity function that can be used to design an appropriate loss function. Knowledge of the loss is sufficient to analyze an abstract empirical risk minimization approach within the framework of statistical learning theory, but deriving approaches that are at the same time statistically sound and computationally feasible is a key challenge. While ad-hoc solutions are available for many specific problems [7, 9, 18, 27], structured prediction [5] provides a unifying framework where a variety of problems can be tackled as special cases.


Comparison-Based Random Forests

arXiv.org Machine Learning

Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data.
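
The idea of growing trees from triplet answers alone can be sketched as follows, for classification only. This is an illustrative simplification, not the paper's exact procedure: the pivot selection rule, the stopping rule, and all names (`build_tree`, `build_forest`, `predict`, `closer_to_first`) are assumptions. The oracle `closer_to_first(x, a, b)` is assumed to return `True` when `x` is closer to `a` than to `b`.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def build_tree(items, labels, closer_to_first, rng):
    """Grow one tree; the data is touched only through the oracle."""
    if len(items) <= 1 or len(set(labels)) == 1:
        return {"leaf": majority(labels)}
    # Pick two distinct pivot items at random (assumed rule).
    i, j = rng.sample(range(len(items)), 2)
    a, b = items[i], items[j]
    left, right = [], []
    for x, y in zip(items, labels):
        (left if closer_to_first(x, a, b) else right).append((x, y))
    if not left or not right:
        # The pivots could not separate the items (e.g. duplicates).
        return {"leaf": majority(labels)}
    return {
        "a": a,
        "b": b,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           closer_to_first, rng),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            closer_to_first, rng),
    }


def build_forest(items, labels, closer_to_first, n_trees=15, seed=0):
    """Grow several trees with independent random pivot choices."""
    rng = random.Random(seed)
    return [build_tree(items, labels, closer_to_first, rng)
            for _ in range(n_trees)]


def predict(forest, x, closer_to_first):
    """Route x down every tree with triplet queries and take a vote."""
    votes = []
    for node in forest:
        while "leaf" not in node:
            go_left = closer_to_first(x, node["a"], node["b"])
            node = node["left"] if go_left else node["right"]
        votes.append(node["leaf"])
    return majority(votes)
```

Neither training nor prediction ever reads coordinates or distances directly. In a test, the oracle can be simulated from a known metric, for instance `lambda x, a, b: abs(x - a) < abs(x - b)` on real numbers.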


Report on FBI Actions in Clinton Email Case Set for Release

U.S. News

FILE - In this April 6, 2017, file photo, former Secretary of State Hillary Clinton speaks in New York. The Justice Department's internal watchdog is expected to criticize the FBI's handling of the Clinton email investigation, stepping into a political minefield as it details how a determinedly non-partisan law enforcement agency came to be entangled in the 2016 presidential race. President Donald Trump will look to the inspector general report to provide a fresh line of attack against the FBI's two former top officials, Director James Comey and his deputy, Andrew McCabe, as he claims that a politically tainted bureau tried to undermine his campaign and, through the Russia investigation, his presidency.


Benchmarks for Image Classification and Other High-dimensional Pattern Recognition Problems

arXiv.org Machine Learning

A good classification method should yield more accurate results than simple heuristics. But there are classification problems, especially high-dimensional ones like the ones based on image/video data, for which simple heuristics can work quite accurately; the structure of the data in such problems is easy to uncover without any sophisticated or computationally expensive method. On the other hand, some problems have a structure that can only be found with sophisticated pattern recognition methods. We are interested in quantifying the difficulty of a given high-dimensional pattern recognition problem. We consider the case where the patterns come from two pre-determined classes and where the objects are represented by points in a high-dimensional vector space. However, the framework we propose is extendable to an arbitrarily large number of classes. We propose classification benchmarks based on simple random projection heuristics. Our benchmarks are 2D curves parameterized by the classification error and computational cost of these simple heuristics. Each curve divides the plane into a "positive-gain" and a "negative-gain" region. The latter contains methods that are ill-suited for the given classification problem. The former is divided into two by the curve asymptote; methods that lie in the small region under the curve but right of the asymptote merely provide a computational gain but no structural advantage over the random heuristics. We prove that the curve asymptotes are optimal (i.e., at Bayes error) in some cases, and thus no sophisticated method can provide a structural advantage over the random heuristics. Such classification problems, an example of which we present in our numerical experiments, provide poor ground for testing new pattern classification methods.


Sparse Stochastic Zeroth-Order Optimization with an Application to Bandit Structured Prediction

arXiv.org Machine Learning

Stochastic zeroth-order (SZO), or gradient-free, optimization makes it possible to optimize arbitrary functions by relying only on function evaluations under parameter perturbations. However, the iteration complexity of SZO methods suffers a factor proportional to the dimensionality of the perturbed function. We show that in scenarios with natural sparsity patterns, as in structured prediction applications, this factor can be reduced to the expected number of active features over input-output pairs. We give a general proof that applies sparse SZO optimization to Lipschitz-continuous, nonconvex, stochastic objectives, and present an experimental evaluation on linear bandit structured prediction tasks with sparse word-based feature representations that confirms our theoretical results.
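
A minimal sketch of a sparse zeroth-order update is given below. It uses a generic two-point finite-difference estimator with Gaussian perturbations restricted to the active coordinates, which may differ from the estimator analysed in the paper. The function name `sparse_szo_step` and the parameters `mu` (smoothing radius) and `lr` (step size) are assumptions made for illustration.

```python
import random


def sparse_szo_step(w, loss, active, rng, mu=1e-3, lr=0.02):
    """One gradient-free update that perturbs only `active` coordinates.

    w:      current parameter vector (list of floats)
    loss:   function mapping a parameter list to a float
    active: indices of the features active for the current example
    """
    # Random direction supported on the active coordinates only.
    u = {i: rng.gauss(0.0, 1.0) for i in active}
    w_plus = list(w)
    for i, ui in u.items():
        w_plus[i] += mu * ui
    # Two function evaluations estimate the directional derivative.
    g = (loss(w_plus) - loss(w)) / mu
    new_w = list(w)
    for i, ui in u.items():
        new_w[i] -= lr * g * ui
    return new_w
```

Because the perturbation lives on the active coordinates only, inactive weights are never moved, and the variance of the estimate scales with the number of active features rather than with the full dimensionality.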