Goto

Collaborating Authors

 Country


Features or Shape? Tackling the False Dichotomy of Time Series Classification

arXiv.org Machine Learning

Time series classification is an important task in its own right, and it is often a precursor to further downstream analytics. To date, virtually all works in the literature have used either shape - based classification using a distance measure or feature - based classification after finding some suitable features for the do main . I t seems to be underappreciated that in many datasets it is the case that some classes are best discriminated with fea tures, while others are best discriminated with shape. Thus, making the shape vs. feature choice will condemn us to poor results, at least for some classes. In this work, we propose a new model for classifying time series that allows the use of both shape and feature - based measures, when warranted . Our algorithm automatically decides which approach is best for which class, and at query time chooses which classifier to trust the most. We evaluate our idea on real world datasets and demonstrate that our ideas produce statistically significant improvement in classification accuracy .


Group-Connected Multilayer Perceptron Networks

arXiv.org Machine Learning

Despite the success of deep learning in domains such as image, voice, and graphs, there has been little progress in deep representation learning for domains without a known structure between features. For instance, a tabular dataset of different demographic and clinical factors where the feature interactions are not given as a prior. In this paper, we propose Group-Connected Multilayer Perceptron (GMLP) networks to enable deep representation learning in these domains. GMLP is based on the idea of learning expressive feature combinations (groups) and exploiting them to reduce the network complexity by defining local group-wise operations. During the training phase, GMLP learns a sparse feature grouping matrix using temperature annealing softmax with an added entropy loss term to encourage the sparsity. Furthermore, an architecture is suggested which resembles binary trees, where group-wise operations are followed by pooling operations to combine information; reducing the number of groups as the network grows in depth. To evaluate the proposed method, we conducted experiments on five different real-world datasets covering various application areas. Additionally, we provide visualizations on MNIST and synthesized data. According to the results, GMLP is able to successfully learn and exploit expressive feature combinations and achieve state-of-the-art classification performance on different datasets.


Exploiting the potential of deep reinforcement learning for classification tasks in high-dimensional and unstructured data

arXiv.org Machine Learning

This paper presents a framework for efficiently learning feature selection policies which use less features to reach a high classification precision on large unstructured data. It uses a Deep Convolutional Autoencoder (DCAE) for learning compact feature spaces, in combination with recently-proposed Reinforcement Learning (RL) algorithms as Double DQN and Retrace.


Gaussian Process Latent Variable Model Factorization for Context-aware Recommender Systems

arXiv.org Machine Learning

Context-aware recommender systems (CARS) have gained increasing attention due to their ability to utilize contextual information. Compared to traditional recommender systems, CARS are, in general, able to generate more accurate recommendations. Latent factors approach accounts for a large proportion of CARS. Recently, a nonlinear Gaussian Process (GP) based factorization method was proven to outperform the state-of-the-art methods in CARS. Despite its effectiveness, GP model-based methods can suffer from over-fitting and may not be able to determine the impact of each context automatically. In order to address such shortcomings, we propose a Gaussian Process Latent V ariable Model Factorization (GPL VMF) method, where we apply an appropriate prior to the original GP model. Our work is primarily inspired by the Gaussian Process Latent V ariable Model (GPL VM), which was a nonlinear dimensionality reduction method. As a result, we improve the performance on the real datasets significantly as well as capturing the importance of each context. In addition to the general advantages, our method provides two main contributions regarding recommender system settings: (1) addressing the influence of bias by setting a nonzero mean function, and (2) utilizing real-valued contexts by fixing the latent space with real values.


Graph Convolutional Networks: analysis, improvements and results

arXiv.org Machine Learning

In the current era of neural networks and big data, higher dimensional data is processed for automation of different application areas. Graphs represent a complex data organization in which dependencies between more than one object or activity occur. Due to the high dimensionality, this data creates challenges for machine learning algorithms. Graph convolutional networks were introduced to utilize the convolutional models concepts that shows good results. In this context, we enhanced two of the existing Graph convolutional network models by proposing four enhancements. These changes includes: hyper parameters optimization, convex combination of activation functions, topological information enrichment through clustering coefficients measure, and structural redesign of the network through addition of dense layers. We present extensive results on four state-of-art benchmark datasets. The performance is notable not only in terms of lesser computational cost compared to competitors, but also achieved competitive results for three of the datasets and state-of-the-art for the fourth dataset.


Data Science through the looking glass and what we found there

arXiv.org Machine Learning

The recent success of machine learning (ML) has led to an explosive growth both in terms of new systems and algorithms built in industry and academia, and new applications built by an ever-growing community of data science (DS) practitioners. This quickly shifting panorama of technologies and applications is challenging for builders and practitioners alike to follow. In this paper, we set out to capture this panorama through a wide-angle lens, by performing the largest analysis of DS projects to date, focusing on questions that can help determine investments on either side. Specifically, we download and analyze: (a) over 6M Python notebooks publicly available on GITHUB, (b) over 2M enterprise DS pipelines developed within COMPANYX, and (c) the source code and metadata of over 900 releases from 12 important DS libraries. The analysis we perform ranges from coarse-grained statistical characterizations to analysis of library imports, pipelines, and comparative studies across datasets and time. We report a large number of measurements for our readers to interpret, and dare to draw a few (actionable, yet subjective) conclusions on (a) what systems builders should focus on to better serve practitioners, and (b) what technologies should practitioners bet on given current trends. We plan to automate this analysis and release associated tools and results periodically.


Towards Verifying Robustness of Neural Networks Against Semantic Perturbations

arXiv.org Machine Learning

Verifying robustness of neural networks given a specified threat model is a fundamental yet challenging task. While current verification methods mainly focus on the L_p-norm-ball threat model of the input instances, robustness verification against semantic adversarial attacks inducing large L_p-norm perturbations such as color shifting and lighting adjustment are beyond their capacity. To bridge this gap, we propose Semantify-NN, a model-agnostic and generic robustness verification approach against semantic perturbations for neural networks. By simply inserting our proposed semantic perturbation layers (SP-layers) to the input layer of any given model, Semantify-NN is model-agnostic, and any $L_p$-norm-ball based verification tools can be used to verify the model robustness against semantic perturbations. We illustrate the principles of designing the SP-layers and provide examples including semantic perturbations to image classification in the space of hue, saturation, lightness, brightness, contrast and rotation, respectively. Experimental results on various network architectures and different datasets demonstrate the superior verification performance of Semantify-NN over L_p-norm-based verification frameworks that naively convert semantic perturbation to L_p-norm. To the best of our knowledge, Semantify-NN is the first framework to support robustness verification against a wide range of semantic perturbations.


Statistical Testing on ASR Performance via Blockwise Bootstrap

arXiv.org Machine Learning

A common question being raised in automatic speech recognition (ASR) evaluations is how reliable is an observed word error rate (WER) improvement comparing two ASR systems, where statistical hypothesis testing and confidence intervals can be utilized to tell whether this improvement is real or only due to random chance. The bootstrap resampling method has been popular for such significance analysis which is intuitive and easy to use. However, this method fails in dealing with dependent data, which is prevalent in speech world - for example, ASR performance on utterances from the same speaker could be correlated. In this paper we present blockwise bootstrap approach - by dividing evaluation utterances into nonoverlapping blocks, this method resamples these blocks instead of original data. We show that the resulting variance estimator of absolute WER difference of two ASR systems is consistent under mild conditions. We also demonstrate the validity of blockwise bootstrap method on both synthetic and real-world speech data.


Zeroth-order Stochastic Compositional Algorithms for Risk-Aware Learning

arXiv.org Machine Learning

We present Free-MESSAGEp, the first zeroth-order algorithm for convex mean-semideviation-based risk-aware learning, which is also the first three-level zeroth-order compositional stochastic optimization algorithm, whatsoever. Using a non-trivial extension of Nesterov's classical results on Gaussian smoothing, we develop the Free-MESSAGEp algorithm from first principles, and show that it essentially solves a smoothed surrogate to the original problem, the former being a uniform approximation of the latter, in a useful, convenient sense. We then present a complete analysis of the Free-MESSAGEp algorithm, which establishes convergence in a user-tunable neighborhood of the optimal solutions of the original problem, as well as explicit convergence rates for both convex and strongly convex costs. Orderwise, and for fixed problem parameters, our results demonstrate no sacrifice in convergence speed compared to existing first-order methods, while striking a certain balance among the condition of the problem, its dimensionality, as well as the accuracy of the obtained results, naturally extending previous results in zeroth-order risk-neutral learning.


Pseudo-Encoded Stochastic Variational Inference

arXiv.org Machine Learning

Posterior inference in directed graphical models is commonly done using a probabilistic encoder (a.k.a inference model) conditioned on the input. Often this inference model is trained jointly with the probabilistic decoder (a.k.a generator model). If probabilistic encoder encounters complexities during training (e.g. suboptimal complxity or parameterization), then learning reaches a suboptimal objective; a phenomena commonly called inference suboptimality. In Variational Inference (VI), optimizing the ELBo using Stochastic Variational Inference (SVI) can eliminate the inference suboptimality (as demonstrated in this paper), however, this solution comes at a substantial computational cost when inference needs to be done on new data points. Essentially, a long sequential chain of gradient updates is required to fully optimize approximate posteriors. In this paper, we present an approach called Pseudo-Encoded Stochastic Variational Inference (PE-SVI), to reduce the inference complexity of SVI during test time. Our approach relies on finding a suitable initial start point for gradient operations, which naturally reduces the required gradient steps. Furthermore, this initialization allows for adopting larger step sizes (compared to random initialization used in SVI), which further reduces the inference time complexity. PE-SVI reaches the same ELBo objective as SVI using less than one percent of required steps, on average.