
Mish: A Self Regularized Non-Monotonic Neural Activation Function

arXiv.org Machine Learning

The concept of non-linearity in a neural network is introduced by an activation function, which serves an integral role in the training and performance evaluation of the network. Over the years of theoretical research, many activation functions have been proposed; however, only a few are widely used across most applications, including ReLU (Rectified Linear Unit), TanH (hyperbolic tangent), Sigmoid, Leaky ReLU, and Swish. In this work, a novel neural activation function called Mish is proposed. The experiments show that Mish tends to work better than both ReLU and Swish, along with other standard activation functions, in many deep networks across challenging datasets. For instance, in Squeeze Excite Net-18 for CIFAR-100 classification, the network with Mish had an increase in Top-1 test accuracy of 0.494% and 1.671% compared to the same network with Swish and ReLU, respectively. Its similarity to Swish, its performance gains, and its simple implementation make it easy for researchers and developers to use Mish in their neural network models.
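For reference, Mish is defined as $f(x) = x \cdot \tanh(\mathrm{softplus}(x)) = x \cdot \tanh(\ln(1 + e^{x}))$. A minimal PyTorch sketch of the function itself (not the authors' experimental code):

```python
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    # Mish: x * tanh(softplus(x)) = x * tanh(ln(1 + e^x))
    return x * torch.tanh(F.softplus(x))
```

The smooth, non-monotonic shape (slightly negative for small negative inputs, unbounded above) is what distinguishes it from ReLU while staying close to Swish, $x \cdot \sigma(x)$.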


QuicK-means: Acceleration of K-means by learning a fast transform

arXiv.org Machine Learning

K-means -- and the celebrated Lloyd algorithm -- is more than the clustering method it was originally designed to be. It has indeed proven pivotal to help increase the speed of many machine learning and data analysis techniques such as indexing, nearest-neighbor search, prediction, and data compression; its beneficial use has been shown to carry over to the acceleration of kernel machines (when using the Nystr\"om method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the $K$ centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Using such a decomposition squashes the complexity of the matrix-vector product between the factorized $K \times D$ centroid matrix $\mathbf{U}$ and any vector from $\mathcal{O}(K D)$ to $\mathcal{O}(A \log A+B)$, with $A=\min (K, D)$ and $B=\max (K, D)$, where $D$ is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time but also at training time, provided the factorization procedure is performed during Lloyd's algorithm. We show precisely that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally-efficient QuicK-means algorithm.
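To make the saving concrete, here is an illustrative sketch of a matrix-vector product with a factorized centroid matrix; the sparse factors below are random placeholders, whereas QuicK-means learns them inside Lloyd's algorithm:

```python
import numpy as np
import scipy.sparse as sp

K, D = 256, 512
x = np.random.randn(D)

# Dense centroids: one matvec costs O(K * D)
U_dense = np.random.randn(K, D)
scores_dense = U_dense @ x

# Factorized centroids U ~= S1 @ S2 @ S3 with sparse factors
# (random here purely for illustration).
factors = [sp.random(K, D, density=0.02, format="csr"),
           sp.random(D, D, density=0.02, format="csr"),
           sp.random(D, D, density=0.02, format="csr")]

y = x
for S in reversed(factors):
    y = S @ y  # each step costs O(nnz(S)), not O(K * D)
scores_fast = y
```

Since assigning a point to a cluster needs $\mathbf{U} x$ (plus precomputed centroid norms, via $\arg\min_k \lVert x - u_k \rVert^2 = \arg\min_k \lVert u_k \rVert^2 - 2 u_k^{\top} x$), the cheaper matvec speeds up both prediction and, when the factorization is maintained during Lloyd iterations, training.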


Quantum algorithms for Second-Order Cone Programming and Support Vector Machines

arXiv.org Machine Learning

Convex optimization is one of the central areas of study in computer science and mathematical optimization. The reason for the great importance of convex optimization is twofold. Firstly, starting with the seminal works of Khachiyan [25] and Karmarkar [18], efficient algorithms have been developed for a large family of convex optimization problems over the last few decades. Secondly, convex optimization has many real-world applications, and many optimization problems that arise in practice can be reduced to convex optimization [8]. There are three main classes of structured convex optimization problems: linear programs (LP), semidefinite programs (SDP), and second-order conic programs (SOCP). The fastest (classical) algorithms for these problems belong to the family of interior-point methods (IPM). Interior-point methods are iterative algorithms where the main computation in each step is the solution of a system of linear equations whose size depends on the dimension of the optimization problem. The size of structured optimization problems that can be solved in practice is therefore limited by the efficiency of linear system solvers: on a single computer, most open-source and commercial solvers can handle dense problems with up to tens of thousands of constraints and variables, or sparse problems with the same number of nonzero entries [30, 31]. In recent years, there has been tremendous interest in quantum linear algebra algorithms following the breakthrough algorithm of Harrow, Hassidim and Lloyd [16].
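As a purely illustrative sketch of that per-iteration bottleneck (sizes and right-hand side below are placeholders), an IPM step for an LP-like problem typically reduces to a dense $m \times m$ solve of normal-equations form $(A D A^{\top}) \Delta y = r$:

```python
import numpy as np

m, n = 500, 1000                 # constraints x variables (illustrative)
A = np.random.randn(m, n)
d = np.random.rand(n)            # positive, iterate-dependent scaling
r = np.random.randn(m)           # residual right-hand side (placeholder)

# The dominant cost per IPM iteration: forming and solving the
# m x m system (A diag(d) A^T) dy = r, roughly O(m^2 n + m^3) dense.
M = (A * d) @ A.T
dy = np.linalg.solve(M, r)
```

This cubic-in-dimension cost is what limits classical solvers to the problem sizes quoted above, and it is exactly the step that quantum linear algebra algorithms aim to accelerate.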


Stress-Plus-X (SPX) Graph Layout

arXiv.org Machine Learning

Stress, edge crossings, and crossing angles play an important role in the quality and readability of graph drawings. Most standard graph drawing algorithms optimize one of these criteria, which may lead to layouts that are deficient in the other criteria. We introduce an optimization framework, Stress-Plus-X (SPX), that simultaneously optimizes stress together with several other criteria: edge crossings, minimum crossing angle, and upwardness (for directed acyclic graphs). SPX achieves results that are close to those of state-of-the-art algorithms that optimize these metrics individually. SPX is flexible and extensible and can optimize a subset or all of these criteria simultaneously. Our experimental analysis shows that our joint optimization approach is successful in drawing graphs with good performance across readability criteria.
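For concreteness, the stress criterion is the classical graph-drawing objective $\sum_{i<j} w_{ij} (\lVert X_i - X_j \rVert - d_{ij})^2$; a small sketch with the common weighting $w_{ij} = d_{ij}^{-2}$ (an illustrative default, not necessarily SPX's exact setting):

```python
import numpy as np

def stress(X, d):
    """Stress of layout X (n x 2) against graph distances d (n x n):
    sum over i<j of w_ij * (||X_i - X_j|| - d_ij)^2 with w_ij = d_ij^-2."""
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, d ** -2.0, 0.0)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    iu = np.triu_indices(len(X), k=1)
    return np.sum(w[iu] * (dist[iu] - d[iu]) ** 2)
```

SPX's joint approach then optimizes such a stress term together with penalty-style terms for the other readability criteria.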


$\alpha$ Belief Propagation as Fully Factorized Approximation

arXiv.org Machine Learning

Belief propagation (BP) performs exact inference in loop-free graphs, but its performance can be poor in graphs with loops, and the understanding of its solutions is limited. This work gives an interpretable belief propagation rule that is actually the minimization of a localized $\alpha$-divergence. We term this algorithm $\alpha$ belief propagation ($\alpha$-BP). The performance of $\alpha$-BP is tested on MAP (maximum a posteriori) inference problems, where $\alpha$-BP can outperform (loopy) BP by a significant margin even in fully-connected graphs.
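For context, one common parameterisation of the $\alpha$-divergence (the exact localized form used by $\alpha$-BP is defined in the paper) is

$$ D_{\alpha}(p \,\|\, q) = \frac{1}{\alpha(1-\alpha)} \left( 1 - \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx \right), $$

which recovers $\mathrm{KL}(p \,\|\, q)$ as $\alpha \to 1$ and $\mathrm{KL}(q \,\|\, p)$ as $\alpha \to 0$, so the choice of $\alpha$ interpolates between the two directions of the KL divergence.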


Generating High-Resolution Fashion Model Images Wearing Custom Outfits

arXiv.org Machine Learning

Visualizing an outfit is an essential part of shopping for clothes. Because combining fashion articles is combinatorial, available images cover only a pre-determined set of outfits. In this paper, we broaden these visualizations by generating high-resolution images of fashion models wearing a custom outfit under an input body pose. We show that our approach can not only transfer the style and pose of one generated outfit to another, but also create realistic images of human bodies and garments.


Fairness in Deep Learning: A Computational Perspective

arXiv.org Machine Learning

Fairness in machine learning nevertheless remains a problem. Machine learning algorithms risk amplifying societal stereotypes by over-associating protected attributes, e.g., race and gender, with the main prediction task [33]. As a result, they can exhibit discriminatory behaviors against certain subgroups. For example, a recruiting tool believes that men are more qualified and shows bias against women [26], facial recognition performs extremely poorly for darker-skinned females [5], and recognition accuracy is very low for some subgroups of people in pedestrian detection for self-driving cars [33]. The fairness problem can have adverse impacts on individuals and society: it not only limits a person's opportunities for which he or she is qualified, but may also further exacerbate social inequity. Among different machine learning models, the fairness problem of deep learning models has especially attracted attention from academia and industry recently. First, deep learning models have outperformed conventional machine learning models in many tasks.


Spooky effect in optimal OSPA estimation and how GOSPA solves it

arXiv.org Machine Learning

In this paper, we show the spooky effect at a distance that arises in optimal estimation of multiple targets with the optimal sub-pattern assignment (OSPA) metric. This effect refers to the fact that if we have several independent potential targets at distant locations, a change in the probability of existence of one of them can completely change the optimal estimation of the rest of the potential targets. As opposed to OSPA, the generalised OSPA (GOSPA) metric (with α = 2) penalises localisation errors for properly detected targets, false targets, and missed targets. As a consequence, optimal GOSPA estimation aims to lower the number of false and missed targets, as well as the localisation error for properly detected targets, and avoids the spooky effect.

Multiple target estimation is an inherent part of many applications such as surveillance, self-driving vehicles, and air-traffic control [1]-[3]. The special characteristic of multiple target estimation is that it requires the estimation of the number of targets, which is unknown, as well as their states. In a Bayesian paradigm, given some noisy observations of a random variable of interest, all information about this variable is contained in its posterior probability density function [4]. Given the posterior and a cost function, optimal estimation is performed by minimising the expected value of this cost function with respect to the posterior [5], [6]. For example, for random vectors of fixed dimensionality, if the cost function is the square error, the optimal estimator, which is referred to as the minimum mean square error estimator, is the posterior mean.
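For that fixed-dimensionality example, the square-error cost yields the familiar closed form

$$ \hat{x}_{\mathrm{MMSE}} = \arg\min_{\hat{x}} \ \mathbb{E}\left[ \lVert x - \hat{x} \rVert^{2} \mid y \right] = \mathbb{E}\left[ x \mid y \right], $$

i.e., the posterior mean; the paper's point is that the analogous optimal estimator under the OSPA cost behaves pathologically across distant targets, while the GOSPA cost does not.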


Bayesian Receiver Operating Characteristic Metric for Linear Classifiers

arXiv.org Machine Learning

We propose a novel classifier accuracy metric: the Bayesian Area Under the Receiver Operating Characteristic Curve (CBAUC). The method estimates the area under the ROC curve and is related to the recently proposed Bayesian Error Estimator. The metric can assess the quality of a classifier using only the training dataset without the need for computationally expensive cross-validation. We derive a closed-form solution of the proposed accuracy metric for any linear binary classifier under the Gaussianity assumption, and study the accuracy of the proposed estimator using simulated and real-world data. These experiments confirm that the closed-form CBAUC is both faster and more accurate than conventional AUC estimators.
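As background for why linearity plus Gaussianity admits closed forms, here is a sketch of the classical plug-in Gaussian AUC of a linear score $s(x) = w^{\top} x$; this illustrates the setting only and is not the proposed CBAUC estimator:

```python
import numpy as np
from scipy.stats import norm

def gaussian_auc(w, mu0, Sigma0, mu1, Sigma1):
    # Under Gaussian classes, s = w^T x is Gaussian per class, so
    # AUC = P(s1 > s0) = Phi( w^T (mu1 - mu0) / sqrt(w^T (S0 + S1) w) ).
    delta = w @ (mu1 - mu0)
    scale = np.sqrt(w @ Sigma0 @ w + w @ Sigma1 @ w)
    return norm.cdf(delta / scale)
```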


Lukthung Classification Using Neural Networks on Lyrics and Audios

arXiv.org Machine Learning

Music genre classification is a widely researched topic in music information retrieval (MIR). Being able to automatically tag genres will benefit music streaming service providers such as JOOX, Apple Music, and Spotify for their content-based recommendation. However, most studies on music classification have been done on western songs, which differ from Thai songs. Lukthung, a distinctive and long-established type of Thai music, is one of the most popular music genres in Thailand and has a specific group of listeners. In this paper, we develop neural networks to distinguish the Lukthung genre from others using both lyrics and audio. Words used in Lukthung songs are particularly poetical, and their musical styles are uniquely composed of traditional Thai instruments. We leverage these two main characteristics by building a lyrics model based on bag-of-words (BoW) and an audio model using a convolutional neural network (CNN) architecture. We then aggregate the intermediate features learned from both models to build a final classifier.
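A minimal late-fusion sketch of the described two-branch architecture; the layer sizes and exact branch designs below are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

class LukthungClassifier(nn.Module):
    def __init__(self, vocab_size=5000):
        super().__init__()
        # Lyrics branch: bag-of-words vector -> dense features
        self.lyrics = nn.Sequential(nn.Linear(vocab_size, 128), nn.ReLU())
        # Audio branch: spectrogram (B, 1, mels, time) -> CNN features
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Final classifier on the concatenated intermediate features
        self.head = nn.Linear(128 + 16, 1)

    def forward(self, bow, spec):
        z = torch.cat([self.lyrics(bow), self.audio(spec)], dim=1)
        return torch.sigmoid(self.head(z))  # P(Lukthung)
```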