Domain Generalization for Object Recognition with Multi-task Autoencoders

arXiv.org Artificial Intelligence

The problem of domain generalization is to take knowledge acquired from a number of related domains where training data is available, and to then successfully apply it to previously unseen domains. We propose a new feature learning algorithm, Multi-Task Autoencoder (MTAE), that provides good generalization performance for cross-domain object recognition. Our algorithm extends the standard denoising autoencoder framework by substituting artificially induced corruption with naturally occurring inter-domain variability in the appearance of objects. Instead of reconstructing images from noisy versions, MTAE learns to transform the original image into analogs in multiple related domains. It thereby learns features that are robust to variations across domains. The learnt features are then used as inputs to a classifier. We evaluated the performance of the algorithm on benchmark image recognition datasets, where the task is to learn features from multiple datasets and to then predict the image label from unseen datasets. We found that (denoising) MTAE outperforms alternative autoencoder-based models as well as the current state-of-the-art algorithms for domain generalization.
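
To make the construction concrete, here is a minimal numpy sketch of the MTAE idea: a shared encoder paired with one decoder per domain, trained so that a single image reconstructs its analogs in every domain. The layer sizes, sigmoid activation, and plain SGD step are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal MTAE-style sketch, assuming paired images across domains
# (x_src and its analog in each domain d). Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_domains = 256, 64, 3

W_enc = rng.normal(0, 0.01, (n_hid, n_in))             # shared encoder
W_dec = rng.normal(0, 0.01, (n_domains, n_in, n_hid))  # one decoder per domain

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mtae_step(x_src, x_targets, lr=0.1):
    """One SGD step: encode x_src once, decode into every domain's analog."""
    global W_enc
    h = sigmoid(W_enc @ x_src)                         # shared representation
    for d in range(n_domains):
        err = W_dec[d] @ h - x_targets[d]              # squared-error residual
        g_h = (W_dec[d].T @ err) * h * (1 - h)         # backprop through encoder
        W_dec[d] -= lr * np.outer(err, h)
        W_enc -= lr * np.outer(g_h, x_src)

# toy usage: random vectors standing in for cross-domain image analogs
x = rng.random(n_in)
analogs = [rng.random(n_in) for _ in range(n_domains)]
mtae_step(x, analogs)
```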


Model Guided Sampling Optimization for Low-dimensional Problems

arXiv.org Machine Learning

Optimizing very expensive black-box functions requires making full use of the information gathered during the optimization process. Model Guided Sampling Optimization (MGSO) forms a more robust alternative to Jones' Gaussian-process-based EGO algorithm. Whereas EGO maximizes the expected improvement, MGSO samples the probability of improvement, which is shown to help avoid getting trapped in local minima. Further, MGSO can reach close-to-optimum solutions faster than standard optimization algorithms on low-dimensional or smooth problems.
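
A minimal sketch of the MGSO idea, assuming a scikit-learn GP surrogate: fit the GP, then sample the next evaluation point with probability proportional to its probability of improvement, rather than maximizing expected improvement as EGO does. The candidate-pool rejection scheme and the 1-D test function are illustrative choices.

```python
# Illustrative MGSO-style loop: sample candidates in proportion to their
# probability of improvement (PoI) under a GP surrogate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)
f = lambda x: np.sin(3 * x) + 0.5 * x          # toy stand-in for the expensive black box

X = rng.uniform(-2, 2, (5, 1))                 # initial design
y = f(X).ravel()

for it in range(20):
    gp = GaussianProcessRegressor().fit(X, y)
    cand = rng.uniform(-2, 2, (500, 1))        # candidate pool
    mu, sd = gp.predict(cand, return_std=True)
    poi = norm.cdf((y.min() - mu) / np.maximum(sd, 1e-12))
    # sample one candidate with probability proportional to its PoI
    p = poi / poi.sum() if poi.sum() > 0 else np.full(len(cand), 1 / len(cand))
    x_next = cand[rng.choice(len(cand), p=p)]
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next)[0])

print("best value found:", y.min())
```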


Calibration of One-Class SVM for MV set estimation

arXiv.org Machine Learning

A general approach to anomaly detection or novelty detection consists in estimating high density regions or Minimum Volume (MV) sets. The One-Class Support Vector Machine (OCSVM) is a state-of-the-art algorithm for estimating such regions from high dimensional data, yet it suffers from practical limitations. When applied to a limited number of samples it can lead to poor performance even when picking the best hyperparameters. Moreover, the solution of the OCSVM is very sensitive to the selection of hyperparameters, which makes it hard to optimize in an unsupervised setting. We present a new approach to estimate MV sets using the OCSVM with a different choice of the parameter controlling the proportion of outliers. The solution function of the OCSVM is learnt on a training set and the desired probability mass is obtained by adjusting the offset on a test set to prevent overfitting. Models learnt on different train/test splits are then aggregated to reduce the variance induced by such random splits. Our approach makes it possible to tune the hyperparameters automatically and to obtain nested set estimates. Experimental results show that our approach outperforms the standard OCSVM formulation while suffering less from the curse of dimensionality than kernel density estimates. Results on real data sets are also presented.
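
The offset calibration can be sketched as follows, assuming scikit-learn's OneClassSVM: learn the decision function on a train split, set the offset so that the desired mass of a held-out split scores above it, and aggregate the resulting models over several random splits. The toy data, nu value, and majority-vote aggregation below are illustrative assumptions.

```python
# Illustrative offset calibration for MV set estimation with OneClassSVM.
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 5))                  # toy data standing in for real samples
alpha = 0.95                                    # desired probability mass of the MV set

models, offsets = [], []
for seed in range(10):                          # aggregate over random splits
    X_tr, X_te = train_test_split(X, test_size=0.5, random_state=seed)
    m = OneClassSVM(kernel="rbf", gamma="scale", nu=0.5).fit(X_tr)
    scores = m.decision_function(X_te)          # calibrate offset on held-out data
    offsets.append(np.quantile(scores, 1 - alpha))
    models.append(m)

def in_mv_set(x):
    """Keep a point if, on average over splits, it scores above the offset."""
    votes = [m.decision_function(x.reshape(1, -1))[0] >= t
             for m, t in zip(models, offsets)]
    return np.mean(votes) >= 0.5

print(in_mv_set(np.zeros(5)))
```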


Dictionary Learning for Blind One Bit Compressed Sensing

arXiv.org Machine Learning

This letter proposes a dictionary learning algorithm for blind one bit compressed sensing. In the blind one bit compressed sensing framework, the original signal to be reconstructed from one bit linear random measurements is sparse in an unknown domain. In this context, the product of the measurement matrix $\mathbf{A}$ and the sparse domain matrix $\mathbf{\Phi}$, i.e. $\mathbf{D}=\mathbf{A}\mathbf{\Phi}$, should be learned. Hence, we use dictionary learning to train this matrix. Towards that end, an appropriate continuous convex cost function is suggested for one bit compressed sensing, and a simple steepest-descent method is exploited to learn the rows of the matrix $\mathbf{D}$. Experimental results show the effectiveness of the proposed algorithm against the case of no dictionary learning, especially as the number of training signals and the number of sign measurements increase.
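
A hedged sketch of the dictionary update: one common continuous convex surrogate for the sign constraint $\mathbf{y}=\mathrm{sign}(\mathbf{D}\mathbf{s})$ is a one-sided squared penalty, minimized over the rows of $\mathbf{D}$ by steepest descent. The specific cost, step size, and row normalization below are illustrative assumptions, not the paper's exact formulation, and the alternating sparse-coding step is omitted.

```python
# Illustrative steepest descent on a one-sided squared penalty that is zero
# whenever sign(D @ S) agrees with the one bit measurements Y.
import numpy as np

rng = np.random.default_rng(3)
m, n, T = 32, 64, 100                        # measurements, atoms, training signals

D_true = rng.normal(size=(m, n))             # ground truth, only used to simulate Y
S = rng.normal(size=(n, T)) * (rng.random((n, T)) < 0.1)   # sparse codes (assumed known here)
Y = np.sign(D_true @ S)                      # one bit linear measurements

D = rng.normal(size=(m, n)) / np.sqrt(n)     # initial dictionary estimate
lr = 1e-3
for it in range(200):
    R = np.maximum(0.0, -Y * (D @ S))        # active only where a sign is violated
    G = (-Y * R) @ S.T                       # gradient of 0.5 * sum(R**2) w.r.t. D
    D -= lr * G
    # keep row norms fixed to rule out the trivial all-zero solution
    D /= np.maximum(np.linalg.norm(D, axis=1, keepdims=True), 1e-12)

print("sign agreement after learning:", np.mean(np.sign(D @ S) == Y))
```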


Stabilized Nearest Neighbor Classifier and Its Statistical Properties

arXiv.org Machine Learning

The stability of a statistical analysis is an important indicator of reproducibility, which is a main principle of the scientific method. It entails that similar statistical conclusions can be reached based on independent samples from the same underlying population. In this paper, we introduce a general measure of classification instability (CIS) to quantify the sampling variability of the prediction made by a classification method. Interestingly, the asymptotic CIS of any weighted nearest neighbor classifier turns out to be proportional to the Euclidean norm of its weight vector. Based on this concise form, we propose a stabilized nearest neighbor (SNN) classifier, which distinguishes itself from other nearest neighbor classifiers by taking stability into consideration. In theory, we prove that SNN attains the minimax optimal convergence rate in risk, and a sharp convergence rate in CIS. The latter rate result is established for general plug-in classifiers under a low-noise condition. Extensive simulated and real examples demonstrate that SNN achieves a considerable improvement in CIS over existing nearest neighbor classifiers, with comparable classification accuracy. We implement the algorithm in a publicly available R package, snn.
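
The CIS definition suggests a direct empirical estimate: train the same procedure on two independent samples from the same population and measure how often the two fitted classifiers disagree on fresh points. The data generator and the choice of plain kNN below are illustrative assumptions.

```python
# Illustrative empirical estimate of classification instability (CIS):
# the disagreement rate between classifiers fit on independent samples.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)

def sample(n):
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # noisy labels
    return X, y

def cis_estimate(k, n=200, n_eval=2000):
    X1, y1 = sample(n)                      # two independent training samples
    X2, y2 = sample(n)
    c1 = KNeighborsClassifier(n_neighbors=k).fit(X1, y1)
    c2 = KNeighborsClassifier(n_neighbors=k).fit(X2, y2)
    X_eval = rng.normal(size=(n_eval, 2))   # fresh points from the population
    return np.mean(c1.predict(X_eval) != c2.predict(X_eval))

for k in (1, 5, 25):
    print(f"k={k:2d}  estimated CIS={cis_estimate(k):.3f}")
```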


Parameter estimation in softmax decision-making models with linear objective functions

arXiv.org Machine Learning

With an eye towards human-centered automation, we contribute to the development of a systematic means to infer features of human decision-making from behavioral data. Motivated by the common use of softmax selection in models of human decision-making, we study the maximum likelihood parameter estimation problem for softmax decision-making models with linear objective functions. We present conditions under which the likelihood function is convex. These allow us to provide sufficient conditions for convergence of the resulting maximum likelihood estimator and to construct its asymptotic distribution. In the case of models with nonlinear objective functions, we show how the estimator can be applied by linearizing about a nominal parameter value. We apply the estimator to fit the stochastic UCL (Upper Credible Limit) model of human decision-making to human subject data. We show statistically significant differences in behavior across related, but distinct, tasks.
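
A minimal sketch of the estimation problem: under a softmax choice model with linear objective $u_i = \theta^\top \phi_i$, the negative log-likelihood below is convex in $\theta$, so a generic optimizer recovers the parameters. The feature dimensions and synthetic data generator are illustrative assumptions.

```python
# Illustrative maximum likelihood fit of a softmax choice model with a
# linear objective: P(choose i) = exp(theta . phi_i) / sum_j exp(theta . phi_j).
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(5)
d, n_options, n_trials = 3, 5, 500
theta_true = np.array([1.0, -2.0, 0.5])

Phi = rng.normal(size=(n_trials, n_options, d))        # option features per trial
U = Phi @ theta_true                                   # linear objective values
P = np.exp(U - logsumexp(U, axis=1, keepdims=True))    # softmax choice probabilities
choices = np.array([rng.choice(n_options, p=p) for p in P])

def neg_log_lik(theta):
    u = Phi @ theta                                    # shape (n_trials, n_options)
    return np.sum(logsumexp(u, axis=1) - u[np.arange(n_trials), choices])

theta_hat = minimize(neg_log_lik, np.zeros(d), method="BFGS").x
print("true:", theta_true)
print("mle :", np.round(theta_hat, 2))
```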


Regularized Kernel Recursive Least Squares Algorithm

arXiv.org Machine Learning

In most adaptive signal processing applications, system linearity is assumed and adaptive linear filters are thus used. The traditional class of supervised adaptive filters relies on error-correction learning for its adaptive capability. The kernel method is a powerful nonparametric modeling tool for pattern analysis and statistical signal processing: through a nonlinear mapping, kernel methods transform the data into a set of points in a Reproducing Kernel Hilbert Space. Kernel recursive least squares (KRLS) achieves high accuracy and a fast convergence rate in stationary scenarios. However, this good performance comes at the cost of high computational complexity. Sparsification in kernel methods is known to reduce computational complexity and memory consumption.
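
For reference, here is a sketch of a regularized, non-sparsified KRLS recursion: the inverse of the regularized kernel matrix is grown one sample at a time via the block matrix inversion lemma. The RBF width, ridge term, and toy data are illustrative assumptions.

```python
# Illustrative regularized KRLS: maintain Q = (K + lam*I)^{-1} and the
# coefficient vector alpha = Q @ y, updating both per new sample.
import numpy as np

rng = np.random.default_rng(6)
lam, gamma = 0.1, 1.0                              # ridge term, RBF kernel width

def kernel(a, B):
    return np.exp(-gamma * np.sum((B - a) ** 2, axis=1))

X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=200)

dict_X = X[:1]                                     # initialize with the first sample
Q = np.array([[1.0 / (kernel(X[0], X[:1])[0] + lam)]])
alpha = Q @ y[:1]

for t in range(1, len(X)):
    k = kernel(X[t], dict_X)                       # kernels to stored samples
    z = Q @ k
    r = lam + kernel(X[t], X[t:t + 1])[0] - k @ z  # Schur complement
    e = y[t] - k @ alpha                           # a priori prediction error
    Q = np.block([[Q * r + np.outer(z, z), -z[:, None]],
                  [-z[None, :], np.ones((1, 1))]]) / r
    alpha = np.append(alpha - z * e / r, e / r)
    dict_X = np.vstack([dict_X, X[t]])

print("f(0.5) estimate:", kernel(np.array([0.5]), dict_X) @ alpha,
      " true:", np.sin(0.5))
```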


Fast and Flexible ADMM Algorithms for Trend Filtering

arXiv.org Machine Learning

This paper presents a fast and robust algorithm for trend filtering, a recently developed nonparametric regression tool. It has been shown that, for estimating functions whose derivatives are of bounded variation, trend filtering achieves the minimax optimal error rate, while other popular methods like smoothing splines and kernels do not. Standing in the way of a more widespread practical adoption, however, is a lack of scalable and numerically stable algorithms for fitting trend filtering estimates. This paper presents a highly efficient, specialized ADMM routine for trend filtering. Our algorithm is competitive with the specialized interior point methods that are currently in use, and yet is far more numerically robust. Furthermore, the proposed ADMM implementation is very simple, and importantly, it is flexible enough to extend to many interesting related problems, such as sparse trend filtering and isotonic trend filtering. Software for our method is freely available, in both the C and R languages.
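
The ADMM splitting can be sketched in a few lines: with z = D beta for the second-difference operator D, the iteration alternates a quadratic solve, a soft-thresholding step, and a dual update. The dense solve and fixed rho below are illustrative simplifications of the specialized banded routine the paper develops.

```python
# Illustrative ADMM for linear trend filtering:
#   minimize 0.5 * ||y - beta||^2 + lam * ||D @ beta||_1
import numpy as np

rng = np.random.default_rng(7)
n, lam, rho = 100, 5.0, 1.0
y = np.cumsum(rng.normal(size=n)) + np.linspace(0, 5, n)   # noisy trend signal

D = np.diff(np.eye(n), n=2, axis=0)            # second-difference operator, (n-2) x n
soft = lambda a, t: np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

beta, z, u = y.copy(), D @ y, np.zeros(n - 2)
M_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)   # fixed system, invert once

for it in range(200):
    beta = M_inv @ (y + rho * D.T @ (z - u))       # quadratic (smooth) step
    z = soft(D @ beta + u, lam / rho)              # l1 proximal step on z = D beta
    u += D @ beta - z                              # scaled dual update

print("l1 norm of second differences:", np.linalg.norm(D @ beta, 1))
```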


A Review of Nonnegative Matrix Factorization Methods for Clustering

arXiv.org Machine Learning

Nonnegative Matrix Factorization (NMF) was first introduced as a low-rank matrix approximation technique and has since enjoyed a wide range of applications. Although NMF does not seem related to the clustering problem at first, it has been shown that the two are closely linked. In this report, we provide a gentle introduction to clustering and NMF before reviewing the theoretical relationship between them. We then explore several NMF variants, namely Sparse NMF, Projective NMF, Nonnegative Spectral Clustering and Cluster-NMF, along with their clustering interpretations.
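
The clustering connection can be made concrete with the basic factorization: approximate a nonnegative data matrix X by WH using the Lee-Seung multiplicative updates, then read each sample's cluster label off the dominant entry of its column of H. The matrix sizes and iteration count below are illustrative assumptions.

```python
# Illustrative NMF clustering: factor X ~ W @ H, assign clusters by argmax of H.
import numpy as np

rng = np.random.default_rng(8)
n_features, n_samples, k = 50, 300, 3
X = np.abs(rng.normal(size=(n_features, n_samples)))   # toy nonnegative data

W = np.abs(rng.normal(size=(n_features, k)))
H = np.abs(rng.normal(size=(k, n_samples)))
eps = 1e-12

for it in range(200):                                  # Lee-Seung multiplicative updates
    H *= (W.T @ X) / (W.T @ W @ H + eps)               # nonnegativity is preserved
    W *= (X @ H.T) / (W @ H @ H.T + eps)

labels = np.argmax(H, axis=0)                          # cluster assignments
print("reconstruction error:", np.linalg.norm(X - W @ H))
print("cluster sizes:", np.bincount(labels, minlength=k))
```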


Partitioning Large Scale Deep Belief Networks Using Dropout

arXiv.org Machine Learning

Deep learning methods have shown great promise in many practical applications, ranging from speech recognition and visual object recognition to text processing. However, most current deep learning methods suffer from scalability problems for large-scale applications, forcing researchers or users to focus on small-scale problems with fewer parameters. In this paper, we consider a well-known machine learning model, deep belief networks (DBNs), which have yielded impressive classification performance on a large number of benchmark machine learning tasks. To scale up DBNs, we propose an approach that uses computing clusters in a distributed environment to train large models, while the dense matrix computations within a single machine are sped up using graphics processors (GPUs). When training a DBN, each machine randomly drops out a portion of neurons in each hidden layer for each training case, making the remaining neurons learn to detect features that are generally helpful for producing the correct answer. Within our approach, we have developed four methods to combine outcomes from each machine to form a unified model. Our preliminary experiment on the MNIST handwritten digit database demonstrates that our approach outperforms the state-of-the-art test error rate.
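
A rough sketch of the partitioning idea, under loudly stated assumptions: each machine trains its own copy of the network with machine-local dropout masks, and the copies are combined at the end, here by simple output averaging, which is one plausible choice and not necessarily among the paper's four combination methods. A tiny one-hidden-layer network stands in for a full DBN.

```python
# Illustrative sketch: per-machine dropout during training, averaged outputs
# at inference. The training updates themselves are omitted.
import numpy as np

rng = np.random.default_rng(9)
n_in, n_hid, n_out, n_machines, p_drop = 20, 50, 2, 4, 0.5

# each machine holds its own weights (a stand-in for a partition of a DBN)
machines = [(rng.normal(0, 0.1, (n_hid, n_in)),
             rng.normal(0, 0.1, (n_out, n_hid))) for _ in range(n_machines)]

def forward(W1, W2, x, train=True):
    h = np.maximum(W1 @ x, 0.0)                  # hidden layer activations
    if train:
        keep = rng.random(n_hid) >= p_drop       # machine-local dropout mask
        h = h * keep / (1.0 - p_drop)            # inverted-dropout scaling
    return W2 @ h

x = rng.normal(size=n_in)
# at inference, the machines' outputs are combined into a unified model
outputs = [forward(W1, W2, x, train=False) for W1, W2 in machines]
print("combined output:", np.mean(outputs, axis=0))
```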