An Improved Bound for the Nystrom Method for Large Eigengap
Mahdavi, Mehrdad, Yang, Tianbao, Jin, Rong
We develop an improved bound for the approximation error of the Nystr\"{o}m method under the assumption that there is a large eigengap in the spectrum of the kernel matrix. This is based on the empirical observation that the eigengap has a significant impact on the approximation error of the Nystr\"{o}m method. Our approach is based on a concentration inequality for integral operators and matrix perturbation theory. Our analysis shows that when there is a large eigengap, the approximation error of the Nystr\"{o}m method improves from $O(N/m^{1/4})$ to $O(N/m^{1/2})$ when measured in the Frobenius norm, where $N$ is the size of the kernel matrix and $m$ is the number of sampled columns.
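The result concerns the standard Nystr\"{o}m column-sampling approximation; below is a minimal sketch of that construction and of the Frobenius-norm error appearing in the bound (the RBF kernel and the function names are illustrative choices of ours, not taken from the paper):

    import numpy as np

    def rbf_kernel(X, Y, gamma=1.0):
        # Pairwise RBF kernel matrix between the rows of X and Y.
        d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
        return np.exp(-gamma * d2)

    def nystrom_approximation(X, m, gamma=1.0, rng=None):
        # Sample m columns uniformly, form C (N x m) and W (m x m),
        # and return the rank-m approximation K_hat = C W^+ C^T.
        rng = np.random.default_rng(rng)
        idx = rng.choice(len(X), size=m, replace=False)
        C = rbf_kernel(X, X[idx], gamma)      # N x m block of K
        W = C[idx, :]                          # m x m block of K
        return C @ np.linalg.pinv(W) @ C.T

    X = np.random.randn(500, 10)
    K = rbf_kernel(X, X)
    K_hat = nystrom_approximation(X, m=50)
    print(np.linalg.norm(K - K_hat, 'fro'))    # Frobenius-norm approximation error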
A Widely Applicable Bayesian Information Criterion
A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. Otherwise, it is called singular. In regular statistical models, the Bayes free energy, defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such an approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters as in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic-geometric methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on the unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) as the average log likelihood function over the posterior distribution with the inverse temperature $1/\log n$, where $n$ is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if the statistical model is singular for, or unrealizable by, the true distribution. Since WBIC can be numerically calculated without any information about the true distribution, it is a generalization of BIC to singular statistical models.
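In symbols, the criterion defined above is the posterior average of the log likelihood taken at inverse temperature $\beta = 1/\log n$; a schematic transcription of the definition in the abstract, with $\varphi(w)$ denoting the prior (a notational choice of ours), is
$$\mathrm{WBIC} = \mathbb{E}_w^{\beta}\Big[\sum_{i=1}^{n}\log p(X_i \mid w)\Big], \qquad \mathbb{E}_w^{\beta}[f(w)] = \frac{\int f(w)\,\prod_{i=1}^{n} p(X_i \mid w)^{\beta}\,\varphi(w)\,dw}{\int \prod_{i=1}^{n} p(X_i \mid w)^{\beta}\,\varphi(w)\,dw}, \qquad \beta = \frac{1}{\log n}.$$
Replacing $\beta$ by $1$ gives the ordinary posterior; the claim is that this particular tempering makes the posterior average of the log likelihood share the asymptotic expansion of the Bayes free energy.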
Comparative Study and Optimization of Feature-Extraction Techniques for Content based Image Retrieval
Chadha, Aman, Mallik, Sushmit, Johar, Ravdeep
The aim of a Content-Based Image Retrieval (CBIR) system, also known as Query by Image Content (QBIC), is to help users to retrieve relevant images based on their contents. CBIR technologies provide a method to find images in large databases by using unique descriptors from a trained image. The image descriptors include texture, color, intensity and shape of the object inside an image. Several feature-extraction techniques viz., Average RGB, Color Moments, Co-occurrence, Local Color Histogram, Global Color Histogram and Geometric Moment have been critically compared in this paper. However, individually these techniques result in poor performance. So, combinations of these techniques have also been evaluated and results for the most efficient combination of techniques have been presented and optimized for each class of image query. We also propose an improvement in image retrieval performance by introducing the idea of Query modification through image cropping. It enables the user to identify a region of interest and modify the initial query to refine and personalize the image retrieval results.
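For concreteness, here is a minimal sketch of two of the simpler descriptors compared above (Average RGB and Color Moments); treating the three color moments as per-channel mean, standard deviation and skewness is an assumption of ours, not necessarily the paper's exact definition:

    import numpy as np

    def average_rgb(image):
        # image: H x W x 3 array; returns the mean of each color channel.
        return image.reshape(-1, 3).mean(axis=0)

    def color_moments(image):
        # First three moments (mean, std, skewness) per channel,
        # a common 9-dimensional color descriptor.
        px = image.reshape(-1, 3).astype(float)
        mean = px.mean(axis=0)
        std = px.std(axis=0)
        skew = np.cbrt(((px - mean) ** 3).mean(axis=0))
        return np.concatenate([mean, std, skew])

    # Retrieval then ranks database images by the distance between descriptors,
    # e.g. Euclidean distance from the query image's feature vector.
    query = np.random.randint(0, 256, (64, 64, 3))
    print(average_rgb(query), color_moments(query))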
Practical Bayesian Optimization of Machine Learning Algorithms
Snoek, Jasper, Larochelle, Hugo, Adams, Ryan P.
Machine learning algorithms frequently require careful tuning of model hyperparameters, regularization terms, and optimization parameters. Unfortunately, this tuning is often a "black art" that requires expert experience, unwritten rules of thumb, or sometimes brute-force search. Much more appealing is the idea of developing automatic approaches which can optimize the performance of a given learning algorithm to the task at hand. In this work, we consider the automatic tuning problem within the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). The tractable posterior distribution induced by the GP leads to efficient use of the information gathered by previous experiments, enabling optimal choices about what parameters to try next. Here we show how the effects of the Gaussian process prior and the associated inference procedure can have a large impact on the success or failure of Bayesian optimization. We show that thoughtful choices can lead to results that exceed expert-level performance in tuning machine learning algorithms. We also describe new algorithms that take into account the variable cost (duration) of learning experiments and that can leverage the presence of multiple cores for parallel experimentation. We show that these proposed algorithms improve on previous automatic procedures and can reach or surpass human expert-level optimization on a diverse set of contemporary algorithms including latent Dirichlet allocation, structured SVMs and convolutional neural networks.
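A minimal sketch of the kind of GP-based loop the paper builds on: fit a GP surrogate to the observed (hyperparameter, performance) pairs and pick the next candidate by an acquisition function such as expected improvement. The specific kernel, candidate grid and acquisition choice here are illustrative assumptions, not the paper's recommended settings:

    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expected_improvement(gp, X_cand, y_best):
        # EI acquisition for minimization: how much each candidate is
        # expected to improve on the best observed value.
        mu, sigma = gp.predict(X_cand, return_std=True)
        sigma = np.maximum(sigma, 1e-9)
        z = (y_best - mu) / sigma
        return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    def bayes_opt_step(X_obs, y_obs, X_cand):
        # Fit the GP surrogate to past experiments and return the next
        # hyperparameter setting to evaluate.
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X_obs, y_obs)
        ei = expected_improvement(gp, X_cand, y_obs.min())
        return X_cand[np.argmax(ei)]

    # Example: tuning a single hyperparameter (e.g. log learning rate).
    X_obs = np.array([[-3.0], [-1.0], [0.0]])
    y_obs = np.array([0.42, 0.31, 0.55])       # validation errors observed so far
    X_cand = np.linspace(-4, 1, 200).reshape(-1, 1)
    print(bayes_opt_step(X_obs, y_obs, X_cand))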
Message passing with relaxed moment matching
Bayesian learning is often hampered by large computational expense. As a powerful generalization of popular belief propagation, expectation propagation (EP) efficiently approximates the exact Bayesian computation. Nevertheless, EP can be sensitive to outliers and suffer from divergence in difficult cases. To address this issue, we propose a new approximate inference approach, relaxed expectation propagation (REP). It relaxes the moment matching requirement of expectation propagation by adding a relaxation factor into the KL minimization, and penalizes this relaxation with an $l_1$ penalty. As a result, when the two distributions in the relaxed KL divergence are similar, the relaxation factor is penalized to zero and we recover the original moment matching; in the presence of outliers, the two distributions differ significantly and the relaxation factor is used to reduce the contribution of the outlier. Based on this penalized KL minimization, REP is robust to outliers and can greatly improve the posterior approximation quality over EP. To examine the effectiveness of REP, we apply it to Gaussian process classification, a task to which EP is known to be well suited. Our classification results on synthetic and UCI benchmark datasets demonstrate significant improvement of REP over EP and Power EP in terms of algorithmic stability, estimation accuracy and predictive performance.
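The abstract does not give the exact objective, but schematically the idea can be written as replacing EP's exact moment matching with a relaxed, penalized matching; the symbols below (sufficient statistics $s(x)$, relaxation vector $r$, penalty weight $\lambda$) are illustrative notation of ours, not the paper's:
$$\min_{q \in \mathcal{Q},\, r}\ \Big\{\, \mathrm{KL}\big(\tilde p \,\|\, q\big) + \lambda \lVert r \rVert_1 \ :\ \mathbb{E}_q[s(x)] = \mathbb{E}_{\tilde p}[s(x)] + r \,\Big\}.$$
With $r = 0$ this reduces to the standard moment-matching KL projection of EP; for an outlying site the optimum tolerates a nonzero $r$, which shrinks that site's influence on the approximation.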
The expected performance of stellar parametrization with Gaia spectrophotometry
Liu, C., Bailer-Jones, C. A. L., Sordo, R., Vallenari, A., Borrachero, R., Luri, X., Sartoretti, P.
Gaia will obtain astrometry and spectrophotometry for essentially all sources in the sky down to a broad band magnitude limit of G=20, an expected yield of 10^9 stars. Its main scientific objective is to reveal the formation and evolution of our Galaxy through chemo-dynamical analysis. In addition to inferring positions, parallaxes and proper motions from the astrometry, we must also infer the astrophysical parameters of the stars from the spectrophotometry, the BP/RP spectrum. Here we investigate the performance of three different algorithms (SVM, ILIUM, Aeneas) for estimating the effective temperature, line-of-sight interstellar extinction, metallicity and surface gravity of A-M stars over a wide range of these parameters and over the full magnitude range Gaia will observe (G=6-20mag). One of the algorithms, Aeneas, infers the posterior probability density function over all parameters, and can optionally take into account the parallax and the Hertzsprung-Russell diagram to improve the estimates. For all algorithms the accuracy of estimation depends on G and on the value of the parameters themselves, so a broad summary of performance is only approximate. For stars at G=15 with less than two magnitudes extinction, we expect to be able to estimate Teff to within 1%, logg to 0.1-0.2dex, and [Fe/H] (for FGKM stars) to 0.1-0.2dex, just using the BP/RP spectrum (mean absolute error statistics are quoted). Performance degrades at larger extinctions, but not always by a large amount. Extinction can be estimated to an accuracy of 0.05-0.2mag for stars across the full parameter range with a priori unknown extinction between 0 and 10mag. Performance degrades at fainter magnitudes, but even at G=19 we can estimate logg to better than 0.2dex for all spectral types, and [Fe/H] to within 0.35dex for FGKM stars, for extinctions below 1mag.
Document Clustering Evaluation: Divergence from a Random Baseline
De Vries, Christopher M., Geva, Shlomo, Trotman, Andrew
Divergence from a random baseline is a technique for the evaluation of document clustering. It ensures that cluster quality measures are performing useful work, preventing ineffective clusterings that provide no useful result from receiving high scores. These concepts are defined and analysed using intrinsic and extrinsic approaches to the evaluation of document cluster quality. This includes the classical clusters-to-categories approach and a novel approach that uses ad hoc information retrieval. The divergence from a random baseline approach is able to differentiate ineffective clusterings encountered in the INEX XML Mining track. It also appears to perform a normalisation similar to the Normalised Mutual Information (NMI) measure, but it can be applied to any measure of cluster quality. When it is applied to the intrinsic measure of distortion as measured by RMSE, subtraction from a random baseline provides a clear optimum that is not apparent otherwise. This approach can be applied to any clustering evaluation. This paper describes its use in the context of document clustering evaluation.
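Here is a minimal sketch of the baseline-subtraction idea, assuming an extrinsic quality measure such as NMI and a random baseline that preserves the cluster-size distribution by shuffling assignments; the helper names are illustrative, not the track's official evaluation code:

    import numpy as np
    from sklearn.metrics import normalized_mutual_info_score

    def random_baseline(labels, rng=None):
        # Random clustering with the same cluster-size distribution:
        # shuffle the assignments so documents land in clusters at random.
        rng = np.random.default_rng(rng)
        return rng.permutation(labels)

    def divergence_from_random_baseline(quality, clustering, categories, trials=100):
        # Score of the clustering minus the expected score of a random
        # clustering with the same cluster sizes.
        score = quality(categories, clustering)
        baseline = np.mean([quality(categories, random_baseline(clustering, t))
                            for t in range(trials)])
        return score - baseline

    clustering = np.random.randint(0, 5, 1000)   # cluster assignments
    categories = np.random.randint(0, 5, 1000)   # ground-truth categories
    print(divergence_from_random_baseline(
        normalized_mutual_info_score, clustering, categories))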
Soft Computing approaches on the Bandwidth Problem
Czibula, Gabriela, Crisan, Gloria Cerasela, Pintea, Camelia-M., Czibula, Istvan-Gergely
The Matrix Bandwidth Minimization Problem (MBMP) seeks a simultaneous reordering of the rows and the columns of a square matrix such that the nonzero entries are collected within a band of small width close to the main diagonal. The MBMP is an NP-complete problem with applications in many scientific domains, including linear systems and artificial intelligence, as well as real-life situations in industry, logistics and information recovery. Such complex problems are hard to solve, which is why any attempt to improve their solutions is worthwhile. Genetic algorithms and ant-based systems are the Soft Computing methods used in this paper to solve several MBMP instances. Our approach is based on a learning agent-based model involving a local search procedure. The algorithm is compared with the classical Cuthill-McKee algorithm and with a hybrid genetic algorithm, using several instances from the Matrix Market collection. Computational experiments confirm the good performance of the proposed algorithms on the considered set of MBMP instances. On this Soft Computing basis, we also propose a new theoretical Reinforcement Learning model for solving the MBMP.
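For concreteness, the bandwidth being minimized and the classical Cuthill-McKee baseline mentioned above can be sketched as follows; SciPy ships the reverse variant of the algorithm, and the small example matrix is illustrative:

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    def bandwidth(A):
        # Largest distance of a nonzero entry from the main diagonal.
        rows, cols = np.nonzero(A)
        return int(np.max(np.abs(rows - cols))) if rows.size else 0

    A = np.array([[1, 0, 0, 1],
                  [0, 1, 1, 0],
                  [0, 1, 1, 0],
                  [1, 0, 0, 1]])
    perm = reverse_cuthill_mckee(csr_matrix(A), symmetric_mode=True)
    A_reordered = A[np.ix_(perm, perm)]          # same permutation on rows and columns
    print(bandwidth(A), bandwidth(A_reordered))  # e.g. 3 -> 1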
Bayesian Network Structure Learning with Permutation Tests
Scutari, Marco, Brogini, Adriana
In the literature there are several studies on the performance of Bayesian network structure learning algorithms. The focus of these studies is almost always the heuristics the learning algorithms are based on, i.e. the maximisation algorithms (in score-based algorithms) or the techniques for learning the dependencies of each variable (in constraint-based algorithms). In this paper we investigate how the use of permutation tests instead of parametric ones affects the performance of Bayesian network structure learning from discrete data. Shrinkage tests are also covered to provide a broad overview of the techniques developed in the current literature.
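As an illustration of the kind of test substituted for a parametric one, here is a minimal sketch of a permutation test of (marginal) independence between two discrete variables using the $G$ statistic; the choice of statistic and the unconditional setting are simplifying assumptions of ours, since constraint-based learning typically also needs conditional tests:

    import numpy as np

    def g_statistic(x, y):
        # G = 2 * n * MI(x, y), computed from the observed contingency table.
        table = np.zeros((x.max() + 1, y.max() + 1))
        np.add.at(table, (x, y), 1)
        n = table.sum()
        expected = np.outer(table.sum(1), table.sum(0)) / n
        mask = table > 0
        return 2.0 * np.sum(table[mask] * np.log(table[mask] / expected[mask]))

    def permutation_test(x, y, n_perm=1000, rng=None):
        # p-value: fraction of permuted datasets with a statistic at least
        # as extreme as the observed one (permuting y breaks any dependence).
        rng = np.random.default_rng(rng)
        observed = g_statistic(x, y)
        perms = [g_statistic(x, rng.permutation(y)) for _ in range(n_perm)]
        return (1 + sum(p >= observed for p in perms)) / (1 + n_perm)

    x = np.random.randint(0, 3, 200)
    y = (x + np.random.randint(0, 2, 200)) % 3   # dependent on x
    print(permutation_test(x, y))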
New Analysis and Algorithm for Learning with Drifting Distributions
Mohri, Mehryar, Medina, Andres Munoz
We present a new analysis of the problem of learning with drifting distributions in the batch setting using the notion of discrepancy. We prove learning bounds based on the Rademacher complexity of the hypothesis set and the discrepancy of distributions both for a drifting PAC scenario and a tracking scenario. Our bounds are always tighter and in some cases substantially improve upon previous ones based on the $L_1$ distance. We also present a generalization of the standard on-line to batch conversion to the drifting scenario in terms of the discrepancy and arbitrary convex combinations of hypotheses. We introduce a new algorithm exploiting these learning guarantees, which we show can be formulated as a simple QP. Finally, we report the results of preliminary experiments demonstrating the benefits of this algorithm.
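For reference, the discrepancy on which these bounds are based can be stated schematically, for a loss $L$ and hypothesis set $H$, as
$$\mathrm{disc}(P, Q) = \max_{h, h' \in H} \big| \mathcal{L}_P(h, h') - \mathcal{L}_Q(h, h') \big|, \qquad \mathcal{L}_D(h, h') = \mathbb{E}_{x \sim D}\big[ L(h(x), h'(x)) \big].$$
Unlike the $L_1$ distance, it measures how the distributions differ only as seen through $H$ and $L$, which is why bounds expressed in terms of the discrepancy can be substantially tighter.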