Statistical Learning
Reducing the Effects of Detrimental Instances
Smith, Michael R., Martinez, Tony
Not all instances in a data set are equally beneficial for inducing a model of the data. Some instances (such as outliers or noise) can be detrimental. However, at least initially, the instances in a data set are generally considered equally in machine learning algorithms. Many current approaches for handling noisy and detrimental instances make a binary decision about whether an instance is detrimental or not. In this paper, we 1) extend this paradigm by weighting the instances on a continuous scale and 2) present a methodology for measuring how detrimental an instance may be for inducing a model of the data. We call our method of identifying and weighting detrimental instances reduced detrimental instance learning (RDIL). We examine RIDL on a set of 54 data sets and 5 learning algorithms and compare RIDL with other weighting and filtering approaches. RDIL is especially useful for learning algorithms where every instance can affect the classification boundary and the training instances are considered individually, such as multilayer perceptrons trained with backpropagation (MLPs). Our results also suggest that a more accurate estimate of which instances are detrimental can have a significant positive impact for handling them.
A stochastic behavior analysis of stochastic restricted-gradient descent algorithm in reproducing kernel Hilbert spaces
Takizawa, Masa-aki, Yukawa, Masahiro, Richard, Cedric
This paper presents a stochastic behavior analysis of a kernel-based stochastic restricted-gradient descent method. The restricted gradient gives a steepest ascent direction within the so-called dictionary subspace. The analysis provides the transient and steady state performance in the mean squared error criterion. It also includes stability conditions in the mean and mean-square sense. The present study is based on the analysis of the kernel normalized least mean square (KNLMS) algorithm initially proposed by Chen et al. Simulation results validate the analysis.
Sparsity Based Poisson Denoising with Dictionary Learning
The problem of Poisson denoising appears in various imaging applications, such as low-light photography, medical imaging and microscopy. In cases of high SNR, several transformations exist so as to convert the Poisson noise into an additive i.i.d. Gaussian noise, for which many effective algorithms are available. However, in a low SNR regime, these transformations are significantly less accurate, and a strategy that relies directly on the true noise statistics is required. A recent work by Salmon et al. took this route, proposing a patch-based exponential image representation model based on GMM (Gaussian mixture model), leading to state-of-the-art results. In this paper, we propose to harness sparse-representation modeling to the image patches, adopting the same exponential idea. Our scheme uses a greedy pursuit with boot-strapping based stopping condition and dictionary learning within the denoising process. The reconstruction performance of the proposed scheme is competitive with leading methods in high SNR, and achieving state-of-the-art results in cases of low SNR.
Scene Image is Non-Mutually Exclusive - A Fuzzy Qualitative Scene Understanding
Lim, Chern Hong, Risnumawan, Anhar, Chan, Chee Seng
One of the biggest challenges in real world decision making process is to cope with uncertainty, complexity, volatility and ambiguity. How do we deal with this growing confusion in our world? In scene understanding, an important and yet difficult image understanding problem due to their variability, ambiguity, wide range of illumination and scale conditions falls into this category. The conventional goal of the works is to assign an unknown scene image to one of the several possible classes. For example, Figure 1(a) is a Coast class scene while Figure 1(c) is a Mountain class scene. Intentionally, most state-of-the-art approaches in scene understanding domain [1]-[4] are exemplar-based and assume that scene images are mutually exclusive, P (A B) 0. This simplifies the complex problem of scene understanding (uncertainty, complexity, volatility, and ambiguity) to a simple binary classification task. Such approaches learn patterns from a training set and subsequently, search for the images similar to it. As a result of this, classification errors often occur when the scene classes overlap in the selected feature space.
Efficient Modeling and Forecasting of the Electricity Spot Price
Ziel, Florian, Steinert, Rick, Husmann, Sven
The increasing importance of renewable energy, especially solar and wind power, has led to new forces in the formation of electricity prices. Hence, this paper introduces an econometric model for the hourly time series of electricity prices of the European Power Exchange (EPEX) which incorporates specific features like renewable energy. The model consists of several sophisticated and established approaches and can be regarded as a periodic VAR-TARCH with wind power, solar power, and load as influences on the time series. It is able to map the distinct and well-known features of electricity prices in Germany. An efficient iteratively reweighted lasso approach is used for the estimation. Moreover, it is shown that several existing models are outperformed by the procedure developed in this paper.
Fast Multilevel Support Vector Machines
Razzaghi, Talayeh, Safro, Ilya
Solving different types of optimization models (including parameters fitting) for support vector machines on large-scale training data is often an expensive computational task. This paper proposes a multilevel algorithmic framework that scales efficiently to very large data sets. Instead of solving the whole training set in one optimization process, the support vectors are obtained and gradually refined at multiple levels of coarseness of the data. The proposed framework includes: (a) construction of hierarchy of large-scale data coarse representations, and (b) a local processing of updating the hyperplane throughout this hierarchy. Our multilevel framework substantially improves the computational time without loosing the quality of classifiers. The algorithms are demonstrated for both regular and weighted support vector machines. Experimental results are presented for balanced and imbalanced classification problems. Quality improvement on several imbalanced data sets has been observed.
Propagation Kernels
Neumann, Marion, Garnett, Roman, Bauckhage, Christian, Kersting, Kristian
We introduce propagation kernels, a general graph-kernel framework for efficiently measuring the similarity of structured data. Propagation kernels are based on monitoring how information spreads through a set of given graphs. They leverage early-stage distributions from propagation schemes such as random walks to capture structural information encoded in node labels, attributes, and edge information. This has two benefits. First, off-the-shelf propagation schemes can be used to naturally construct kernels for many graph types, including labeled, partially labeled, unlabeled, directed, and attributed graphs. Second, by leveraging existing efficient and informative propagation schemes, propagation kernels can be considerably faster than state-of-the-art approaches without sacrificing predictive performance. We will also show that if the graphs at hand have a regular structure, for instance when modeling image or video data, one can exploit this regularity to scale the kernel computation to large databases of graphs with thousands of nodes. We support our contributions by exhaustive experiments on a number of real-world graphs from a variety of application domains.
Markov Random Fields and Mass Spectra Discrimination
Mass spectrometry can involve two soft ionization techniques: matrix-assisted laser desorption ionization (MALDI) and surface-enhanced laser desorption and ionization (SELDI). For each analyzed fluid sample, MALDI or SELDI hardwares generate a high-dimensional mass spectrum, recording between 10,000 and 20,000 "mass-to-charge (m/z) ratios" corresponding to the ionized peptides present in the fluid sample, as well as "intensities" roughly quantifying the concentrations of these peptides in the sample. Generally m/z ratios take values anywhere between 200 and 20,000 Daltons, and are acquired with a known relative accuracy ρ which depends on the acquisition modalities, and ranges from 0.1% to 0.3%. Analyzing this type of high dimensional data oftern requires specialized software tools, implementing sophisticated machine learning techniques such as SVM (support vector machines) (Li and others (2004), Yu and others (2005)), artificial neural networks (Ball and others (2002)), or random forests (Izmirlian (2004)). These techniques typically generate "black-box" classifiers, which often reach good discrimination levels between cancerous and control groups, but are difficult to interpret biologically in terms of characteristic biomarkers patterns. This often leads to unexpected performance variations on totally new data sets. To develop clinically usable software tools for analysis of mass spectra acuired by MALDI or SELDI hardwares, a key step is to implement automated discovery of explicit "signatures", i.e. short lists of proteomic biomarkers with high discriminating powers between cancer groups (Yasui and others (2003)). Some easily interpretable automatic classifiers, such as linear combinations of biomarker weights (Wang and Chang (2011)), can be found in previous studies, but these approaches do not attempt to quantify the discriminating impact of simultaneous presence for specific pairs of biomarkers.
Multi-Scale Local Shape Analysis and Feature Selection in Machine Learning Applications
Bendich, Paul, Gasparovic, Ellen, Harer, John, Izmailov, Rauf, Ness, Linda
The goal of this paper is to introduce a preliminary version of what we call multi-scale local shape analysis (MLSA), a method for extracting features of a dataset that describe the local structure, both manifold and singular, of points within the dataset. MLSA is a mixture of multi-scale local principal component analysis (MLPCA) and persistent local homology (PLH). In this paper, we will describe both of these techniques and our merger of them, and we will demonstrate the potential of MLSA on two synthetic datasets and one real one. The potential of these methods and their merger is investigated in the context of one of the typical applications for data analytics: the classification problem for multidimensional datasets. Thus the relevance of the developed techniques is assessed as the quality of the resulting classification decision rule, measured by the expected test misclassification error, its sensitivity and specificity (false positive and false negative error rates).
On model selection consistency of regularized M-estimators
Lee, Jason D., Sun, Yuekai, Taylor, Jonathan E.
Regularized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Usually the low-dimensional structure is encoded by the presence of the (unknown) parameters in some low-dimensional model subspace. In such settings, it is desirable for estimates of the model parameters to be \emph{model selection consistent}: the estimates also fall in the model subspace. We develop a general framework for establishing consistency and model selection consistency of regularized M-estimators and show how it applies to some special cases of interest in statistical learning. Our analysis identifies two key properties of regularized M-estimators, referred to as geometric decomposability and irrepresentability, that ensure the estimators are consistent and model selection consistent.