AITopics | Statistical Learning

Statistical Learning

Simultaneous Model Selection and Optimization through Parameter-free Stochastic Learning

arXiv.org Machine Learning, Jun-15-2014

Stochastic gradient descent algorithms for training linear and kernel predictors are becoming increasingly important thanks to their scalability. While various methods have been proposed to speed up their convergence, the model selection phase is often ignored. In theoretical works, assumptions are usually made, for example, about prior knowledge of the norm of the optimal solution, while in practice validation methods remain the only viable approach. In this paper, we propose a new kernel-based stochastic gradient descent algorithm that performs model selection while training, with no parameters to tune and no form of cross-validation. The algorithm builds on recent advances in online learning theory for unconstrained settings to estimate the right regularization over time in a data-dependent way. Optimal rates of convergence are proved under standard smoothness assumptions on the target function, using the range space of the fractional integral operator associated with the kernel.
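The abstract does not state the update rule, so the sketch below is not the paper's kernel algorithm. As a stand-in, it shows a different, well-known parameter-free online update, Krichevsky-Trofimov (KT) coin betting, applied to a linear predictor with the absolute loss: the effective step size comes from the bettor's accumulated "wealth" rather than from a tuned learning rate or regularization constant. The function name, the initial wealth `eps=1.0`, and the assumption that every input has norm at most 1 are illustrative choices.

```python
import numpy as np


def kt_parameter_free_sgd(X, y, eps=1.0):
    """Online linear regression with Krichevsky-Trofimov coin betting.

    No learning rate or regularization constant is tuned. Assumes every
    row of X has Euclidean norm <= 1, so subgradients of the absolute loss
    |<w, x> - y| also have norm <= 1. Returns the average of the iterates.
    """
    n, d = X.shape
    wealth = eps               # initial "money" of the bettor
    theta = np.zeros(d)        # running sum of negative subgradients
    w_sum = np.zeros(d)
    for t in range(1, n + 1):
        x, target = X[t - 1], y[t - 1]
        w = (theta / t) * wealth          # bet a fraction (norm < 1) of current wealth
        g = np.sign(w @ x - target) * x   # subgradient of |<w, x> - y| at w
        wealth -= g @ w                   # win or lose money on this round
        theta -= g
        w_sum += w
    return w_sum / n
```

Averaging the iterates is the standard online-to-batch conversion, which turns the online regret guarantee into a bound on the expected loss of the returned predictor.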

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

1406.3816

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Education > Educational Setting > Online (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)

Dimensionality reduction for time series data

Vidaurre, Diego, Rezek, Iead, Harrison, Samuel L., Smith, Stephen S., Woolrich, Mark

arXiv.org Machine Learning, Jun-14-2014

Although they do not take the temporal nature of the data into account, classic dimensionality reduction techniques such as PCA are widely applied to time series data. In this paper, we introduce a factor decomposition specific to time series that builds on the Bayesian multivariate autoregressive model and hence avoids the assumption that data points are mutually independent. The key is to find a low-rank estimate of the autoregressive matrices. As in the probabilistic versions of other factor models, this induces a latent low-dimensional representation of the original data. We discuss some possible generalisations and alternatives, the most relevant being a technique for simultaneous smoothing and dimensionality reduction. To illustrate potential applications, we apply the model to a synthetic data set and to different types of neuroimaging data (EEG and ECoG).
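A simple, non-Bayesian way to see why a low-rank autoregressive matrix yields a latent representation is sketched below: fit a first-order model x_t ≈ A x_{t-1} by least squares, then truncate A to rank r with an SVD. This is an assumed stand-in for illustration, not the paper's Bayesian estimator; the function name and the first-order (VAR(1)) structure are hypothetical.

```python
import numpy as np


def low_rank_var1(X, rank):
    """Fit x_t ≈ A x_{t-1} by least squares, then truncate A to `rank`.

    X: array of shape (T, d), one row per time point.
    Returns (A_low, W): A_low = U_r S_r V_r^T, and W = V_r with shape (d, rank),
    so X @ W is a rank-dimensional projection of the series.
    """
    past, present = X[:-1], X[1:]
    # Least-squares solution of past @ A^T ≈ present
    A_T, *_ = np.linalg.lstsq(past, present, rcond=None)
    A = A_T.T
    U, s, Vt = np.linalg.svd(A)
    A_low = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return A_low, Vt[:rank].T
```

Because the truncated matrix factors as (U_r S_r) V_r^T, the next state depends on the past only through the r-dimensional projection V_r^T x_{t-1}, which plays the role of a latent low-dimensional representation.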

artificial intelligence, data mining, machine learning

arXiv.org Machine Learning

1406.3711

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.68)
Health & Medicine > Health Care Technology (0.54)
Health & Medicine > Diagnostic Medicine > Imaging (0.54)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.83)

Optimality of Graphlet Screening in High Dimensional Variable Selection

Jin, Jiashun, Zhang, Cun-Hui, Zhang, Qi

arXiv.org Machine Learning, Jun-13-2014

Consider a linear regression model where the design matrix X has n rows and p columns. We assume that (a) p is much larger than n, (b) the coefficient vector beta is sparse in the sense that only a small fraction of its coordinates are nonzero, and (c) the Gram matrix G = X'X is sparse in the sense that each row has relatively few large coordinates (the diagonal entries of G are normalized to 1). The sparsity of G naturally induces sparsity in the so-called graph of strong dependence (GOSD). We find an interesting interplay between signal sparsity and graph sparsity which ensures that, in a broad context, the set of true signals decomposes into many small components of the GOSD, with different components disconnected from each other. We propose Graphlet Screening (GS), a new two-stage Screen and Clean approach to variable selection. The key methodological innovation of GS is to use the GOSD to guide both the screening and the cleaning. Whereas m-variate brute-force screening has a computational cost of p^m, GS has a screening cost of only p (up to some multi-log(p) factors). We measure the performance of any variable selection procedure by the minimax Hamming distance and show that, in a very broad class of situations, GS achieves the optimal rate of convergence in terms of the Hamming distance. Somewhat surprisingly, the well-known procedures of subset selection and the lasso are rate non-optimal, even in very simple settings and even when their tuning parameters are ideally set.
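The graph of strong dependence and its connected components can be sketched in a few lines. This assumes an edge between variables i and j whenever |G_ij| exceeds a threshold `delta` (a hypothetical parameter; the abstract does not give the rule), with the columns of X rescaled so that diag(G) = 1. It covers only the graph-building step, not the full Screen and Clean procedure.

```python
import numpy as np
from collections import deque


def gosd_adjacency(X, delta):
    """GOSD adjacency: edge i-j when |G_ij| > delta, where G is the Gram
    matrix of X after rescaling columns so that diag(G) = 1."""
    Xn = X / np.linalg.norm(X, axis=0)
    G = Xn.T @ Xn
    A = np.abs(G) > delta
    np.fill_diagonal(A, False)
    return A


def components(A, nodes):
    """Connected components of the subgraph of A induced by `nodes` (BFS)."""
    nodes = set(int(i) for i in nodes)
    seen, comps = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            i = queue.popleft()
            comp.append(i)
            for j in np.flatnonzero(A[i]):
                j = int(j)
                if j in nodes and j not in seen:
                    seen.add(j)
                    queue.append(j)
        comps.append(sorted(comp))
    return comps
```

Applying `components` to a candidate set of variables shows how, when G is sparse, a sparse signal set tends to split into small disconnected pieces that can each be examined separately.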

artificial intelligence, machine learning, selection

arXiv.org Machine Learning

1204.6452

Country: North America > United States > Wisconsin (0.27)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)

An eigenvector-based hotspot detection

Fanaee-T, Hadi, Gama, João

arXiv.org Artificial Intelligence, Jun-13-2014

Space and time are two critical components of many real-world systems. For this reason, the analysis of anomalies in spatiotemporal data has attracted great interest. In this work, we investigate the application of tensor decomposition and eigenspace techniques to spatiotemporal hotspot detection. We propose an algorithm called SST-Hotspot, which accounts for spatiotemporal variations in the data and detects hotspots by matching the eigenvector elements of two tensors, one of cases and one of population. The experimental results demonstrate the usefulness of tensor decomposition and eigenvector-based techniques in hotspot analysis.

artificial intelligence, machine learning, spatial reasoning

arXiv.org Artificial Intelligence

1406.3191

Country: North America > United States > New Mexico (0.31)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.47)

Eigenspace Method for Spatiotemporal Hotspot Detection

Fanaee-T, Hadi, Gama, João

arXiv.org Artificial Intelligence, Jun-13-2014

Hotspot detection aims to identify subgroups of observations that are unexpected with respect to some baseline information. For instance, in disease surveillance, the purpose is to detect sub-regions of the spatiotemporal space where the count of reported cases of a disease (e.g., cancer) is higher than expected given the population. The state-of-the-art method for this kind of problem is the Space-Time Scan Statistics (STScan), which exhaustively searches the whole space with a sliding window, looking for significant spatiotemporal clusters. STScan makes some restrictive assumptions about the distribution of the data, the shape of the hotspots, and the quality of the data, which can be unrealistic for some nontraditional data sources. We propose a novel method called EigenSpot which, instead of an exhaustive search over the space, tracks changes in a space-time correlation structure. The new approach is not only much more computationally efficient but also makes no assumptions about the data distribution, hotspot shape, or data quality. The principal idea is that, by jointly combining the abnormal elements of the principal spatial and temporal singular vectors, the location of hotspots in the spatiotemporal space can be approximated. A comprehensive experimental evaluation on both simulated and real data sets demonstrates the effectiveness of the proposed method.
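The idea of locating a hotspot from abnormal elements of the leading singular vectors can be illustrated as follows. This is a simplified sketch under stated assumptions, not EigenSpot itself: it standardizes the excess of observed over expected counts in a locations × times matrix, takes the leading left (spatial) and right (temporal) singular vectors, and flags entries whose robust z-score (median and MAD) exceeds `z_thresh`. The residual definition, the z-score rule, and the threshold are all hypothetical choices.

```python
import numpy as np


def eigen_hotspot(cases, population, z_thresh=3.0):
    """Boolean (location, time) mask of cells whose entries are abnormal in
    BOTH the leading spatial and the leading temporal singular vector of the
    standardized excess-count matrix.

    cases, population: arrays of shape (n_locations, n_times).
    """
    rate = cases.sum() / population.sum()
    expected = rate * population
    resid = (cases - expected) / np.sqrt(expected)  # Pearson-style residuals
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    u, v = U[:, 0], Vt[0]  # leading spatial and temporal singular vectors

    def abnormal(vec):
        # Robust z-score; the absolute value also removes the SVD sign ambiguity
        med = np.median(vec)
        mad = np.median(np.abs(vec - med)) + 1e-12
        return np.abs(vec - med) / (1.4826 * mad) > z_thresh

    # A cell is flagged when its location AND its time are both abnormal
    return np.outer(abnormal(u), abnormal(v))
```

Only one SVD is computed instead of scanning every candidate window, which is the source of the efficiency gain that the abstract claims over exhaustive search.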

artificial intelligence, hotspot, machine learning

arXiv.org Artificial Intelligence

doi: 10.1111/exsy.12088

1406.3506

Country: North America > United States > New Mexico (0.15)

Genre:

Research Report > Experimental Study (0.70)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.88)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Kernel Adaptive Metropolis-Hastings

Sejdinovic, Dino, Strathmann, Heiko, Garcia, Maria Lomeli, Andrieu, Christophe, Gretton, Arthur

arXiv.org Machine Learning, Jun-12-2014

We introduce a Kernel Adaptive Metropolis-Hastings algorithm for sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature-space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and it adapts to the local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher-order information about the target, making it particularly attractive for contexts such as pseudo-marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions arising in both real-world and synthetic examples.
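A sketch of one sampler step follows. It assumes a Gaussian kernel k(x, z) = exp(-||x - z||^2 / sigma^2) and a proposal N(y, gamma^2 I + nu^2 M H M^T) at the current state y, where z_1, ..., z_n are past samples of the chain, column i of M is 2 ∇_x k(x, z_i) evaluated at x = y, and H is the centering matrix. This covariance form is an assumption (the abstract does not give it), and `sigma`, `gamma`, `nu`, and the helper names are illustrative. Because the covariance depends on the current state, the proposal is not symmetric, so both proposal densities enter the acceptance ratio.

```python
import numpy as np


def kamh_proposal_cov(y, Z, sigma, gamma, nu):
    """Assumed proposal covariance gamma^2 I + nu^2 M H M^T at state y.

    Z: (n, d) array of past chain samples. Kernel (assumed Gaussian):
    k(x, z) = exp(-||x - z||^2 / sigma^2). H is the n x n centering matrix.
    """
    n, d = Z.shape
    k = np.exp(-np.sum((Z - y) ** 2, axis=1) / sigma**2)  # k(y, z_i), shape (n,)
    # grad_x k(x, z_i) at x = y equals (2 / sigma^2) * k(y, z_i) * (z_i - y)
    M = 2.0 * (2.0 / sigma**2) * k * (Z - y).T            # shape (d, n)
    H = np.eye(n) - np.ones((n, n)) / n
    return gamma**2 * np.eye(d) + nu**2 * M @ H @ M.T


def log_normal_pdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (diff @ np.linalg.solve(cov, diff) + logdet + len(x) * np.log(2 * np.pi))


def kamh_step(x, log_target, Z, rng, sigma=1.0, gamma=0.2, nu=1.0):
    """One Metropolis-Hastings step with the state-dependent Gaussian proposal.
    Only log_target evaluations are used, no gradients of the target."""
    cov_x = kamh_proposal_cov(x, Z, sigma, gamma, nu)
    x_new = rng.multivariate_normal(x, cov_x)
    cov_new = kamh_proposal_cov(x_new, Z, sigma, gamma, nu)
    # Asymmetric proposal: include q(x | x_new) / q(x_new | x) in the ratio
    log_alpha = (log_target(x_new) - log_target(x)
                 + log_normal_pdf(x, x_new, cov_new)
                 - log_normal_pdf(x_new, x, cov_x))
    return x_new if np.log(rng.uniform()) < log_alpha else x
```

In an adaptive run, `Z` would be refreshed from the chain history; the paper's adaptation schedule is not reproduced here.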

artificial intelligence, machine learning, proposal distribution

arXiv.org Machine Learning

1307.5302

Country: Asia (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Event and Anomaly Detection Using Tucker3 Decomposition

Fanaee-T, Hadi, Oliveira, Márcia D. B., Gama, João, Malinowski, Simon, Morla, Ricardo

arXiv.org Artificial Intelligence, Jun-12-2014

Failure detection in telecommunication networks is a vital task. Several supervised and unsupervised solutions have been proposed for discovering failures in such networks. Among them, unsupervised approaches have attracted more attention because they require no labeled data. Network devices are often unable to provide information about the type of failure; in such cases the type of failure is not known in advance, and the unsupervised setting is more appropriate for diagnosis. Among unsupervised approaches, Principal Component Analysis (PCA) is a well-known solution that has been widely used in the anomaly detection literature and can be applied to matrix data (e.g., Users-Features). However, one important property of network data is their temporal, sequential nature. Considering the interaction of dimensions over a third dimension, such as time, may therefore give better insight into the nature of network failures. In this paper, we demonstrate the power of three-way analysis for detecting events and anomalies in time-evolving network data.
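Three-way analysis of a users × features × time tensor can be sketched with a truncated higher-order SVD (HOSVD), a common way to compute a Tucker3-type decomposition; the paper may fit the model differently, for example with alternating least squares. Time slices that the low-rank model reconstructs poorly are candidate events or anomalies. The ranks, the per-slice error score, and the helper names are hypothetical choices.

```python
import numpy as np


def unfold(T, mode):
    """Mode-n unfolding: rows index `mode`, columns the remaining modes."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)


def mode_dot(T, M, mode):
    """n-mode product: multiply tensor T by matrix M along axis `mode`."""
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)


def tucker_hosvd(T, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a core tensor."""
    factors = [np.linalg.svd(unfold(T, m), full_matrices=False)[0][:, :r]
               for m, r in enumerate(ranks)]
    core = T
    for m, U in enumerate(factors):
        core = mode_dot(core, U.T, m)
    return core, factors


def reconstruct(core, factors):
    """Map the core back to full size with the factor matrices."""
    T = core
    for m, U in enumerate(factors):
        T = mode_dot(T, U, m)
    return T


def slice_anomaly_scores(T, ranks):
    """Reconstruction error of each slice along the last (time) mode."""
    core, factors = tucker_hosvd(T, ranks)
    resid = T - reconstruct(core, factors)
    return np.linalg.norm(resid.reshape(-1, T.shape[-1]), axis=0)
```

Unlike PCA on a single users × features matrix, the time factor here is shared across all users and features, so an abnormal time slice stands out as a large residual.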

data mining, machine learning, trajectory

arXiv.org Artificial Intelligence

1406.3266

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Input Warping for Bayesian Optimization of Non-stationary Functions

Snoek, Jasper, Swersky, Kevin, Zemel, Richard S., Adams, Ryan P.

arXiv.org Machine Learning, Jun-11-2014

Bayesian optimization has proven to be a highly effective methodology for the global optimization of unknown, expensive and multimodal functions. The ability to accurately model distributions over functions is critical to the effectiveness of Bayesian optimization. Although Gaussian processes provide a flexible prior over functions which can be queried efficiently, there are various classes of functions that remain difficult to model. One of the most frequently occurring of these is the class of non-stationary functions. The optimization of the hyperparameters of machine learning algorithms is a problem domain in which parameters are often manually transformed a priori, for example by optimizing in "log-space," to mitigate the effects of spatially-varying length scale. We develop a methodology for automatically learning a wide family of bijective transformations or warpings of the input space using the Beta cumulative distribution function. We further extend the warping framework to multi-task Bayesian optimization so that multiple tasks can be warped into a jointly stationary space. On a set of challenging benchmark optimization tasks, we observe that the inclusion of warping greatly improves on the state-of-the-art, producing better results faster and more reliably.
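The warping itself is simple to sketch: each input dimension, scaled to [0, 1], passes through a Beta cumulative distribution function, which is monotone and therefore bijective on [0, 1], and a stationary kernel is then applied in the warped space. In the paper the Beta parameters are learned along with the other Gaussian process hyperparameters; here `alpha`, `beta`, and `lengthscale` are fixed, hypothetical values, and SciPy's regularized incomplete beta function `scipy.special.betainc` supplies the CDF.

```python
import numpy as np
from scipy.special import betainc


def beta_warp(x, alpha, beta):
    """Warp inputs in [0, 1] through the Beta(alpha, beta) CDF.

    alpha and beta may be arrays with one entry per input dimension; they
    broadcast against the last axis of x.
    """
    x = np.clip(np.asarray(x, dtype=float), 0.0, 1.0)
    return betainc(alpha, beta, x)  # regularized incomplete beta = Beta CDF


def warped_sq_exp_kernel(X1, X2, alpha, beta, lengthscale=0.2):
    """Squared-exponential kernel on warped inputs: stationary in the warped
    space, generally non-stationary in the original space."""
    W1, W2 = beta_warp(X1, alpha, beta), beta_warp(X2, alpha, beta)
    sq = np.sum((W1[:, None, :] - W2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)
```

With alpha = beta = 1 the warp is the identity. With alpha < 1 the CDF rises steeply near 0 and stretches that region, roughly the effect of the manual "log-space" transformation described in the abstract.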

bayesian optimization, gaussian process, optimization

arXiv.org Machine Learning

1402.0929

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Learning ELM network weights using linear discriminant analysis

de Chazal, Philip, Tapson, Jonathan, van Schaik, André

arXiv.org Machine Learning, Jun-11-2014

We present an alternative to the pseudo-inverse method for determining the hidden-to-output weights of Extreme Learning Machines performing classification tasks. The method is based on linear discriminant analysis and provides Bayes-optimal single-point estimates of the weights.
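A minimal sketch of the idea, under assumptions: a random tanh hidden layer, as in a standard ELM, followed by hidden-to-output weights computed from linear discriminant analysis with a pooled within-class covariance, in place of the pseudo-inverse of the hidden-layer output matrix. The small `ridge` term, the tanh activation, and the helper names are hypothetical choices, and the paper's exact formulation may differ.

```python
import numpy as np


def elm_hidden(X, W_in, b_in):
    """Random, untrained hidden layer of an Extreme Learning Machine."""
    return np.tanh(X @ W_in + b_in)


def lda_output_weights(H, labels, n_classes, ridge=1e-6):
    """Hidden-to-output weights from linear discriminant analysis.

    With class means mu_k, class priors pi_k and pooled within-class
    covariance S of the hidden activations H:
        w_k = S^{-1} mu_k,   b_k = -0.5 * mu_k^T S^{-1} mu_k + log(pi_k)
    """
    n, m = H.shape
    means = np.stack([H[labels == k].mean(axis=0) for k in range(n_classes)])  # (K, m)
    priors = np.bincount(labels, minlength=n_classes) / n
    centered = H - means[labels]
    S = centered.T @ centered / (n - n_classes) + ridge * np.eye(m)
    W = np.linalg.solve(S, means.T)                        # (m, K)
    b = -0.5 * np.sum(means.T * W, axis=0) + np.log(priors)
    return W, b


def elm_predict(X, W_in, b_in, W, b):
    """Predict the class with the largest linear discriminant score."""
    return np.argmax(elm_hidden(X, W_in, b_in) @ W + b, axis=1)
```

Only the output layer is fitted, in closed form, so training costs one covariance estimate and one linear solve, the same order of work as the pseudo-inverse approach it replaces.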

artificial intelligence, machine learning, mnist database

arXiv.org Machine Learning

1406.31

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Discriminant Analysis (0.62)

Distributed Parameter Estimation in Probabilistic Graphical Models

Mizrahi, Yariv Dror, Denil, Misha, de Freitas, Nando

arXiv.org Machine Learning, Jun-11-2014

This paper presents foundational theoretical results on distributed parameter estimation for undirected probabilistic graphical models. It introduces a general condition on composite likelihood decompositions of these models which guarantees the global consistency of distributed estimators, provided the local estimators are consistent.
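The abstract does not fix a model, so the sketch below assumes a zero-mean Gaussian Markov random field with precision matrix K. Each node fits its own conditional likelihood, a regression of x_i on the other variables; for this model the regression coefficients equal -K_ij / K_ii and the residual variance equals 1 / K_ii, so each node yields a consistent local estimate of its own row of K. A parameter shared by two nodes (an edge) is then combined by simple averaging. The averaging rule and the function names are illustrative, not the paper's general condition.

```python
import numpy as np


def local_precision_estimates(X):
    """Row i holds node i's local estimate of row i of the precision matrix,
    obtained from the regression of x_i on all other (zero-mean) variables."""
    n, d = X.shape
    K_rows = np.zeros((d, d))
    for i in range(d):
        others = [j for j in range(d) if j != i]
        coef, *_ = np.linalg.lstsq(X[:, others], X[:, i], rcond=None)
        resid = X[:, i] - X[:, others] @ coef
        k_ii = 1.0 / resid.var()            # residual variance = 1 / K_ii
        K_rows[i, i] = k_ii
        K_rows[i, others] = -coef * k_ii    # coefficient = -K_ij / K_ii
    return K_rows


def combine(K_rows):
    """Average the two local estimates of every shared (edge) parameter."""
    return 0.5 * (K_rows + K_rows.T)
```

Each regression only needs the data of one node and its neighbours, which is what makes the estimation distributable; consistency of the combined estimate then rests on each local estimate being consistent.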

artificial intelligence, estimator, machine learning

arXiv.org Machine Learning

1406.307

Country: Europe > United Kingdom > England (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.70)
