AITopics | Statistical Learning

Statistical Learning

Asymptotic Normality of Support Vector Machine Variants and Other Regularized Kernel Methods

arXiv.org Machine Learning, Apr-12-2011

In nonparametric classification and regression problems, regularized kernel methods, in particular support vector machines, attract much attention in theoretical and in applied statistics. In an abstract sense, regularized kernel methods (simply called SVMs here) can be seen as regularized M-estimators for a parameter in a (typically infinite dimensional) reproducing kernel Hilbert space. For smooth loss functions, it is shown that the difference between the estimator, i.e., the empirical SVM, and the theoretical SVM is asymptotically normal with rate $\sqrt{n}$. That is, the standardized difference converges weakly to a Gaussian process in the reproducing kernel Hilbert space. As is common in real applications, the choice of the regularization parameter may depend on the data. The proof applies the functional delta-method and shows that the SVM-functional is suitably Hadamard-differentiable.
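
In symbols, the main statement can be written roughly as below. The notation is an assumption introduced here for illustration and does not come from the abstract: $f_{D_n,\lambda_n}$ stands for the empirical SVM built from a sample $D_n$ of size $n$ with a (possibly data-dependent) regularization parameter $\lambda_n$, $f_{P,\lambda_0}$ for the theoretical SVM under the data distribution $P$, $H$ for the reproducing kernel Hilbert space, and $\mathbb{G}$ for the limiting Gaussian process.

```latex
\sqrt{n}\,\bigl(f_{D_n,\lambda_n} - f_{P,\lambda_0}\bigr)
  \;\rightsquigarrow\; \mathbb{G}
  \quad \text{in } H \quad \text{as } n \to \infty
```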

artificial intelligence, loss function, machine learning, (15 more...)

1010.0535

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression

Kakade, Sham, Kalai, Adam Tauman, Kanade, Varun, Shamir, Ohad

arXiv.org Artificial Intelligence, Apr-11-2011

Generalized Linear Models (GLMs) and Single Index Models (SIMs) provide powerful generalizations of linear regression, where the target variable is assumed to be a (possibly unknown) 1-dimensional function of a linear predictor. In general, these problems entail non-convex estimation procedures, and, in practice, iterative local search heuristics are often used. Kalai and Sastry (2009) recently provided the first provably efficient method for learning SIMs and GLMs, under the assumptions that the data are in fact generated under a GLM and under certain monotonicity and Lipschitz constraints. However, to obtain provable performance, the method requires a fresh sample every iteration. In this paper, we provide algorithms for learning GLMs and SIMs, which are both computationally and statistically efficient. We also provide an empirical study, demonstrating their feasibility in practice.
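
As background for the isotonic regression named in the title, here is a minimal sketch of the standard pool-adjacent-violators procedure, which fits the best non-decreasing sequence to one-dimensional data in the least-squares sense. It is a textbook building block, not the algorithm proposed in the paper, and the function name `pool_adjacent_violators` is chosen here for illustration.

```python
def pool_adjacent_violators(y):
    """Least-squares non-decreasing fit to the sequence y (unit weights)."""
    # Each block stores [sum of values, number of values]; its fitted
    # value is the block mean.
    blocks = []
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted
```

In a single index model, a fit of this kind could serve to estimate a monotone link function once the data are sorted by the current linear predictor; that use is an assumption about the general setting, not a description of the authors' procedure.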

algorithm, artificial intelligence, machine learning, (15 more...)

1104.2018

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.66)

Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling

Duchi, John, Agarwal, Alekh, Wainwright, Martin

arXiv.org Machine Learning, Apr-10-2011

The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. It arises in various application domains, including distributed tracking and localization, multi-agent co-ordination, estimation in sensor networks, and large-scale optimization in machine learning. We develop and analyze distributed algorithms based on dual averaging of subgradients, and we provide sharp bounds on their convergence rates as a function of the network size and topology. Our method of analysis allows for a clear separation between the convergence of the optimization algorithm itself and the effects of communication constraints arising from the network structure. In particular, we show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. Our approach includes both the cases of deterministic optimization and communication, as well as problems with stochastic optimization and/or communication.
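
The idea of dual averaging with local communication can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the authors' implementation: each node `i` on a ring holds the nonsmooth function `|x - centers[i]|`, mixes its dual variable with its two neighbours using uniform weights of 1/3, adds a local subgradient, and maps back to the primal variable with the quadratic proximal function `x**2 / 2` and step size `1 / sqrt(t)`. The function name, the ring topology, the weights, and the step size are all choices made here for illustration.

```python
import math

def distributed_dual_averaging(centers, num_iters=20000, radius=20.0):
    """Toy distributed dual averaging on a ring (needs at least 3 nodes).

    Node i holds f_i(x) = |x - centers[i]|; the network minimizes the
    average of the f_i over the interval [-radius, radius].
    Returns each node's running average of its primal iterates.
    """
    n = len(centers)
    z = [0.0] * n       # dual variables (mixed, accumulated subgradients)
    x = [0.0] * n       # primal iterates
    x_sum = [0.0] * n   # running sums for the averaged iterates
    for t in range(1, num_iters + 1):
        alpha = 1.0 / math.sqrt(t)
        # Local subgradient of |x - c| at the current iterate (0 at the kink).
        g = [(x[i] > centers[i]) - (x[i] < centers[i]) for i in range(n)]
        # Average dual variables with both ring neighbours, then add g.
        z = [(z[i - 1] + z[i] + z[(i + 1) % n]) / 3.0 + g[i] for i in range(n)]
        # Primal step: argmin of z*x + x**2 / (2*alpha) on the interval.
        x = [max(-radius, min(radius, -alpha * z[i])) for i in range(n)]
        x_sum = [x_sum[i] + x[i] for i in range(n)]
    return [s / num_iters for s in x_sum]
```

With `centers = [1.0, 2.0, 3.0, 4.0, 10.0]`, the averaged objective is minimized at the median, 3, so every node's returned estimate should end up close to that value even though no node ever sees the other nodes' functions directly.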

algorithm, artificial intelligence, machine learning, (16 more...)

doi: 10.1109/TAC.2011.2161027

1005.2012

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Efficient First Order Methods for Linear Composite Regularizers

Argyriou, Andreas, Micchelli, Charles A., Pontil, Massimiliano, Shen, Lixin, Xu, Yuesheng

arXiv.org Machine Learning, Apr-7-2011

A wide class of regularization problems in machine learning and statistics employs a regularization term which is obtained by composing a simple convex function $\omega$ with a linear transformation. This setting includes Group Lasso methods, the Fused Lasso and other total variation methods, multi-task learning methods and many more. In this paper, we present a general approach for computing the proximity operator of this class of regularizers, under the assumption that the proximity operator of the function $\omega$ is known in advance. Our approach builds on a recent line of research on optimal first order optimization methods and uses fixed point iterations for numerically computing the proximity operator. It is more general than current approaches and, as we show with numerical simulations, computationally more efficient than available first order methods which do not achieve the optimal rate. In particular, our method outperforms state-of-the-art $O(1/T)$ methods for overlapping Group Lasso and matches optimal $O(1/T^2)$ methods for the Fused Lasso and tree structured Group Lasso.
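
For readers unfamiliar with the term, the proximity operator of a convex function $\omega$ maps a point $x$ to the minimizer of $\frac{1}{2}\|u - x\|^2 + \omega(u)$ over $u$. A minimal sketch of one case where it is known in closed form follows, assuming $\omega$ is the $\ell_1$ norm scaled by a parameter `lam`; the resulting operator is the familiar soft-thresholding map. The function name is hypothetical, and this illustrates only the "known in advance" ingredient, not the paper's fixed point scheme for the composite regularizer.

```python
def prox_l1(x, lam):
    """Proximity operator of lam * ||.||_1, applied componentwise."""
    result = []
    for v in x:
        if v > lam:
            result.append(v - lam)   # shrink positive entries toward zero
        elif v < -lam:
            result.append(v + lam)   # shrink negative entries toward zero
        else:
            result.append(0.0)       # entries within [-lam, lam] become zero
    return result
```

For example, `prox_l1([3.0, -0.5, -2.0], 1.0)` shrinks the large entries by 1 and sets the small one to zero.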

artificial intelligence, machine learning, optimization problem, (16 more...)

1104.1436

Country: North America > United States > New York (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Negative Example Aided Transcription Factor Binding Site Search

Lee, Chih, Huang, Chun-Hsi

arXiv.org Machine Learning, Apr-6-2011

Computational approaches to transcription factor binding site identification have been actively researched for the past decade. Negative examples have long been utilized in de novo motif discovery and have been shown useful in transcription factor binding site search as well. However, understanding of the roles of negative examples in binding site search is still very limited. We propose the 2-centroid and optimal discriminating vector methods, taking into account negative examples. Cross-validation results on E. coli transcription factors show that the proposed methods benefit from negative examples, outperforming the centroid and position-specific scoring matrix methods. We further show that our proposed methods perform better than a state-of-the-art method. We characterize the proposed methods in the context of the other compared methods and show that, coupled with motif subtype identification, the proposed methods can be effectively applied to a wide range of transcription factors. Finally, we argue that the proposed methods are well-suited for eukaryotic transcription factors as well. Software tools are available at: http://biogrid.engr.uconn.edu/tfbs_search/.

artificial intelligence, inductive learning, machine learning, (16 more...)

1104.1234

Country: North America > United States > Connecticut (0.28)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)
Water & Waste Management > Water Management > Constituents > Bacteria (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Robust Nonparametric Regression via Sparsity Control with Application to Load Curve Data Cleansing

Mateos, Gonzalo, Giannakis, Georgios B.

arXiv.org Machine Learning, Apr-3-2011

Nonparametric methods are widely applicable to statistical inference problems, since they rely on few modeling assumptions. In this context, the fresh look advocated here brings benefits from variable selection and compressive sampling to robustify nonparametric regression against outliers, that is, data markedly deviating from the postulated models. A variational counterpart to least-trimmed squares regression is shown to be closely related to an L0-(pseudo)norm-regularized estimator that encourages sparsity in a vector explicitly modeling the outliers. This connection suggests efficient solvers based on convex relaxation, which lead naturally to a variational M-type estimator equivalent to the least-absolute shrinkage and selection operator (Lasso). Outliers are identified by judiciously tuning regularization parameters, which amounts to controlling the sparsity of the outlier vector along the whole robustification path of Lasso solutions. Reduced bias and enhanced generalization capability are attractive features of an improved estimator obtained after replacing the L0-(pseudo)norm with a nonconvex surrogate. The novel robust spline-based smoother is adopted to cleanse load curve data, a key task aiding operational decisions in the envisioned smart grid system. Computer simulations and tests on real load curve data corroborate the effectiveness of the novel sparsity-controlling robust estimators.
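
One way to write down the idea of a sparse vector that explicitly models the outliers is sketched below. All symbols are assumptions introduced here for illustration: $y_i$ and $x_i$ are the observed responses and inputs, $f$ is the unknown regression function in a function space $\mathcal{H}$, $o_i$ is nonzero only when sample $i$ is an outlier, $\varepsilon_i$ is nominal noise, and $\mu$, $\lambda_0$, $\lambda_1$ are regularization parameters. The first problem uses the L0-(pseudo)norm; the second replaces it with its convex $\ell_1$ relaxation, which has the Lasso form in the outlier vector.

```latex
y_i = f(x_i) + o_i + \varepsilon_i, \qquad i = 1, \dots, N

\min_{f \in \mathcal{H},\; o \in \mathbb{R}^N}
  \sum_{i=1}^{N} \bigl(y_i - f(x_i) - o_i\bigr)^2
  + \mu \,\|f\|_{\mathcal{H}}^2 + \lambda_0 \,\|o\|_0

\min_{f \in \mathcal{H},\; o \in \mathbb{R}^N}
  \sum_{i=1}^{N} \bigl(y_i - f(x_i) - o_i\bigr)^2
  + \mu \,\|f\|_{\mathcal{H}}^2 + \lambda_1 \,\|o\|_1
```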

data mining, data quality, machine learning, (20 more...)

doi: 10.1109/TSP.2011.2181837

1104.0455

Country: North America > United States > Minnesota (0.28)

Genre: Research Report (0.81)

Industry:

Government > Regional Government > North America Government > United States Government (0.93)
Energy > Power Industry (0.66)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)

Identifying Aspects for Web-Search Queries

Wu, F., Madhavan, J., Halevy, A.

Journal of Artificial Intelligence Research, Mar-31-2011

Many web-search queries serve as the beginning of an exploration of an unknown space of information, rather than looking for a specific web page. To answer such queries effectively, the search engine should attempt to organize the space of relevant information in a way that facilitates exploration. We describe the Aspector system that computes aspects for a given query. Each aspect is a set of search queries that together represent a distinct information need relevant to the original search query. To serve as an effective means to explore the space, Aspector computes aspects that are orthogonal to each other and have high combined coverage. Aspector combines two sources of information to compute aspects. We discover candidate aspects by analyzing query logs, and cluster them to eliminate redundancies. We then use a mass-collaboration knowledge base (e.g., Wikipedia) to compute candidate aspects for queries that occur less frequently and to group together aspects that are likely to be semantically related. We present a user study that indicates that the aspects we compute are rated favorably against three competing alternatives: related searches proposed by Google, cluster labels assigned by the Clusty search engine, and navigational searches proposed by Bing.

aspector, information, query, (17 more...)

doi: 10.1613/jair.3182

AI Access Foundation

10699

Country:

Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.14)
Asia > Cambodia (0.05)
Asia > Laos (0.05)

Genre: Research Report > Experimental Study (0.68)

Industry: Leisure & Entertainment > Sports (0.47)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Auto-associative models, nonlinear Principal component analysis, manifolds and projection pursuit

Girard, Stéphane, Iovleff, Serge

arXiv.org Machine Learning, Mar-31-2011

In this paper, auto-associative models are proposed as candidates for generalizing Principal Component Analysis. We show that these models are dedicated to the approximation of the dataset by a manifold. Here, the word "manifold" refers to the topological properties of the structure. The approximating manifold is built by a projection pursuit algorithm. At each step of the algorithm, the dimension of the manifold is incremented. Some theoretical properties are provided. In particular, we show that, at each step of the algorithm, the mean residual norm does not increase. Moreover, it is also established that the algorithm converges in a finite number of steps. Some particular auto-associative models are exhibited and compared to classical PCA and some neural network models. Implementation aspects are discussed. We show that, in numerous cases, no optimization procedure is required. Some illustrations on simulated and real data are presented.

artificial intelligence, auto-associative model, machine learning, (15 more...)

1103.6119

Country:

North America > United States (0.93)
Europe (0.68)

Genre:

Workflow (0.54)
Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.61)

Regularizers for Structured Sparsity

Micchelli, Charles A., Morales, Jean M., Pontil, Massimiliano

arXiv.org Machine Learning, Mar-30-2011

We study the problem of learning a sparse linear regression vector under additional conditions on the structure of its sparsity pattern. This problem is relevant in machine learning, statistics and signal processing. It is well known that a linear regression can benefit from knowledge that the underlying regression vector is sparse. The combinatorial problem of selecting the nonzero components of this vector can be "relaxed" by regularizing the squared error with a convex penalty function like the $\ell_1$ norm. However, in many applications, additional conditions on the structure of the regression vector and its sparsity pattern are available. Incorporating this information into the learning method may lead to a significant decrease of the estimation error. In this paper, we present a family of convex penalty functions, which encode prior knowledge on the structure of the vector formed by the absolute values of the regression coefficients. This family subsumes the $\ell_1$ norm and is flexible enough to include different models of sparsity patterns, which are of practical and theoretical importance. We establish the basic properties of these penalty functions and discuss some examples where they can be computed explicitly. Moreover, we present a convergent optimization algorithm for solving regularized least squares with these penalty functions. Numerical simulations highlight the benefit of structured sparsity and the advantage offered by our approach over the Lasso method and other related methods.
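
The relaxation mentioned in this abstract can be written out as follows. The symbols are assumptions introduced here for illustration: $y \in \mathbb{R}^m$ is the response vector, $X \in \mathbb{R}^{m \times d}$ the design matrix, $\beta$ the regression vector, $k$ a bound on the number of nonzero components, and $\rho > 0$ a regularization parameter. The first problem is the combinatorial subset-selection formulation; the second is its convex $\ell_1$-penalized counterpart (the Lasso).

```latex
\min_{\beta \in \mathbb{R}^d} \; \|y - X\beta\|_2^2
  \quad \text{subject to} \quad \|\beta\|_0 \le k

\min_{\beta \in \mathbb{R}^d} \; \|y - X\beta\|_2^2 + \rho \,\|\beta\|_1
```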

artificial intelligence, machine learning, optimization problem, (17 more...)

1010.0556

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.74)

From Sparse Signals to Sparse Residuals for Robust Sensing

Kekatos, Vassilis, Giannakis, Georgios B.

arXiv.org Machine Learning, Mar-27-2011

Recent advances in sensor technology have made it feasible to deploy a network of inexpensive sensors for carrying out synergistically even sophisticated inference tasks. In applications such as environmental monitoring, surveillance of critical infrastructure, agriculture, or medical imaging, the typical concept of operation involves a large and possibly heterogeneous set of sensors locally observing the signal of interest, and transmitting their measurements to a higher-layer agent (fusion center). This so-termed layered sensing apparatus entails three operational conditions: (c1) Each node's measurement vector comprising either a collection of scalar observations across time, or a snapshot of different sensor readings, is typically assumed to be linearly related to the unknown variable(s). Such a linear model can arise when the sensing system is viewed as a linear filter with known impulse response. Even when the underlying model is nonlinear, the observations are approximately modeled as adhering to a (multivariate) linear regression; (c2) Either because readings are costly to sense and transmit, due to delay or stationarity constraints, or simply because dimensionality reduction is invoked to cope with the "curse of dimensionality," the linear model is oftentimes under-determined, i.e., the dimension of the unknown vector is larger than that of each sensor's vector observation; and (c3) Not all sensors are reliable because failures in the sensing devices, fades of the sensor-agent communication link, physical obstruction of the scene of interest, and (un)intentional interference, all can severely deteriorate the consistency and reliability of sensor data.

artificial intelligence, machine learning, sensor, (18 more...)

doi: 10.1109/TSP.2011.2141661

1011.045

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)
