AITopics | Statistical Learning


Statistical Learning


Streamed Learning: One-Pass SVMs

Rai, Piyush, Daumé, Hal III, Venkatasubramanian, Suresh

arXiv.org Machine Learning, Aug-4-2009

We present a streaming model for large-scale classification (in the context of the $\ell_2$-SVM) by leveraging connections between learning and computational geometry. The streaming model imposes the constraint that only a single pass over the data is allowed. The $\ell_2$-SVM is known to have an equivalent formulation in terms of the minimum enclosing ball (MEB) problem, and an efficient algorithm based on the idea of core sets exists (Core Vector Machine, CVM). CVM learns a $(1+\varepsilon)$-approximate MEB for a set of points and yields an approximate solution to the corresponding SVM instance. However, CVM works in batch mode and requires multiple passes over the data. This paper presents a single-pass SVM based on the minimum enclosing ball of streaming data. We show that the MEB updates for the streaming case can be easily adapted to learn the SVM weight vector in a way similar to online stochastic gradient updates. Our algorithm performs polylogarithmic computation per example and requires very small, constant storage. Experimental results show that, even in such restrictive settings, we can learn efficiently in just one pass and obtain accuracies comparable to other state-of-the-art SVM solvers (batch and online). We also give an analysis of the algorithm and discuss some open issues and possible extensions.
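
The key primitive in the abstract is a ball that is updated once per example. As a rough illustration, the Python sketch below implements a generic one-pass enclosing-ball update: when a new point falls outside the current ball, the ball is replaced by the smallest ball covering both the old ball and the point. This is a simple streaming heuristic from computational geometry, assumed here for illustration; it is not necessarily the exact update the authors use, and it omits the SVM-specific feature map and label handling. The function name `streaming_meb` is hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple


def streaming_meb(points: Iterable[Sequence[float]]) -> Tuple[List[float], float]:
    """One pass over `points`, keeping only a center and a radius (O(d) memory).

    Hypothetical helper for illustration; not the paper's exact algorithm.
    """
    it = iter(points)
    center = [float(x) for x in next(it)]  # the first point is a ball of radius 0
    radius = 0.0
    for p in it:
        d = math.dist(center, p)
        if d > radius:
            # Smallest ball containing both the old ball and p: its diameter
            # runs from the far side of the old ball to p, so the center moves
            # (d - radius) / 2 toward p and the radius becomes (radius + d) / 2.
            shift = (d - radius) / 2.0
            center = [c + shift * (x - c) / d for c, x in zip(center, p)]
            radius = (radius + d) / 2.0
    return center, radius


if __name__ == "__main__":
    c, r = streaming_meb([(0.0, 0.0), (2.0, 0.0), (1.0, 0.5)])
    print(c, r)  # prints [1.0, 0.0] 1.0
```

Each update touches only the current center and radius, which is what makes a single pass with constant storage possible; the price is that the result only approximates the true minimum enclosing ball, and its quality can depend on the order in which points arrive.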

algorithm, artificial intelligence, machine learning

arXiv.org Machine Learning

0908.0572

Country: North America > United States (0.49)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)


The Infinite Hierarchical Factor Regression Model

Rai, Piyush, Daumé, Hal III

arXiv.org Machine Learning, Aug-4-2009

We propose a nonparametric Bayesian factor regression model that accounts for uncertainty in the number of factors and in the relationship between factors. To accomplish this, we propose a sparse variant of the Indian Buffet Process and couple it with a hierarchical model over factors based on Kingman's coalescent. We apply this model to two problems (factor analysis and factor regression) in gene-expression data analysis.
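
For readers unfamiliar with the Indian Buffet Process, the sketch below samples a binary customer-by-dish (object-by-factor) matrix from the standard IBP with concentration parameter `alpha`: customer i takes each existing dish k with probability m_k / i, where m_k counts earlier takers, and then tries Poisson(alpha / i) new dishes. It shows only the ordinary IBP prior; the sparse variant and the coalescent-based hierarchy over factors described in the abstract are not modeled. The function names are hypothetical, and a small Poisson sampler is included to keep the example dependency-free.

```python
import math
import random
from typing import List


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small rates used here."""
    threshold = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_ibp(num_customers: int, alpha: float, seed: int = 0) -> List[List[int]]:
    """Draw a binary matrix Z (rows = customers, columns = dishes) from the IBP."""
    rng = random.Random(seed)
    dish_counts: List[int] = []  # m_k: number of customers who took dish k
    rows: List[List[int]] = []
    for i in range(1, num_customers + 1):
        # Existing dish k is taken with probability m_k / i.
        row = [1 if rng.random() < m / i else 0 for m in dish_counts]
        # Then Poisson(alpha / i) brand-new dishes are tried.
        row.extend([1] * sample_poisson(alpha / i, rng))
        for k, z in enumerate(row):
            if k < len(dish_counts):
                dish_counts[k] += z
            else:
                dish_counts.append(1)  # a new dish, taken by this customer
        rows.append(row)
    # Pad earlier rows with zeros so the result is rectangular.
    width = len(dish_counts)
    return [r + [0] * (width - len(r)) for r in rows]
```

Because the number of columns is not fixed in advance (its expectation after N customers is alpha times the N-th harmonic number), the prior lets the data determine how many factors are used, which is the property the abstract relies on.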

artificial intelligence, machine learning, matrix

arXiv.org Machine Learning

0908.0570

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area (0.50)
Health & Medicine > Pharmaceuticals & Biotechnology (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.85)


How the initialization affects the stability of the k-means algorithm

Bubeck, Sebastien, Meila, Marina, von Luxburg, Ulrike

arXiv.org Machine Learning, Jul-31-2009

We investigate the role of the initialization for the stability of the k-means clustering algorithm. As opposed to other papers, we consider the actual k-means algorithm and do not ignore its property of getting stuck in local optima. We are interested in the actual clustering, not only in the costs of the solution. We analyze when different initializations lead to the same local optimum, and when they lead to different local optima. This enables us to prove that it is reasonable to select the number of clusters based on stability scores.
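
To make the notion of initialization-dependent stability concrete, the sketch below runs plain Lloyd iterations on 1-D data from several random initializations (k data points chosen at random) and scores how often pairs of runs agree on whether two points share a cluster. The pairwise-agreement (Rand-index-style) score is an assumption made for illustration; the abstract does not state the paper's exact stability measure, and all function names here are hypothetical.

```python
import random
from itertools import combinations
from typing import List, Sequence


def lloyd_1d(xs: Sequence[float], k: int, seed: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd k-means on 1-D data; may stop in a local optimum."""
    rng = random.Random(seed)
    centers = rng.sample(list(xs), k)  # random initialization: k data points
    labels: List[int] = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # converged to a (possibly local) optimum
            break
        centers = new_centers
    return labels


def pair_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Fraction of point pairs on which two clusterings agree (label-invariant)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(xs: Sequence[float], k: int, n_runs: int = 10) -> float:
    """Mean pairwise agreement between clusterings from different initializations."""
    runs = [lloyd_1d(xs, k, seed) for seed in range(n_runs)]
    scores = [pair_agreement(runs[i], runs[j]) for i, j in combinations(range(n_runs), 2)]
    return sum(scores) / len(scores)
```

A score of 1.0 means every initialization reached the same partition; lower values indicate that different starting points got stuck in different local optima, which is exactly the behavior the paper refuses to idealize away.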

algorithm, initialization, k-means algorithm

arXiv.org Machine Learning

0907.5494

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)


A Unified Semi-Supervised Dimensionality Reduction Framework for Manifold Learning

Chatpatanasiri, Ratthachat, Kijsirikul, Boonserm

arXiv.org Artificial Intelligence, Jul-29-2009

We present a general framework of semi-supervised dimensionality reduction for manifold learning which naturally generalizes existing supervised and unsupervised learning frameworks that apply spectral decomposition. Algorithms derived under our framework can employ both labeled and unlabeled examples and can handle complex problems where the data form separate clusters of manifolds. Our framework offers simple views, explains relationships among existing frameworks, and provides further extensions which can improve existing algorithms. Furthermore, a new semi-supervised kernelization framework called the "KPCA trick" is proposed to handle non-linear problems.

algorithm, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

0804.0924

Country: North America > United States (0.71)

Genre: Research Report (1.00)

Industry:

Education (1.00)
Health & Medicine > Public Health (0.30)
Health & Medicine > Government Relations & Public Policy (0.30)
Government > Regional Government > North America Government > United States Government > FDA (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.64)


Restart Strategy Selection using Machine Learning Techniques

Haim, Shai, Walsh, Toby

arXiv.org Artificial Intelligence, Jul-28-2009

Restart strategies are an important factor in the performance of conflict-driven, Davis-Putnam-style SAT solvers. Selecting a good restart strategy for a problem instance can enhance the performance of a solver. Inspired by the recent success of machine learning techniques in predicting the runtime of SAT solvers, we present a method which uses machine learning to boost solver performance through a smart selection of the restart strategy. Based on easy-to-compute features, we train both a satisfiability classifier and runtime models. We use these models to choose between restart strategies. We present experimental results comparing this technique with the most commonly used restart strategies. Our results demonstrate that machine learning is effective in improving solver performance.
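
One way to picture the selection step: a classifier estimates the probability that an instance is satisfiable, per-strategy runtime models estimate cost under each outcome, and the strategy with the lowest expected runtime is chosen. This is a minimal sketch under assumptions: the linear log-runtime models, the logistic classifier, the weights, the strategy names (`luby`, `geometric`), and the feature vector are all hypothetical, and combining the models through an expected-runtime rule is our illustration, not necessarily how the authors combine them.

```python
import math
from dataclasses import dataclass
from typing import Dict, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))


@dataclass
class RuntimeModel:
    """Hypothetical linear model of log-runtime for one restart strategy."""
    w_sat: Sequence[float]    # weights used if the instance is satisfiable
    w_unsat: Sequence[float]  # weights used if it is unsatisfiable

    def expected_runtime(self, x: Sequence[float], p_sat: float) -> float:
        # Mix the two conditional predictions by the classifier's probability.
        return p_sat * math.exp(dot(self.w_sat, x)) + (1.0 - p_sat) * math.exp(dot(self.w_unsat, x))


def p_satisfiable(w_cls: Sequence[float], x: Sequence[float]) -> float:
    """Hypothetical logistic satisfiability classifier."""
    return 1.0 / (1.0 + math.exp(-dot(w_cls, x)))


def choose_strategy(x: Sequence[float], w_cls: Sequence[float],
                    models: Dict[str, RuntimeModel]) -> str:
    """Return the strategy name with the smallest predicted expected runtime."""
    p = p_satisfiable(w_cls, x)
    return min(models, key=lambda name: models[name].expected_runtime(x, p))


if __name__ == "__main__":
    models = {
        "luby": RuntimeModel([1.0, 0.0], [2.0, 0.0]),
        "geometric": RuntimeModel([0.5, 0.0], [3.0, 0.0]),
    }
    print(choose_strategy([1.0, 0.5], [0.0, 0.0], models))   # prints luby
    print(choose_strategy([1.0, 0.5], [10.0, 0.0], models))  # prints geometric
```

The usage shows why a satisfiability classifier helps: the same runtime models favor a different strategy once the instance is confidently predicted to be satisfiable.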

artificial intelligence, machine learning, restart strategy

arXiv.org Artificial Intelligence

0907.5032

Country: Oceania > Australia (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)


Empirical Bernstein Bounds and Sample Variance Penalization

Maurer, Andreas, Pontil, Massimiliano

arXiv.org Machine Learning, Jul-21-2009

We give improved constants for data-dependent and variance-sensitive confidence bounds, called empirical Bernstein bounds, and extend these inequalities to hold uniformly over classes of functions whose growth function is polynomial in the sample size $n$. The bounds lead us to consider sample variance penalization, a novel learning method which takes into account the empirical variance of the loss function. We give conditions under which sample variance penalization is effective. In particular, we present a bound on the excess risk incurred by the method. Using this, we argue that there are situations in which the excess risk of our method is of order $1/n$, while the excess risk of empirical risk minimization is of order $1/\sqrt{n}$. We show some experimental results which confirm the theory. Finally, we discuss the potential application of our results to sample compression schemes.
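
For a single bounded variable, the empirical Bernstein bound is commonly cited in this form: if $X_1,\dots,X_n$ are i.i.d. in $[0,1]$ with $n \ge 2$ and $V_n$ is the unbiased sample variance, then with probability at least $1-\delta$, $\mathbb{E}X - \frac{1}{n}\sum_i X_i \le \sqrt{2V_n\ln(2/\delta)/n} + \frac{7\ln(2/\delta)}{3(n-1)}$. The sketch below evaluates that width and implements sample variance penalization over a finite set of hypotheses as "mean loss plus $\lambda\sqrt{V_n/n}$". It assumes the single-function form of the bound; the uniform version over function classes and the paper's choice of $\lambda$ are not reproduced, and the function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Dict, Sequence


def empirical_bernstein_bound(xs: Sequence[float], delta: float) -> float:
    """Width of the one-sided empirical Bernstein bound for samples in [0, 1].

    With probability at least 1 - delta, E[X] <= mean(xs) + returned value.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two samples")
    v = variance(xs)  # unbiased sample variance: divides by n - 1
    log_term = math.log(2.0 / delta)
    return math.sqrt(2.0 * v * log_term / n) + 7.0 * log_term / (3.0 * (n - 1))


def svp_select(losses: Dict[str, Sequence[float]], lam: float) -> str:
    """Pick the hypothesis minimizing mean loss + lam * sqrt(V_n / n).

    lam = 0 recovers empirical risk minimization.
    """
    def objective(ls: Sequence[float]) -> float:
        return mean(ls) + lam * math.sqrt(variance(ls) / len(ls))

    return min(losses, key=lambda h: objective(losses[h]))


if __name__ == "__main__":
    table = {
        "A": [0.3] * 10,      # slightly higher mean loss, zero variance
        "B": [0.0, 0.5] * 5,  # lower mean loss, high variance
    }
    print(svp_select(table, lam=0.0), svp_select(table, lam=1.0))  # prints B A
```

The example illustrates the trade-off the abstract describes: empirical risk minimization prefers the hypothesis with the lower average loss, while the variance penalty prefers the one whose loss is consistently low, because its confidence bound shrinks like $1/n$ rather than $1/\sqrt{n}$ when the variance is near zero.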

artificial intelligence, hypothesis, machine learning

arXiv.org Machine Learning

0907.3740

Country:

North America > United States > California (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.84)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)


Inter Genre Similarity Modelling For Automatic Music Genre Classification

Bagci, Ulas, Erzin, Engin

arXiv.org Machine Learning, Jul-18-2009

Music genre classification is an essential tool for music information retrieval systems, and it has found critical applications in various media platforms. Two important problems in automatic music genre classification are feature extraction and classifier design. This paper investigates inter-genre similarity (IGS) modelling to improve the performance of automatic music genre classification. Inter-genre similarity information is extracted over the misclassified feature population. Once the inter-genre similarity is modelled, eliminating it reduces inter-genre confusion and improves identification rates. Inter-genre similarity modelling is further improved with iterative IGS modelling (IIGS) and score modelling for IGS elimination (SMIGS). Experimental results with promising classification improvements are provided.

genre classification, machine learning, natural language

arXiv.org Machine Learning

0907.3220

Country: Asia > Middle East > Republic of Türkiye (0.14)

Genre: Research Report (1.00)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.90)


Online Learning of Spacecraft Simulation Models

Thomas, Justin R. (United Space Alliance) | Eick, Christoph F. (University of Houston)

AAAI Conferences, Jul-14-2009

Spacecraft simulation is an integral part of NASA mission planning, real-time mission support, training, and systems engineering. Existing approaches that power these simulations cannot quickly react to the dynamic and complex behavior of the International Space Station (ISS). To address this problem, this paper introduces a unique and efficient method for continuously learning highly accurate models from real-time streaming sensor data, relying on an online learning approach. This approach revolutionizes NASA simulation techniques for space missions by providing models that quickly adapt to real-world feedback without human intervention. A novel regional sliding-window technique for online learning of simulation models is proposed that maintains the most recent data region by region. We also explore a knowledge fusion approach to reduce predictive error spikes when making predictions in situations that are quite different from the training scenarios. We demonstrate substantial error reductions of up to 74% in our experimental evaluation on the ISS Electrical Power System and discuss the early deployment of our software in the ISS Mission Control Center (MCC) for ground-based simulations.
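
The abstract does not say how regions are defined. One plausible reading, sketched below purely as an assumption, partitions the input space into fixed-width grid cells and keeps a bounded first-in, first-out buffer of the most recent samples in each cell; the training set for the next model refresh is the union of all buffers. The class name and the `cell` and `window` parameters are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (sensor features, target value)


class RegionalSlidingWindow:
    """Keeps the `window` most recent samples per input-space region.

    Hypothetical sketch: a region is a grid cell of width `cell` on every feature.
    """

    def __init__(self, cell: float, window: int) -> None:
        self.cell = cell
        self.window = window
        self.regions: Dict[Tuple[int, ...], Deque[Sample]] = {}

    def region_of(self, x: Sequence[float]) -> Tuple[int, ...]:
        return tuple(int(v // self.cell) for v in x)

    def add(self, x: Sequence[float], y: float) -> None:
        buf = self.regions.setdefault(self.region_of(x), deque(maxlen=self.window))
        buf.append((tuple(x), float(y)))  # the oldest sample in this region drops out

    def training_set(self) -> List[Sample]:
        return [s for buf in self.regions.values() for s in buf]
```

Compared with a single global window, a long burst of data from one operating regime cannot evict the only examples of another regime, which is one plausible motivation for maintaining recency region by region.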

artificial intelligence, machine learning, training dataset

AAAI Conferences

Twenty-First IAAI Conference

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.46)

Industry:

Government > Space Agency (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.67)


Not So Naive Online Bayesian Spam Filter

Su, Baojun (Zhejiang University) | Xu, Congfu (Zhejiang University)

AAAI Conferences, Jul-14-2009

Spam filtering, a key problem in electronic communication, has drawn significant attention due to the ever-growing amount of junk email on the Internet. Content-based filtering is one reliable method for combating spammers' changing tactics. Naive Bayes (NB) is one of the earliest content-based machine learning methods, in both theory and practice, for combating spammers; it is easy to implement yet can achieve considerable accuracy. In this paper, the traditional online Bayesian classifier is enhanced in two ways. First, from a theoretical point of view, we devise a self-adaptive mechanism that gradually weakens the independence assumption required by the original NB during online training; as a result, our NSNB is no longer "naive". Second, we propose other engineering techniques to make the classifier more robust and accurate. Experimental results show that our NSNB gives state-of-the-art classification performance for online spam filtering on large benchmark data sets, while being extremely fast and taking up little memory in comparison with other statistical methods.
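
As a reference point for what NSNB improves on, the sketch below is a standard online multinomial naive Bayes filter with Laplace smoothing, used here as a stand-in for the "traditional online Bayesian classifier". Each labeled message updates a few counters, so training is incremental and memory grows only with the vocabulary. NSNB's self-adaptive weakening of the independence assumption and its engineering refinements are not shown; the class name and the "spam"/"ham" labels are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Set


class OnlineNaiveBayes:
    """Baseline online multinomial naive Bayes with Laplace smoothing.

    Labels must be "spam" or "ham". This is the plain 'naive' model only.
    """

    def __init__(self) -> None:
        self.docs: Counter = Counter()    # class -> number of training messages
        self.tokens: Dict[str, Counter] = {"spam": Counter(), "ham": Counter()}
        self.totals: Counter = Counter()  # class -> total token count
        self.vocab: Set[str] = set()

    def update(self, words: Iterable[str], label: str) -> None:
        """Incorporate one labeled message; cost is linear in its length."""
        words = list(words)
        self.docs[label] += 1
        self.tokens[label].update(words)
        self.totals[label] += len(words)
        self.vocab.update(words)

    def log_odds_spam(self, words: Iterable[str]) -> float:
        """log P(spam | words) - log P(ham | words); positive means spam."""
        words = list(words)  # iterated once per class below
        n_docs = sum(self.docs.values())
        v = len(self.vocab)
        score = 0.0
        for cls, sign in (("spam", 1.0), ("ham", -1.0)):
            s = math.log((self.docs[cls] + 1) / (n_docs + 2))  # smoothed class prior
            for w in words:
                # Laplace smoothing; the extra +1 reserves mass for unseen words.
                s += math.log((self.tokens[cls][w] + 1) / (self.totals[cls] + v + 1))
            score += sign * s
        return score
```

Summing per-word log-probabilities is exactly where the independence assumption enters: each word contributes evidence as if no other word were present, which is the assumption NSNB is designed to relax.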

machine learning, spam, spam filtering

AAAI Conferences

Twenty-First IAAI Conference

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Security & Privacy > Spam Filtering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)


Real-time Automatic Price Prediction for eBay Online Trading

Raykhel, Ilya (Brigham Young University) | Ventura, Dan (Brigham Young University)

AAAI Conferences, Jul-14-2009

We develop a system for attribute-based prediction of final (online) auction pricing, focusing on the eBay laptop category. The system implements a feature-weighted k-NN algorithm, using evolutionary computation to determine feature weights, with prior trades used as training data. The resulting average prediction error is 16%. Mostly automatic trading using the system greatly reduces the time a reseller needs to spend on trading activities, since the bulk of market research is now done automatically with the help of the learned model. The result is a 562% increase in trading efficiency (measured as profit/hour).
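
The two ingredients named in the abstract, a feature-weighted k-NN predictor and an evolutionary search for the weights, can be sketched as follows. The weighted Euclidean distance, the mean relative leave-one-out error used as fitness, and the simple (1+1) evolution strategy are assumptions chosen for brevity; the abstract does not specify which evolutionary method or error measure the authors use, and all names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Item = Tuple[Sequence[float], float]  # (attribute vector, final price)


def weighted_knn_predict(train: Sequence[Item], x: Sequence[float],
                         weights: Sequence[float], k: int) -> float:
    """Average price of the k nearest past trades under a weighted Euclidean distance."""
    def dist(a: Sequence[float]) -> float:
        return math.sqrt(sum(w * (ai - xi) ** 2 for w, ai, xi in zip(weights, a, x)))

    nearest = sorted(train, key=lambda item: dist(item[0]))[:k]
    return sum(price for _, price in nearest) / len(nearest)


def loo_error(train: Sequence[Item], weights: Sequence[float], k: int) -> float:
    """Mean relative leave-one-out error; the fitness the search minimizes."""
    train = list(train)
    errs = []
    for i, (x, price) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        errs.append(abs(weighted_knn_predict(rest, x, weights, k) - price) / price)
    return sum(errs) / len(errs)


def evolve_weights(train: Sequence[Item], k: int, generations: int = 200,
                   sigma: float = 0.3, seed: int = 0) -> Tuple[List[float], float]:
    """(1+1) evolution strategy over non-negative feature weights."""
    rng = random.Random(seed)
    best = [1.0] * len(train[0][0])
    best_err = loo_error(train, best, k)
    for _ in range(generations):
        # Mutate every weight with Gaussian noise, clipping at zero.
        child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in best]
        err = loo_error(train, child, k)
        if err <= best_err:  # keep the child if it is no worse
            best, best_err = child, err
    return best, best_err
```

Learning per-feature weights lets the search shrink the influence of attributes that do not move the price, so that distances are dominated by the attributes that do; the elitist acceptance rule guarantees the returned weights are never worse, on the training fitness, than the uniform starting weights.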

artificial intelligence, laptop, machine learning

AAAI Conferences

Twenty-First IAAI Conference

Industry:

Information Technology > Services (1.00)
Banking & Finance > Trading (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
