arXiv.org Machine Learning
Batch kernel SOM and related Laplacian methods for social network analysis
Boulet, Romain, Jouve, Bertrand, Rossi, Fabrice, Villa, Nathalie
Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
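A minimal sketch of the kind of procedure described here, assuming an unweighted adjacency matrix and a heat kernel derived from the combinatorial Laplacian; the grid size, diffusion parameter and update schedule are illustrative choices, not the settings used in the paper:

```python
# Minimal sketch of a batch kernel SOM on a graph, using a heat kernel
# K = expm(-beta * L) derived from the graph Laplacian. Grid size, beta
# and iteration count are illustrative, not values from the paper.
import numpy as np
from scipy.linalg import expm

def batch_kernel_som(A, grid=(3, 3), beta=1.0, n_iter=20, seed=0):
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A           # combinatorial Laplacian
    K = expm(-beta * L)                       # heat (diffusion) kernel
    m = grid[0] * grid[1]
    units = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])])
    rng = np.random.default_rng(seed)
    # Prototypes live in feature space as convex combinations of the
    # mapped vertices: prototype_u = sum_i C[u, i] * phi(x_i).
    C = rng.dirichlet(np.ones(n), size=m)
    for t in range(n_iter):
        sigma = max(0.5, 1.5 * (1 - t / n_iter))   # shrinking neighborhood
        # Squared kernel distance from each vertex to each prototype.
        d2 = (np.diag(K)[:, None]
              - 2 * K @ C.T
              + np.einsum('ui,ij,uj->u', C, K, C)[None, :])
        bmu = d2.argmin(axis=1)                    # best-matching unit per vertex
        # Neighborhood weights on the unit grid.
        grid_d2 = ((units[:, None, :] - units[None, :, :]) ** 2).sum(-1)
        H = np.exp(-grid_d2 / (2 * sigma ** 2))
        W = H[:, bmu]                              # weight of vertex i for unit u
        C = W / W.sum(axis=1, keepdims=True)       # batch update of prototypes
    return bmu, C
```

Here bmu assigns each vertex to a map unit; vertices falling on the same or neighbouring units can be read as candidate groups of tightly connected vertices.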
Kernels and Ensembles: Perspectives on Statistical Learning
Since their emergence in the 1990's, the support vector machine and the AdaBoost algorithm have spawned a wave of research in statistical machine learning. Much of this new research falls into one of two broad categories: kernel methods and ensemble methods. In this expository article, I discuss the main ideas behind these two types of methods, namely how to transform linear algorithms into nonlinear ones by using kernel functions, and how to make predictions with an ensemble or a collection of models rather than a single model. I also share my personal perspectives on how these ideas have influenced and shaped my own research. In particular, I present two recent algorithms that I have invented with my collaborators: LAGO, a fast kernel algorithm for unbalanced classification and rare target detection; and Darwinian evolution in parallel universes, an ensemble method for variable selection.
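To fix ideas, a hedged toy sketch of the two ingredients discussed, not code from the article: ridge regression made nonlinear through a kernel, and a bagged ensemble that averages models fit on bootstrap resamples (LAGO and the parallel-universes method themselves are not reproduced here).

```python
# Illustrative sketch of (1) the kernel trick, turning linear ridge
# regression into a nonlinear method by replacing inner products with a
# kernel, and (2) an ensemble averaging predictors on bootstrap resamples.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ridge_fit(X, y, lam=1e-2, gamma=1.0):
    K = rbf_kernel(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)  # dual coefficients
    return lambda Xnew: rbf_kernel(Xnew, X, gamma) @ alpha

def bagged_ensemble(X, y, n_models=25, seed=0, **kw):
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(y), size=len(y))   # bootstrap resample
        models.append(kernel_ridge_fit(X[idx], y[idx], **kw))
    return lambda Xnew: np.mean([m(Xnew) for m in models], axis=0)

# Toy usage on a nonlinear target.
X = np.linspace(-3, 3, 80)[:, None]
y = np.sin(X[:, 0]) + 0.1 * np.random.default_rng(1).standard_normal(80)
predict = bagged_ensemble(X, y, gamma=0.5)
print(predict(np.array([[0.0], [1.5]])))
```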
Pac-Bayesian Supervised Classification: The Thermodynamics of Statistical Learning
This monograph deals with adaptive supervised classification, using tools borrowed from statistical mechanics and information theory, stemming from the PAC-Bayesian approach pioneered by David McAllester and applied to a conception of statistical learning theory forged by Vladimir Vapnik. Using convex analysis on the set of posterior probability measures, we show how to get local measures of the complexity of the classification model involving the relative entropy of posterior distributions with respect to Gibbs posterior measures. We then discuss relative bounds, comparing the generalization error of two classification rules, showing how the margin assumption of Mammen and Tsybakov can be replaced with some empirical measure of the covariance structure of the classification model. We show how to associate to any posterior distribution an effective temperature relating it to the Gibbs prior distribution with the same level of expected error rate, and how to estimate this effective temperature from data, resulting in an estimator whose expected error rate converges according to the best possible power of the sample size adaptively under any margin and parametric complexity assumptions. We describe and study an alternative selection scheme based on relative bounds between estimators, and present a two-step localization technique which can handle the selection of a parametric model from a family of such models. We show how to extend systematically all the results obtained in the inductive setting to transductive learning, and use this to improve Vapnik's generalization bounds, extending them to the case when the sample is made of independent non-identically distributed pairs of patterns and labels. Finally we review briefly the construction of Support Vector Machines and show how to derive generalization bounds for them, measuring the complexity either through the number of support vectors or through the value of the transductive or inductive margin.
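For orientation, a standard McAllester-style PAC-Bayesian bound of the kind this monograph localizes and refines (stated for illustration only, not the local or relative bounds derived in the text):

```latex
\forall \rho, \qquad
R(\rho) \;\le\; r_n(\rho) \;+\;
\sqrt{\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}
\qquad \text{with probability at least } 1-\delta ,
```

where \pi is a prior over classifiers, \rho any posterior, r_n(\rho) the \rho-averaged empirical error on n samples and R(\rho) the \rho-averaged generalization error.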
A Method for Compressing Parameters in Bayesian Models with Application to Logistic Sequence Prediction Models
Bayesian classification and regression with high-order interactions is largely infeasible because Markov chain Monte Carlo (MCMC) would need to be applied with a great many parameters, whose number increases rapidly with the order. In this paper we show how to make it feasible by effectively reducing the number of parameters, exploiting the fact that many interactions have the same values for all training cases. Our method uses a single "compressed" parameter to represent the sum of all parameters associated with a set of patterns that have the same value for all training cases. Using symmetric stable distributions as the priors of the original parameters, we can easily find the priors of these compressed parameters. We therefore need to deal only with a much smaller number of compressed parameters when training the model with MCMC. The number of compressed parameters may stop growing before the highest possible order is reached. After training the model, we can split these compressed parameters into the original ones as needed to make predictions for test cases. We show in detail how to compress parameters for logistic sequence prediction models. Experiments on both simulated and real data demonstrate that a huge number of parameters can indeed be reduced by our compression method.
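A hedged sketch of the compression step, under the simplifying assumption that each original parameter is represented by a 0/1 indicator column over training cases (the actual encoding for logistic sequence models differs). The compressed prior scale uses the standard closure property of stable laws: a sum of k independent symmetric alpha-stable variables with common scale s is symmetric alpha-stable with scale k^{1/alpha} s.

```python
# Hedged sketch of the compression idea: parameters whose pattern takes
# the same values on every training case are represented by a single
# "compressed" parameter. With independent symmetric stable priors of
# index alpha (alpha = 2 Gaussian, alpha = 1 Cauchy), the sum of k such
# parameters with scale s is again symmetric stable with scale
# k**(1/alpha) * s, which gives the compressed prior directly.
# The feature construction below is illustrative, not the paper's encoding.
import numpy as np
from collections import defaultdict

def compress(patterns, scale=1.0, alpha=2.0):
    """patterns: (n_cases, n_params) 0/1 indicator of each original
    parameter (interaction pattern) on each training case."""
    groups = defaultdict(list)
    for j in range(patterns.shape[1]):
        groups[patterns[:, j].tobytes()].append(j)   # identical columns
    compressed = []
    for members in groups.values():
        k = len(members)
        compressed.append({
            "members": members,                       # original parameter ids
            "prior_scale": (k ** (1.0 / alpha)) * scale,
        })
    return compressed

# Toy example: 4 training sequences, 6 interaction indicators, several
# of which coincide on these cases and therefore collapse together.
patterns = np.array([[1, 1, 0, 0, 1, 1],
                     [0, 0, 1, 1, 0, 0],
                     [1, 1, 0, 0, 1, 1],
                     [0, 0, 1, 1, 0, 0]])
for c in compress(patterns, scale=1.0, alpha=1.0):   # Cauchy priors
    print(c)
```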
Getting started in probabilistic graphical models
Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But, what exactly are they and how do they work? How can we use PGMs to discover patterns that are biologically relevant? And to what extent can PGMs help us formulate new hypotheses that are testable at the bench? This note sketches out some answers and illustrates the main ideas behind the statistical approach to biological pattern discovery.
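As a concrete (and entirely hypothetical) toy example of the factorization idea behind PGMs, consider a three-node chain Regulator -> Gene -> Phenotype with made-up conditional probability tables; the joint distribution factorizes along the graph and queries reduce to sums over the unobserved variables:

```python
# Toy illustration (not from the note) of the core PGM idea: a joint
# distribution factorized along a directed graph. Hypothetical network:
# Regulator (R) -> Gene expressed (G) -> Phenotype (P).
P_R = {1: 0.3, 0: 0.7}                       # prior on regulator active
P_G_given_R = {1: {1: 0.9, 0: 0.1},          # P(G | R)
               0: {1: 0.2, 0: 0.8}}
P_P_given_G = {1: {1: 0.7, 0: 0.3},          # P(P | G)
               0: {1: 0.05, 0: 0.95}}

def joint(r, g, p):
    # Factorization encoded by the graph: P(R, G, P) = P(R) P(G|R) P(P|G)
    return P_R[r] * P_G_given_R[r][g] * P_P_given_G[g][p]

# Query by enumeration: probability the regulator is active given that
# the phenotype is observed, P(R=1 | P=1).
num = sum(joint(1, g, 1) for g in (0, 1))
den = sum(joint(r, g, 1) for r in (0, 1) for g in (0, 1))
print(num / den)
```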
Supervised Machine Learning with a Novel Pointwise Density Estimator
Oyang, Yen-Jen, Chen, Chien-Yu, Chang, Darby Tien-Hao, Wu, Chih-Peng
This article proposes a novel density estimation based algorithm for carrying out supervised machine learning. The proposed algorithm features O(n) time complexity for generating a classifier, where n is the number of sampling instances in the training dataset. This feature is highly desirable in contemporary applications that involve large and still-growing databases. In comparison with the kernel density estimation based approaches, the mathematical foundation of the proposed algorithm does not rest on the assumption that the number of training instances approaches infinity. As a result, a classifier generated with the proposed algorithm may deliver higher prediction accuracy than the kernel density estimation based classifier in some cases.
Simultaneous adaptation to the margin and to complexity in classification
We consider the problem of adaptation to the margin and to complexity in binary classification. We suggest an exponential weighting aggregation scheme. We use this aggregation procedure to construct classifiers which adapt automatically to margin and complexity. Two main examples are worked out in which adaptivity is achieved in frameworks proposed by Steinwart and Scovel [Learning Theory. Lecture Notes in Comput. Sci. 3559 (2005) 279--294. Springer, Berlin; Ann. Statist. 35 (2007) 575--607] and Tsybakov [Ann. Statist. 32 (2004) 135--166]. Adaptive schemes, like ERM or penalized ERM, usually involve a minimization step. This is not the case for our procedure.
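For reference, a generic exponential weighting aggregate of the kind referred to here (the exact empirical criterion and temperature used in the paper differ): given candidate classifiers f_1, ..., f_M with empirical risks, the aggregate is a weighted average whose weights require no minimization,

```latex
\tilde f_n \;=\; \sum_{j=1}^{M} w_j\, f_j,
\qquad
w_j \;=\; \frac{\exp\!\big(-n\,\hat R_n(f_j)/T\big)}
               {\sum_{k=1}^{M}\exp\!\big(-n\,\hat R_n(f_k)/T\big)},
```

where \hat R_n(f_j) is the empirical risk of candidate f_j on n observations and T > 0 a temperature parameter; candidates with small empirical risk dominate the aggregate without any explicit minimization step.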
Supervised Machine Learning with a Novel Kernel Density Estimator
Oyang, Yen-Jen, Chang, Darby Tien-Hao, Ou, Yu-Yen, Hung, Hao-Geng, Wu, Chih-Peng, Chen, Chien-Yu
In recent years, kernel density estimation has been exploited by computer scientists to model machine learning problems. The kernel density estimation based approaches are of interest due to the low time complexity of either O(n) or O(n*log(n)) for constructing a classifier, where n is the number of sampling instances. Concerning design of kernel density estimators, one essential issue is how fast the pointwise mean square error (MSE) and/or the integrated mean square error (IMSE) diminish as the number of sampling instances increases. In this article, it is shown that with the proposed kernel function it is feasible to make the pointwise MSE of the density estimator converge at a rate of O(n^{-2/3}) regardless of the dimension of the vector space, provided that the probability density function at the point of interest meets certain conditions.
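As a point of comparison only, a plain Gaussian Parzen-window classifier (class-conditional kernel density estimates combined by Bayes' rule); the kernel and bandwidth below are generic illustrative choices, not the novel estimator of the article:

```python
# Generic kernel-density-estimation classifier shown only to fix ideas.
import numpy as np

def kde_classify(X_train, y_train, X_test, bandwidth=0.5):
    labels = np.unique(y_train)
    log_scores = []
    for c in labels:
        Xc = X_train[y_train == c]
        # Gaussian kernel density estimate of p(x | class c); the common
        # normalization constant is dropped since it does not affect argmax.
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)
        dens = np.exp(-d2 / (2 * bandwidth ** 2)).mean(axis=1)
        prior = len(Xc) / len(X_train)
        log_scores.append(np.log(dens + 1e-300) + np.log(prior))
    return labels[np.argmax(np.stack(log_scores, axis=1), axis=1)]

# Tiny usage example with two Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(kde_classify(X, y, np.array([[0.0, 0.0], [3.0, 3.0]])))
```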
Semantic distillation: a method for clustering objects by their contextual specificity
Sierocinski, Thomas, Béchec, Anthony Le, Théret, Nathalie, Petritis, Dimitri
Techniques for data mining, latent semantic analysis, contextual search of databases, etc. were developed long ago by computer scientists working on information retrieval (IR). Experimental scientists from all disciplines, having to analyse large collections of raw experimental data (astronomical, physical, biological, etc.), have developed powerful methods for their statistical analysis and for clustering, categorising, and classifying objects. Finally, physicists have developed a theory of quantum measurement, unifying the logical, algebraic, and probabilistic aspects of queries into a single formalism. The purpose of this paper is twofold: first, to show that when formulated at an abstract level, problems from IR, from statistical data analysis, and from physical measurement theories are very similar and hence can profitably be cross-fertilised; and, secondly, to propose a novel method of fuzzy hierarchical clustering, termed semantic distillation, strongly inspired by the theory of quantum measurement, which we developed to analyse raw data coming from various types of experiments on DNA arrays. We illustrate the method by analysing DNA array experiments and clustering the genes of the array according to their specificity.
Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains
Lecchini-Visintini, A., Lygeros, J., Maciejowski, J.
Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.
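A minimal simulated annealing sketch on a bounded continuous domain, showing the kind of algorithm such guarantees concern; the logarithmic cooling schedule, proposal scale and iteration budget are illustrative choices, not those analyzed in the paper:

```python
# Minimal simulated annealing on a bounded continuous domain.
import numpy as np

def simulated_annealing(f, bounds, n_iter=5000, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lo, hi)
    fx = f(x)
    best_x, best_f = x, fx
    for t in range(1, n_iter + 1):
        T = 1.0 / np.log(t + 1)                      # slow cooling schedule
        prop = np.clip(x + 0.1 * (hi - lo) * rng.standard_normal(x.shape), lo, hi)
        fp = f(prop)
        # Metropolis acceptance: always accept improvements, accept
        # deteriorations with probability exp(-(fp - fx) / T).
        if fp <= fx or rng.random() < np.exp(-(fp - fx) / T):
            x, fx = prop, fp
            if fx < best_f:
                best_x, best_f = x, fx
    return best_x, best_f

# Usage on a toy multimodal objective over [-5, 5]^2.
f = lambda x: (x[0] ** 2 + x[1] ** 2) / 10 + np.sin(3 * x[0]) * np.sin(3 * x[1])
print(simulated_annealing(f, ([-5, -5], [5, 5])))
```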