AITopics | Statistical Learning

Collaborating Authors

Statistical Learning

News Overviews Instructional Materials AI-Alerts Classics

Combination Strategies for Semantic Role Labeling

Surdeanu, M., Marquez, L., Carreras, X., Comas, P. R.

Journal of Artificial Intelligence ResearchJun-14-2007

This paper introduces and analyzes a battery of inference models for the problem of semantic role labeling: one based on constraint satisfaction, and several strategies that model the inference as a meta-learning problem using discriminative classifiers. These classifiers are developed with a rich set of novel features that encode proposition and sentence-level information. To our knowledge, this is the first work that: (a) performs a thorough analysis of learning-based inference models for semantic role labeling, and (b) compares several inference strategies in this context. We evaluate the proposed inference strategies in the framework of the CoNLL-2005 shared task using only automatically-generated syntactic information. The extensive experimental evaluation and analysis indicates that all the proposed inference strategies are successful -they all outperform the current best results reported in the CoNLL-2005 evaluation exercise- but each of the proposed approaches has its advantages and disadvantages. Several important traits of a state-of-the-art SRL combination strategy emerge from this analysis: (i) individual models should be combined at the granularity of candidate arguments rather than at the granularity of complete solutions; (ii) the best combination strategy uses an inference model based in learning; and (iii) the learning-based inference benefits from max-margin classifiers and global feedback.

argument, candidate argument, predicate, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2088

AI Access Foundation

10500

Journal of Artificial Intelligence Research

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(3 more...)

Add feedback

A Novel Model of Working Set Selection for SMO Decomposition Methods

Zhao, Zhendong, Yuan, Lei, Wang, Yuxuan, Bao, Forrest Sheng, Sun, Shunyi Zhang Yanfei

arXiv.org Artificial IntelligenceJun-5-2007

In the process of training Support Vector Machines (SVMs) by decomposition methods, working set selection is an important technique, and some exciting schemes were employed into this field. To improve working set selection, we propose a new model for working set selection in sequential minimal optimization (SMO) decomposition methods. In this model, it selects B as working set without reselection. Some properties are given by simple proof, and experiments demonstrate that the proposed method is in general faster than existing methods.

artificial intelligence, machine learning, wss 3, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICTAI.2007.99

0706.0585

Country:

Asia > China (0.15)
North America (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.91)

Add feedback

Mixed membership stochastic blockmodels

Airoldi, Edoardo M, Blei, David M, Fienberg, Stephen E, Xing, Eric P

arXiv.org Machine LearningMay-30-2007

Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilisic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones which capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

0705.4485

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

The Google Similarity Distance

Cilibrasi, Rudi, Vitanyi, Paul M. B.

arXiv.org Artificial IntelligenceMay-30-2007

Words and phrases acquire meaning from the way they are used in society, from their relative semantics to other words and phrases. For computers the equivalent of `society' is `database,' and the equivalent of `use' is `way to search the database.' We present a new theory of similarity between words and phrases based on information distance and Kolmogorov complexity. To fix thoughts we use the world-wide-web as database, and Google as search engine. The method is also applicable to other search engines and databases. This theory is then applied to construct a method to automatically extract similarity, the Google similarity distance, of words and phrases from the world-wide-web using Google page counts. The world-wide-web is the largest database on earth, and the context information entered by millions of independent users averages out to provide automatic semantics of useful quality. We give applications in hierarchical clustering, classification, and language translation. We give examples to distinguish between colors and numbers, cluster names of paintings by 17th century Dutch masters and names of books by English novelists, the ability to understand emergencies, and primes, and we demonstrate the ability to do a simple automatic English-Spanish translation. Finally, we use the WordNet database as an objective baseline against which to judge the performance of our method. We conduct a massive randomized trial in binary classification using support vector machines to learn categories based on our Google distance, resulting in an a mean agreement of 87% with the expert crafted WordNet categories.

machine learning, natural language, search term, (20 more...)

arXiv.org Artificial Intelligence

cs/0412098

Country:

North America > United States (0.46)
Europe > Netherlands (0.28)
North America > Canada > Alberta (0.28)

Genre:

Research Report > Experimental Study (0.48)
Research Report > Strength High (0.34)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Add feedback

Truecluster matching

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-29-2007

Cluster matching by permuting cluster labels is important in many clustering contexts such as cluster validation and cluster ensemble techniques. The classic approach is to minimize the euclidean distance between two cluster solutions which induces inappropriate stability in certain settings. Therefore, we present the truematch algorithm that introduces two improvements best explained in the crisp case. First, instead of maximizing the trace of the cluster crosstable, we propose to maximize a chi-square transformation of this crosstable. Thus, the trace will not be dominated by the cells with the largest counts but by the cells with the most non-random observations, taking into account the marginals. Second, we suggest a probabilistic component in order to break ties and to make the matching algorithm truly random on random data. The truematch algorithm is designed as a building block of the truecluster framework and scales in polynomial time. First simulation results confirm that the truematch algorithm gives more consistent truecluster results for unequal cluster sizes. Free R software is available.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

0705.4302

Country: Europe (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Add feedback

Truecluster: robust scalable clustering with model selection

Oehlschlägel, Jens

arXiv.org Artificial IntelligenceMay-28-2007

Data-based classification is fundamental to most branches of science. While recent years have brought enormous progress in various areas of statistical computing and clustering, some general challenges in clustering remain: model selection, robustness, and scalability to large datasets. We consider the important problem of deciding on the optimal number of clusters, given an arbitrary definition of space and clusteriness. We show how to construct a cluster information criterion that allows objective model selection. Differing from other approaches, our truecluster method does not require specific assumptions about underlying distributions, dissimilarity definitions or cluster models. Truecluster puts arbitrary clustering algorithms into a generic unified (sampling-based) statistical framework. It is scalable to big datasets and provides robust cluster assignments and case-wise diagnostics. Truecluster will make clustering more objective, allows for automation, and will save time and costs. Free R software is available.

artificial intelligence, base cluster algorithm, machine learning, (15 more...)

arXiv.org Artificial Intelligence

cs/0601001

Country:

North America > United States (0.28)
Europe > Austria (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Undercomplete Blind Subspace Deconvolution

Szabo, Zoltan, Poczos, Barnabas, Lorincz, Andras

arXiv.org Machine LearningMay-20-2007

We introduce the blind subspace deconvolution (BSSD) problem, which is the extension of both the blind source deconvolution (BSD) and the independent subspace analysis (ISA) tasks. We examine the case of the undercomplete BSSD (uBSSD). Applying temporal concatenation we reduce this problem to ISA. The associated `high dimensional' ISA problem can be handled by a recent technique called joint f-decorrelation (JFD). Similar decorrelation methods have been used previously for kernel independent component analysis (kernel-ICA). More precisely, the kernel canonical correlation (KCCA) technique is a member of this family, and, as is shown in this paper, the kernel generalized variance (KGV) method can also be seen as a decorrelation method in the feature space. These kernel based algorithms will be adapted to the ISA task. In the numerical examples, we (i) examine how efficiently the emerging higher dimensional ISA tasks can be tackled, and (ii) explore the working and advantages of the derived kernel-ISA methods.

artificial intelligence, isa task, machine learning, (17 more...)

arXiv.org Machine Learning

math/0701210

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications > Networks (0.93)

Add feedback

Lasso type classifiers with a reject option

Wegkamp, Marten

arXiv.org Machine LearningMay-16-2007

A discriminant function f: X R yields a classifier sgn(f(x)) { 1, 1} that represents our guess of the label Y of a future observation X and we err if the margin y · f(x) 0. Since observations x for which the conditional probability η(x) P{Y 1 X x} (1) is close to 1/2 are difficult to classify, we introduce a reject option for classifiers, by allowing for a third decision, R (reject), expressing doubt. We built in the reject option by using a threshold value 0 τ 1 as follows. Given a discriminant function f: X R, we report sgn(f(x)) { 1,1} if f(x) τ, but we withhold decision if f(x) τ and report R. We assume that the cost of making a wrong decision is 1 and the cost of utilizing the reject option is d 0. The appropriate risk function is then E[l(Yf(X))] P{Yf(X) τ} dP{ Yf(X) τ} (2) Research is supported in part by NSF grant DMS 0706829 155 M. Wegkamp/Lasso type classifiers with a reject option 156

artificial intelligence, bayesian inference, machine learning, (13 more...)

arXiv.org Machine Learning

doi: 10.1214/07-EJS058

0705.2363

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Add feedback

The Parameter-Less Self-Organizing Map algorithm

Berglund, Erik, Sitte, Joaquin

arXiv.org Artificial IntelligenceMay-7-2007

The Parameter-Less Self-Organizing Map (PLSOM) is a new neural network algorithm based on the Self-Organizing Map (SOM). It eliminates the need for a learning rate and annealing schemes for learning rate and neighbourhood size. We discuss the relative performance of the PLSOM and the SOM and demonstrate some tasks in which the SOM fails but the PLSOM performs satisfactory. Finally we discuss some example applications of the PLSOM and present a proof of ordering under certain limited conditions.

artificial intelligence, machine learning, plsom, (19 more...)

arXiv.org Artificial Intelligence

0705.0199

Country:

North America > United States (0.46)
Oceania > Australia (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Support vector machine for functional data classification

Rossi, Fabrice, Villa, Nathalie

arXiv.org Machine LearningMay-2-2007

In many applications, input data are sampled functions taking their values in infinite dimensional spaces rather than standard vectors. This fact has complex consequences on data analysis algorithms that motivate modifications of them. In fact most of the traditional data analysis tools for regression, classification and clustering have been adapted to functional inputs under the general name of functional Data Analysis (FDA). In this paper, we investigate the use of Support Vector Machines (SVMs) for functional data analysis and we focus on the problem of curves discrimination. SVMs are large margin classifier tools based on implicit non linear mappings of the considered data into high dimensional spaces thanks to kernels. We show how to define simple kernels that take into account the unctional nature of the data and lead to consistent classification. Experiments conducted on real world data emphasize the benefit of taking into account some functional aspects of the problems.

hilbert space, kernel, svm, (14 more...)

arXiv.org Machine Learning

doi: 10.1016/j.neucom.2005.12.010

0705.0209

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Austria > Vienna (0.14)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.78)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)

Add feedback