AITopics

0809.4530

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > Middlesex County > London (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
(31 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Media > Film (0.92)
Leisure & Entertainment > Sports (0.92)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(7 more...)

Tsatsaronis, George, Halkidi, Maria, Giakoumakis, Emmanouel A.

Quality Classifiers for Open Source Software Repositories

arXiv.org Artificial IntelligenceApr-29-2009

Initial open source software (OSS) projects rely on large repositories for hosting and distribution until they become independent. A huge amount of project metadata is collected and maintained in such software repositories providing useful information about projects and their success. In this paper we propose a data mining approach that processes the metadata contained in such OSS repositories. The proposed approach aims at the construction of a classifier that is trained on the metadata of existing projects and predicts the successful continuation of any given OSS. The successfulness of a project is defined with regard to the confidence level of the classifier which predicts that this project will be ported in widely used OSS projects (e.g.

classifier, data mining, machine learning, (15 more...)

0904.4708

Genre: Research Report (0.82)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Journal of Artificial Intelligence ResearchApr-24-2009

Sentence Compression as Tree Transduction

Cohn, T. A., Lapata, M.

This paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large margin framework. Experimental results on sentence compression bring significant improvements over a state-of-the-art model.

compression, derivation, source tree, (14 more...)

doi: 10.1613/jair.2655

10600

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(18 more...)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Gabrilovich, E., Markovitch, S.

Wikipedia-based Semantic Interpretation for Natural Language Processing

Journal of Artificial Intelligence ResearchMar-30-2009

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.

category, proceedings, text categorization, (15 more...)

doi: 10.1613/jair.2669

10595

Country:

Asia > Middle East > Iraq (0.28)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
South America > Brazil (0.04)
(21 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Information Technology (1.00)
(14 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Journal of Artificial Intelligence ResearchMar-27-2009

Identification of Pleonastic It Using the Web

Li, Y., Musilek, P., Reformat, M., Wyard-Scott, L.

In a significant minority of cases, certain pronouns, especially the pronoun it, can be used without referring to any specific entity. This phenomenon of pleonastic pronoun usage poses serious problems for systems aiming at even a shallow understanding of natural language texts. In this paper, a novel approach is proposed to identify such uses of it: the extrapositional cases are identified using a series of queries against the web, and the cleft cases are identified using a simple set of syntactic rules. The system is evaluated with four sets of news articles containing 679 extrapositional cases as well as 78 cleft constructs. The identification results are comparable to those obtained by human efforts.

category, extraposition, query, (17 more...)

doi: 10.1613/jair.2622

10593

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York (0.04)
(14 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Law (0.67)
Government > Regional Government (0.45)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(3 more...)

Journal of Artificial Intelligence ResearchMar-24-2009

Unsupervised Methods for Determining Object and Relation Synonyms on the Web

Yates, A., Etzioni, O.

The task of identifying synonymous relations and objects, or synonym resolution, is critical for high-quality information extraction. This paper investigates synonym resolution in the context of unsupervised information extraction, where neither hand-tagged training examples nor domain knowledge is available. The paper presents a scalable, fully-implemented system that runs in O(KN log N) time in the number of extractions, N, and the maximum number of synonyms per word, K. The system, called Resolver , introduces a probabilistic relational model for predicting whether two strings are co-referential based on the similarity of the assertions containing them. On a set of two million assertions extracted from the Web, Resolver resolves objects with 78% precision and 68% recall, and resolves relations with 90% precision and 35% recall. Several variations of resolver's probabilistic model are explored, and experiments demonstrate that under appropriate conditions these variations can improve F1 by 5%. An extension to the basic Resolver system allows it to handle polysemous names with 97% precision and 95% recall on a data set from the TREC corpus.

extraction, relation, resolver, (17 more...)

doi: 10.1613/jair.2772

10591

Country:

North America > United States > Virginia (0.04)
North America > United States > West Virginia (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Overview (0.85)
Research Report > New Finding (0.67)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Shen, Chunhua, Paisitkriangkrai, Sakrapee, Zhang, Jian

Efficiently Learning a Detection Cascade with Sparse Eigenvectors

arXiv.org Artificial IntelligenceMar-18-2009

In this work, we first show that feature selection methods other than boosting can also be used for training an efficient object detector. In particular, we introduce Greedy Sparse Linear Discriminant Analysis (GSLDA) \cite{Moghaddam2007Fast} for its conceptual simplicity and computational efficiency; and slightly better detection performance is achieved compared with \cite{Viola2004Robust}. Moreover, we propose a new technique, termed Boosted Greedy Sparse Linear Discriminant Analysis (BGSLDA), to efficiently train a detection cascade. BGSLDA exploits the sample re-weighting property of boosting and the class-separability criterion of GSLDA.

bgslda, classifier, weak classifier, (14 more...)

0903.3103

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Australian Capital Territory > Canberra (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Journal of Artificial Intelligence ResearchMar-13-2009

Behavior Bounding: An Efficient Method for High-Level Behavior Comparison

Wallace, S. A.

In this paper, we explore methods for comparing agent behavior with human behavior to assist with validation. Our exploration begins by considering a simple method of behavior comparison. Motivated by shortcomings in this initial approach, we introduce behavior bounding, an automated model-based approach for comparing behavior that is inspired, in part, by Mitchell's Version Spaces. We show that behavior bounding can be used to compactly represent both human and agent behavior. We argue that relatively low amounts of human effort are required to build, maintain, and use the data structures that underlie behavior bounding, and we provide a theoretical basis for these arguments using notions of PAC Learnability. Next, we show empirical results indicating that this approach is effective at identifying differences in certain types of behaviors and that it performs well when compared against our initial benchmark methods. Finally, we demonstrate that behavior bounding can produce information that allows developers to identify and fix problems in an agent's behavior much more efficiently than standard debugging techniques.

agent, behavior trace, representation, (16 more...)

doi: 10.1613/jair.2646

10589

Country:

North America > United States > Washington > Clark County > Vancouver (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Austria (0.04)

Genre:

Research Report > Experimental Study (0.45)
Research Report > New Finding (0.45)

Industry:

Health & Medicine (0.92)
Education > Educational Technology > Educational Software > Computer Based Training (0.67)
Government > Military > Air Force (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Liu, Han, Lafferty, John, Wasserman, Larry

The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs

arXiv.org Machine LearningMar-3-2009

Recent methods for estimating sparse undirected graphs for real-valued data in high dimensional problems rely heavily on the assumption of normality. We show how to use a semiparametric Gaussian copula--or "nonparanormal"--for high dimensional inference. Just as additive models extend linear models by replacing linear functions with a set of one-dimensional smooth functions, the nonparanormal extends the normal by transforming the variables by smooth functions. We derive a method for estimating the nonparanormal, study the method's theoretical properties, and show that it works well in many examples.

artificial intelligence, logn, machine learning, (17 more...)

arXiv.org Machine Learning

0903.0649

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Mansour, Yishay, Mohri, Mehryar, Rostamizadeh, Afshin

Domain Adaptation: Learning Bounds and Algorithms

arXiv.org Artificial IntelligenceFeb-23-2009

This paper addresses the general problem of domain adaptation which arises in a variety of applications where the distribution of the labeled sample available somewhat differs from that of the test data. Building on previous work by Ben-David et al. (2007), we introduce a novel distance between distributions, discrepancy distance, that is tailored to adaptation problems with arbitrary loss functions. We give Rademacher complexity bounds for estimating the discrepancy distance from finite samples for different loss functions. Using this distance, we derive novel generalization bounds for domain adaptation for a wide family of loss functions. We also present a series of novel adaptation bounds for large classes of regularization-based algorithms, including support vector machines and kernel ridge regression based on the empirical discrepancy. This motivates our analysis of the problem of minimizing the empirical discrepancy for various loss functions for which we also give novel algorithms. We report the results of preliminary experiments that demonstrate the benefits of our discrepancy minimization algorithms for domain adaptation.

artificial intelligence, machine learning, natural language, (17 more...)

0902.3430

Country:

North America > United States (0.46)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)
(2 more...)