AITopics | Performance Analysis

A Minimum Description Length Approach to Multitask Feature Selection

arXiv.org Artificial Intelligence, May-29-2009

Many regression problems involve not one but several response variables (y's). Often the responses are suspected to share a common underlying structure, in which case it may be advantageous to share information across them; this is known as multitask learning. As a special case, we can use multiple responses to better identify shared predictive features -- a project we might call multitask feature selection. This thesis is organized as follows. Section 1 introduces feature selection for regression, focusing on $\ell_0$ regularization methods and their interpretation within a Minimum Description Length (MDL) framework. Section 2 proposes a novel extension of MDL feature selection to the multitask setting. The approach, called the "Multiple Inclusion Criterion" (MIC), is designed to borrow information across regression tasks by more easily selecting features that are associated with multiple responses. We show in experiments on synthetic and real biological data sets that MIC can reduce prediction error in settings where features are at least partially shared across responses. Section 3 surveys hypothesis testing by regression with a single response, focusing on the parallel between the standard Bonferroni correction and an MDL approach. Mirroring the ideas in Section 2, Section 4 proposes a novel MIC approach to hypothesis testing with multiple responses and shows that on synthetic data with significant sharing of features across responses, MIC sometimes outperforms standard FDR-controlling methods in terms of finding true positives for a given level of false positives. Section 5 concludes.
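The MDL view of feature selection mentioned in the abstract can be illustrated with a rough single-response sketch: a feature is added only when the bits it saves in coding the residuals exceed the bits needed to describe the feature. This is not the MIC method itself. The function name, the stagewise fitting of one feature at a time, the known noise variance `sigma2`, and the flat `coef_bits` cost per coefficient are all assumptions made for illustration.

```python
import math

def mdl_stagewise_select(X, y, sigma2=1.0, coef_bits=2.0):
    """Greedy selection; X is a list of rows, y a list of responses."""
    p = len(X[0])
    residual = list(y)
    selected = []
    # Model cost per feature: log2(p) bits to name it plus an assumed
    # flat cost for its coefficient.
    feature_bits = math.log2(p) + coef_bits
    while len(selected) < p:
        best_j, best_gain, best_beta = None, 0.0, 0.0
        for j in range(p):
            if j in selected:
                continue
            col = [row[j] for row in X]
            sxx = sum(v * v for v in col)
            if sxx == 0:
                continue
            sxr = sum(v * r for v, r in zip(col, residual))
            # Drop in residual sum of squares from fitting this feature alone.
            rss_drop = sxr * sxr / sxx
            # Bits saved when residuals are coded under a Gaussian model.
            gain = rss_drop / (2 * sigma2 * math.log(2))
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, sxr / sxx
        # Stop when no feature pays for its own description.
        if best_j is None or best_gain <= feature_bits:
            break
        selected.append(best_j)
        residual = [r - best_beta * row[best_j] for r, row in zip(residual, X)]
    return selected
```

Raising `sigma2` makes every feature look less valuable, so fewer features are selected; that is the sense in which the description-length trade-off acts like an $\ell_0$ penalty.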

artificial intelligence, coefficient, machine learning, (19 more...)

arXiv.org Artificial Intelligence

0906.0052

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Endocrinology (0.45)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Advanced Measures for Empirical Testing

Baumeister, Joachim (University of Würzburg)

AAAI Conferences, May-21-2009

Empirical testing is a very popular evaluation method for the development of intelligent systems. Here, previously solved problems with correct solutions are given as cases to the system. Validity is tested by comparing the expected results with the derived solutions. Besides classic forms of boolean testing of occurring solutions, more refined methods are required for a thorough evaluation of real-world knowledge systems. We present extended precision and recall functions for interactive knowledge systems that are generalizations of the existing measures. Additionally, we propose a visualization method for inspecting the validation result for interactive systems. A case study with a second-opinion system from the medical domain demonstrates the usefulness of the approach.
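For orientation, the standard set-based precision and recall that the abstract says are being generalized can be sketched as below. This shows only the classic measures for a single test case, not the paper's extended functions. The function name and the convention of returning 1.0 for an empty set are assumptions.

```python
def precision_recall(expected, derived):
    """Compare the expected solutions of a test case with the derived ones."""
    expected, derived = set(expected), set(derived)
    true_pos = len(expected & derived)
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = true_pos / len(derived) if derived else 1.0
    recall = true_pos / len(expected) if expected else 1.0
    return precision, recall
```

For example, with expected solutions `{"a", "b", "c"}` and derived solutions `{"b", "c", "d", "e"}`, two of the four derived solutions are correct and two of the three expected solutions are found.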

exp rs, precision, test case, (16 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country: Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

Confidence-based Tuning of Nomogram Predictions

Mancill, Tony (Washington State University Vancouver) | Wallace, Scott A (Washington State University Vancouver)

AAAI Conferences, May-21-2009

Instance classification using machine learning techniques has numerous applications, from automation to medical diagnosis. In many problem domains, such as spam filtering, classification must be performed quickly across large datasets. In this paper we begin with machine learning techniques based on the naive Bayes classifier and attempt to improve classification performance by taking into account attribute confidence intervals. Our prediction functions operate over nominal datasets and retain the asymptotic complexity of one-pass learning and prediction functions. We present preliminary results indicating a modest, albeit inconsistent, improvement over the naive Bayes classifier alone.
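The baseline the abstract starts from, a naive Bayes classifier over nominal attributes trained in a single pass, might look roughly like the sketch below. It does not include the confidence-interval tuning or nomograms the paper describes. The function names and the use of add-one smoothing are assumptions.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Collect all counts in one pass over the nominal training data."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (class, column) -> value counts
    values_seen = defaultdict(set)       # column -> distinct values
    for row, label in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(label, j)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, x):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for j, v in enumerate(x):
            # Add-one smoothing (an assumption) avoids zero probabilities.
            k = len(values_seen[j])
            score += math.log((value_counts[(label, j)][v] + 1) / (n_c + k))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Training touches each cell once and prediction touches each attribute once per class, which is the one-pass cost profile the abstract refers to.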

dataset, prediction, predictor, (13 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

North America > United States > Washington > Clark County > Vancouver (0.15)
South America > Paraguay > Asunción > Asunción (0.05)
North America > United States > New York > New York County > New York City (0.05)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

VipBoost: A More Accurate Boosting Algorithm

Su, Xiaoyuan (Florida Atlantic University) | Khoshgoftaar, Taghi M | Greiner, Russell

AAAI Conferences, May-21-2009

Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in the missing values. It then applies AdaBoost (using some base learner) to each of the imputed learning sets, producing multiple classifiers. The prediction for a new test case is the most frequent classification among these classifiers. Our empirical results show that VipBoost produces very effective classifiers that significantly improve accuracy for unstable base learners and some stable learners, especially when the initial dataset is incomplete.
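The pipeline described in the abstract (remove values, impute, train on each imputed set, vote) can be sketched roughly as follows. This is an illustration and not the authors' implementation: column-mode imputation stands in for the unspecified imputer, a 1-nearest-neighbour learner stands in for AdaBoost, and all function names and default parameters are hypothetical.

```python
import random
from collections import Counter

def remove_values(rows, fraction, rng):
    """Copy rows, replacing roughly `fraction` of the cells with None."""
    return [[None if rng.random() < fraction else v for v in row] for row in rows]

def impute_mode(rows):
    """Fill None cells with the most common observed value in that column."""
    filled = [list(row) for row in rows]
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        mode = Counter(observed).most_common(1)[0][0] if observed else None
        for row in filled:
            if row[j] is None:
                row[j] = mode
    return filled

def train_nearest_neighbour(rows, labels):
    """Stand-in base learner (NOT AdaBoost): 1-NN under Hamming distance."""
    def predict(x):
        dists = [sum(a != b for a, b in zip(row, x)) for row in rows]
        return labels[dists.index(min(dists))]
    return predict

def vote_on_imputed_sets(rows, labels, train, n_sets=5, fraction=0.1, seed=0):
    """Train one model per imputed copy of the data and vote at prediction time."""
    rng = random.Random(seed)
    models = [train(impute_mode(remove_values(rows, fraction, rng)), labels)
              for _ in range(n_sets)]
    def predict(x):
        # The most frequent classification across the models wins.
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict
```

Passing the learner in as `train` keeps the voting wrapper independent of the base algorithm, so a real boosting implementation could be substituted without changing the rest of the sketch.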

classification accuracy, classifier, dataset, (13 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.05)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Multivariate Time Series Classification with Temporal Abstractions

Batal, Iyad (University of Pittsburgh) | Sacchi, Lucia (University of Pavia) | Bellazzi, Riccardo (University of Pavia) | Hauskrecht, Milos (University of Pittsburgh)

AAAI Conferences, May-21-2009

The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time-series data. This work focuses on methods for multivariate time-series classification. Time-series classification is a challenging problem, mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time-series features suitable for classification tasks. We propose the STF-Mine algorithm, which automatically mines discriminative temporal abstraction patterns from the time-series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real-world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time-series datasets.
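To give a sense of what a temporal abstraction is, the sketch below turns a numeric series into labelled trend intervals. It illustrates only the general idea of abstracting raw values into qualitative intervals; it is not the STF-Mine algorithm, whose details the abstract does not give. The function name, the three labels, and the `tolerance` parameter are assumptions.

```python
def trend_abstraction(values, tolerance=0.0):
    """Return [(label, start_index, end_index), ...] covering the series."""
    intervals = []
    for i in range(1, len(values)):
        delta = values[i] - values[i - 1]
        if delta > tolerance:
            label = "increasing"
        elif delta < -tolerance:
            label = "decreasing"
        else:
            label = "steady"
        if intervals and intervals[-1][0] == label:
            # Extend the current interval instead of starting a new one.
            intervals[-1] = (label, intervals[-1][1], i)
        else:
            intervals.append((label, i - 1, i))
    return intervals
```

Applied to each variable of a multivariate series, interval sequences like these could serve as candidate features for a downstream classifier.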

abstraction, relation, temporal pattern, (14 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Europe > Italy (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)
(2 more...)

Testing Analogical Proportions with Google using Kolmogorov Information Theory

Prade, Henri (Institut de Recherche en Informatique de Toulouse) | Richard, Gilles (British Institute of Technology and E-Commerce)

AAAI Conferences, May-21-2009

Analogical reasoning is considered one of the main mechanisms underlying creativity. "Thinking out of the box" allows the paradigm shift essential to a creative process. More common is the concept of analogical proportion ("2 is to 4 as 4 is to 8"), which can be described within an algebraic framework. When it comes to concepts ("engine is to the car as heart is to the human"), we need to investigate a new way to understand this analogical ratio. In this paper, we take inspiration from the formal framework of information theory to propose a new approach to the evaluation of analogy between concepts. Using Kolmogorov complexity as a backbone providing a clear semantics, we give a practical interpretation for analogy between words viewed as labeling concepts. Making use of Google as a linguistic resource, we provide an implementation of our definitions: experiments show that the accuracy of our definition is quite acceptable and justify the approach.
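The numeric example in the abstract can be checked directly if the proportion is read as an equality of ratios, a/b = c/d. That reading is an assumption, since the abstract says only that such proportions fit an algebraic framework. The function name below is hypothetical, and the sketch does not touch the paper's Kolmogorov-based treatment of concepts.

```python
def is_geometric_proportion(a, b, c, d):
    """True when a is to b as c is to d, read as a/b == c/d."""
    # Cross-multiplication avoids division and therefore division by zero.
    return a * d == b * c
```

For "2 is to 4 as 4 is to 8", both cross products equal 16, so the proportion holds under this reading.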

analogical proportion, analogy, proportion, (15 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

Europe > Germany (0.05)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Mapping Grounded Object Properties across Perceptually Heterogeneous Embodiments

Kira, Zsolt (Georgia Institute of Technology)

AAAI Conferences, May-21-2009

As robots become more common, it becomes increasingly useful for them to communicate and effectively share knowledge that they have learned through their individual experiences. Learning from experiences, however, is often embodiment-specific; that is, the knowledge learned is grounded in the robot’s unique sensors and actuators. This type of learning raises questions as to how communication and knowledge exchange via social interaction can occur, as properties of the world can be grounded differently in different robots. This is especially true when the robots are heterogeneous, with different sensors and perceptual features used to define the properties. In this paper, we present methods and representations that allow heterogeneous robots to learn grounded property representations, such as that of color categories, and then build models of their similarities and differences in order to map their respective representations. We use a conceptual space representation, where object properties are learned and represented as regions in a metric space, implemented via supervised learning of Gaussian Mixture Models. We then propose to use confusion matrices that are built using instances from each robot, obtained in a shared context, in order to learn mappings between the properties of each robot. Results are demonstrated using two perceptually heterogeneous Pioneer robots, one with a web camera and another with a camcorder.
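The confusion-matrix step described in the abstract can be sketched as follows, assuming each robot has already assigned a property label to the same sequence of shared instances. The function names and the rule of mapping each property to its most frequent counterpart are assumptions; the paper's actual mapping procedure may differ.

```python
from collections import Counter, defaultdict

def build_confusion(labels_a, labels_b):
    """Count how often robot A's property co-occurs with robot B's property."""
    confusion = defaultdict(Counter)
    for a, b in zip(labels_a, labels_b):
        confusion[a][b] += 1
    return confusion

def map_properties(confusion):
    """Map each property of robot A to robot B's most frequent counterpart."""
    return {a: row.most_common(1)[0][0] for a, row in confusion.items()}
```

If robot A labels five shared objects `red, red, red, blue, blue` while robot B labels them `c1, c1, c2, c3, c3`, the sketch maps `red` to `c1` and `blue` to `c3`.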

confusion matrix, representation, robot, (14 more...)

AAAI Conferences

Twenty-Second International FLAIRS Conference

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
Europe > Sweden (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)

Mining Meaning from Wikipedia

Medelyan, Olena | Milne, David | Legg, Catherine | Witten, Ian H.

arXiv.org Artificial Intelligence, May-9-2009

Wikipedia is a goldmine of information, not just for its many readers but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval; using it to facilitate information extraction; and using it as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.

relation, wikipedia, wikipedia article, (17 more...)

arXiv.org Artificial Intelligence

0809.4530

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > Middlesex County > London (0.14)
Oceania > New Zealand > North Island > Waikato (0.04)
(31 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Media > Film (0.92)
Leisure & Entertainment > Sports (0.92)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(7 more...)

Quality Classifiers for Open Source Software Repositories

Tsatsaronis, George | Halkidi, Maria | Giakoumakis, Emmanouel A.

arXiv.org Artificial Intelligence, Apr-29-2009

Initial open source software (OSS) projects rely on large repositories for hosting and distribution until they become independent. A huge amount of project metadata is collected and maintained in such software repositories providing useful information about projects and their success. In this paper we propose a data mining approach that processes the metadata contained in such OSS repositories. The proposed approach aims at the construction of a classifier that is trained on the metadata of existing projects and predicts the successful continuation of any given OSS. The successfulness of a project is defined with regard to the confidence level of the classifier which predicts that this project will be ported in widely used OSS projects (e.g.

classifier, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

0904.4708

Genre: Research Report (0.82)

Technology:

Information Technology > Software (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Sentence Compression as Tree Transduction

Cohn, T. A. | Lapata, M.

Journal of Artificial Intelligence Research, Apr-24-2009

This paper presents a tree-to-tree transduction method for sentence compression. Our model is based on synchronous tree substitution grammar, a formalism that allows local distortion of the tree topology and can thus naturally capture structural mismatches. We describe an algorithm for decoding in this framework and show how the model can be trained discriminatively within a large-margin framework. Experimental results on sentence compression show significant improvements over a state-of-the-art model.

compression, derivation, source tree, (14 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2655

AI Access Foundation

10600

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(18 more...)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
