AITopics | Accuracy

Accuracy

Student Speech Act Classification Using Machine Learning

Rasor, Travis (University of Memphis) | Olney, Andrew (University of Memphis) | D'Mello, Sidney (University of Memphis)

AAAI Conferences May-18-2011

Dialogue-based intelligent tutoring systems use speech act classifiers to categorize student input into answers, questions, and other speech acts. Previous work has primarily focused on question classification. In this paper, we present a complementary speech act classifier that focuses primarily on non-questions, which was developed using machine learning techniques. Our results show that an effective speech act classifier can be developed directly from labeled data using decision trees.

classification, classifier, dialogue act, (15 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > Tennessee > Shelby County > Memphis (0.04)
North America > United States > New York (0.04)
(7 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

How Many Software Metrics Should be Selected for Defect Prediction?

Wang, Huanjing (Western Kentucky University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Seliya, Naeem (University of Michigan, Dearborn)

AAAI Conferences May-18-2011

A software practitioner is interested in the solution to “for a given project, what is the minimum number of software metrics that should be considered for building an effective defect prediction model?” During the development life cycle various software metrics are collected for different reasons. In the case of a metrics-based defect prediction model, an intelligent selection of software metrics prior to building defect predictors is likely to improve model performance. This study utilizes the proposed threshold-based feature selection technique to remove irrelevant and redundant software metrics (a.k.a. features or attributes). A comparative investigation is presented for evaluating the size of the selected feature subsets. The case study is based on software measurement data obtained from a real-world project, and the defect predictors are trained using three commonly used classifiers. The empirical case study results demonstrate that an effective defect predictor can be built with only three metrics; moreover, model performance improved when over 98.5% of the software metrics were eliminated.

classification model, classifier, software metric, (13 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > Michigan > Wayne County > Dearborn (0.04)
North America > United States > Kentucky (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Feature Level Sensor Fusion for Improved Fault Detection in MCM Systems for Ocean Turbines

Duhaney, Janell (Florida Atlantic University) | Khoshgoftaar, Taghi M. (Florida Atlantic University) | Sloan, John C. (Florida Atlantic University)

AAAI Conferences May-18-2011

This paper investigates feature level fusion for enhancing fault detection from vibration signals in an ocean turbine. Changes in vibration signatures from such rotating machinery typically indicate the presence of a problem such as a shift in its orientation or mechanical impact from its environment. We applied feature level fusion to vibration data acquired from two accelerometers attached to a box fan, and then assessed the abilities of twelve well known machine learners to detect changes in state from the raw accelerometer data and from the fused data. Analysis of the performance of these classifiers showed an overall performance improvement in all twelve classifiers in detecting the state of the fan from the fused data versus from the data from the two individual sensor channels.

fusion, level fusion, turbine, (16 more...)

AAAI Conferences

Twenty-Fourth International FLAIRS Conference

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(3 more...)

Genre:

Research Report (0.69)
Overview (0.54)

Industry: Energy > Renewable (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.71)
(3 more...)

Feature Selection for MAUC-Oriented Classification Systems

Wang, Rui, Tang, Ke

arXiv.org Artificial Intelligence May-15-2011

Feature selection is an important pre-processing step for many pattern classification tasks. Traditionally, feature selection methods are designed to obtain a feature subset that can lead to high classification accuracy. However, classification accuracy has recently been shown to be an inappropriate performance metric of classification systems in many cases. Instead, the Area Under the receiver operating characteristic Curve (AUC) and its multi-class extension, MAUC, have been proved to be better alternatives. Hence, the target of classification system design is gradually shifting from seeking a system with the maximum classification accuracy to obtaining a system with the maximum AUC/MAUC. Previous investigations have shown that traditional feature selection methods need to be modified to cope with this new objective. These methods most often are restricted to binary classification problems only. In this study, a filter feature selection method, namely MAUC Decomposition based Feature Selection (MDFS), is proposed for multi-class classification problems. To the best of our knowledge, MDFS is the first method specifically designed to select features for building classification systems with maximum MAUC. Extensive empirical results demonstrate the advantage of MDFS over several compared feature selection methods.
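For readers unfamiliar with MAUC, a minimal Python sketch follows. It assumes the pairwise-averaging definition of Hand and Till (2001), which the abstract does not spell out. The function names `auc` and `mauc` and the toy inputs are illustrative, and the sketch computes the metric only; it is not the MDFS selection method.

```python
from itertools import combinations

def auc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win (the usual rank-statistic form of AUC).
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

def mauc(labels, probs):
    """Average pairwise AUC over all class pairs (assumed Hand and Till form).

    labels: list of integer class indices, one per example.
    probs:  list of per-class score lists, one per example.
    """
    classes = sorted(set(labels))
    total = 0.0
    for i, j in combinations(classes, 2):
        # A(i|j): separate class i from class j using the score for class i.
        a_ij = auc([p[i] for y, p in zip(labels, probs) if y == i],
                   [p[i] for y, p in zip(labels, probs) if y == j])
        # A(j|i): the same pair, scored with the score for class j.
        a_ji = auc([p[j] for y, p in zip(labels, probs) if y == j],
                   [p[j] for y, p in zip(labels, probs) if y == i])
        total += (a_ij + a_ji) / 2.0
    c = len(classes)
    return 2.0 * total / (c * (c - 1))

# Toy example: three classes, each example scored highest on its own class.
labels = [0, 1, 2]
probs = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
print(mauc(labels, probs))  # 1.0 for this perfectly separated toy example
```

Under this definition, MAUC reduces to the ordinary AUC when there are only two classes.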

artificial intelligence, feature selection method, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2012.01.013

1105.2943

Country: North America > United States (0.94)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Evaluating the diagnostic powers of variables and their linear combinations when the gold standard is continuous

Wang, Zhanfeng, Chang, Yuan-chin Ivan

arXiv.org Machine Learning May-8-2011

The receiver operating characteristic (ROC) curve is a very useful tool for analyzing the diagnostic/classification power of instruments/classification schemes as long as a binary-scale gold standard is available. When the gold standard is continuous and there is no confirmative threshold, the ROC curve becomes less useful. Hence, several extensions have been proposed for evaluating the diagnostic potential of variables of interest. However, due to the computational difficulties of these nonparametric-based extensions, they are not easy to use for finding the optimal combination of variables to improve the individual diagnostic power. Therefore, we propose a new measure, which extends the AUC index for identifying variables with good potential to be used in a diagnostic scheme. In addition, we propose a threshold gradient descent based algorithm for finding the best linear combination of variables that maximizes this new measure, which is applicable even when the number of variables is huge. The estimate of the proposed index and its asymptotic property are studied. The performance of the proposed method is illustrated using both synthesized and real data sets.

artificial intelligence, gold standard, machine learning, (18 more...)

arXiv.org Machine Learning

1105.1575

Country: Asia (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.67)
Health & Medicine > Diagnostic Medicine (0.67)
Health & Medicine > Therapeutic Area > Endocrinology (0.48)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Notes on a New Philosophy of Empirical Science

Burfoot, Daniel

arXiv.org Machine Learning Apr-28-2011

This book presents a methodology and philosophy of empirical science based on large scale lossless data compression. In this view a theory is scientific if it can be used to build a data compression program, and it is valuable if it can compress a standard benchmark database to a small size, taking into account the length of the compressor itself. This methodology therefore includes an Occam principle as well as a solution to the problem of demarcation. Because of the fundamental difficulty of lossless compression, this type of research must be empirical in nature: compression can only be achieved by discovering and characterizing empirical regularities in the data. Because of this, the philosophy provides a way to reformulate fields such as computer vision and computational linguistics as empirical sciences: the former by attempting to compress databases of natural images, the latter by attempting to compress large text databases. The book argues that the rigor and objectivity of the compression principle should set the stage for systematic progress in these fields. The argument is especially strong in the context of computer vision, which is plagued by chronic problems of evaluation. The book also considers the field of machine learning. Here the traditional approach requires that the models proposed to solve learning problems be extremely simple, in order to avoid overfitting. However, the world may contain intrinsically complex phenomena, which would require complex models to understand. The compression philosophy can justify complex models because of the large quantity of data being modeled (if the target database is 100 Gb, it is easy to justify a 10 Mb model). The complex models and abstractions learned on the basis of the raw data (images, language, etc) can then be reused to solve any specific learning problem, such as face recognition or machine translation.
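The scoring rule described above (compressed size plus the length of the compressor itself) can be illustrated with a rough Python sketch. Here `zlib` is only a stand-in for a domain-specific compressor, and `total_codelength`, `model_overhead`, and the 1,000-byte compressor size are hypothetical names and values chosen for illustration, not taken from the book.

```python
import zlib

def total_codelength(data: bytes, compressor_size: int, level: int = 9) -> int:
    """Compressed size of the data plus the size of the compressor, in bytes."""
    return len(zlib.compress(data, level)) + compressor_size

def model_overhead(model_size: float, database_size: float) -> float:
    """Fraction of the database size taken up by the model (same units for both)."""
    return model_size / database_size

# Highly regular data: the regularity is what makes compression possible.
regular = b"ab" * 10_000
cost = total_codelength(regular, compressor_size=1_000)  # hypothetical size
print(cost < len(regular))  # True: the total is smaller than the raw data

# The abstract's example: a 10 Mb model against a 100 Gb database.
print(model_overhead(10e6, 100e9))
```

The second call shows why a large model is easy to justify in this view: relative to the database, it adds roughly one part in ten thousand to the total.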

artificial intelligence, machine learning, natural language, (25 more...)

arXiv.org Machine Learning

1104.5466

Country:

Europe (0.67)
Asia (0.67)
North America > United States > New York (0.45)

Genre:

Summary/Review (1.00)
Research Report > New Finding (1.00)
Overview (1.00)
Instructional Material (1.00)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Media (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
(11 more...)

An expert system for detecting automobile insurance fraud using social network analysis

Šubelj, Lovro, Furlan, Štefan, Bajec, Marko

arXiv.org Artificial Intelligence Apr-19-2011

The article proposes an expert system for detection, and subsequent investigation, of groups of collaborating automobile insurance fraudsters. The system is described and examined in great detail, and several technical difficulties in detecting fraud are also considered so that it is applicable in practice. As opposed to many other approaches, the system uses networks for representation of data. Networks are the most natural representation of such a relational domain, allowing formulation and analysis of complex relations between entities. Fraudulent entities are found by employing a novel assessment algorithm, Iterative Assessment Algorithm (IAA), also presented in the article. Besides intrinsic attributes of entities, the algorithm also explores the relations between entities. The prototype was evaluated and rigorously analyzed on real world data. Results show that automobile insurance fraud can be efficiently detected with the proposed system and that appropriate data representation is vital.

Key words: Fraud detection, Automobile insurance, Social network analysis, Link analysis, Assessment propagation

1. Introduction

Fraud is encountered in a variety of domains. It comes in all different shapes and sizes, from traditional fraud, e.g. Such groups can be found in the automobile insurance domain. Here fraudsters stage traffic accidents and issue fake insurance claims to gain (unjustified) funds from their general or vehicle insurance. There are also cases where an accident has never occurred, and the vehicles have only been placed onto the road. Still, the majority of such fraud is not planned (opportunistic fraud): an individual only seizes the opportunity arising from the accident and issues exaggerated insurance claims or claims for past damages. Staged accidents have several common characteristics. They occur in late hours and non-urban areas in order to reduce the probability of witnesses. Drivers are usually younger males, there are many passengers in the vehicles, but never children or elders. The police are always called to the scene to make the subsequent acquisition of means easier. It is also not uncommon that all of the participants have multiple (serious) injuries, when there is almost no damage on the vehicles. Many other suspicious characteristics exist, not mentioned here.

data mining, detection, machine learning, (21 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.eswa.2010.07.143

1104.3904

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.87)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Law Enforcement & Public Safety > Fraud (1.00)
Banking & Finance > Insurance (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

Automatic Seizure Detection in an In-Vivo Model of Epilepsy

Saulnier, Guillaume (McGill University) | Pineau, Joelle (McGill University)

AAAI Conferences Mar-19-2011

The goal of our research is to find patterns of EEG activity that will allow us to correctly identify seizures in living rats using machine learning techniques. Features are extracted from the EEG to characterize the signal over time. We perform model selection to reduce the set of features, as the goal is to have the algorithm running on a small personal device. The chosen features are used within a supervised classifier, based on randomized forests, in order to separate the different brain states. One of the challenges of this research is to detect all seizures, while preserving a low false positive rate, and low detection latency. We present results showing we can achieve this using data from three separate animals. The long-term goal of this research is to use this seizure detection method as part of a closed-loop adaptive neuro-stimulation device to reduce the incidence and duration of seizures.

classifier, seizure, seizure detection, (12 more...)

AAAI Conferences

2011 AAAI Spring Symposium Series

Country: North America > Canada > Quebec > Montreal (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Stochastic Stepwise Ensembles for Variable Selection

Xin, Lu, Zhu, Mu

arXiv.org Machine Learning Mar-2-2011

The ensemble approach for statistical modelling was first made popular by such algorithms as boosting (Freund and Schapire 1996; Friedman et al. 2000), bagging (Breiman 1996), random forest (Breiman 2001), and the gradient boosting machine (Friedman 2001). They are powerful algorithms for solving prediction problems. This article is concerned with using the ensemble approach for a different problem, variable selection. We shall use the terms "prediction ensemble" and "variable-selection ensemble" to differentiate ensembles used for these different purposes.

artificial intelligence, machine learning, selection, (18 more...)

arXiv.org Machine Learning

1003.593

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Inferring Disease and Gene Set Associations with Rank Coherence in Networks

Hwang, TaeHyun, Zhang, Wei, Xie, Maoqiang, Kuang, Rui

arXiv.org Artificial Intelligence Feb-18-2011

A computational challenge to validate the candidate disease genes identified in a high-throughput genomic study is to elucidate the associations between the set of candidate genes and disease phenotypes. The conventional gene set enrichment analysis often fails to reveal associations between disease phenotypes and the gene sets with a short list of poorly annotated genes, because the existing annotations of disease causative genes are incomplete. We propose a network-based computational approach called rcNet to discover the associations between gene sets and disease phenotypes. Assuming coherent associations between the genes ranked by their relevance to the query gene set, and the disease phenotypes ranked by their relevance to the hidden target disease phenotypes of the query gene set, we formulate a learning framework maximizing the rank coherence with respect to the known disease phenotype-gene associations. An efficient algorithm coupling ridge regression with label propagation, and two variants are introduced to find the optimal solution of the framework. We evaluated the rcNet algorithms and existing baseline methods with both leave-one-out cross-validation and a task of predicting recently discovered disease-gene associations in OMIM. The experiments demonstrated that the rcNet algorithms achieved the best overall rankings compared to the baselines. To further validate the reproducibility of the performance, we applied the algorithms to identify the target diseases of novel candidate disease genes obtained from recent studies of GWAS, DNA copy number variation analysis, and gene expression profiling. The algorithms ranked the target disease of the candidate genes at the top of the rank list in many cases across all the three case studies. The rcNet algorithms are available as a webtool for disease and gene set association analysis at http://compbio.cs.umn.edu/dgsa_rcNet.

artificial intelligence, machine learning, phenotype, (15 more...)

arXiv.org Artificial Intelligence

1102.3919

Country: North America > United States (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Genetic Disease (0.93)
Health & Medicine > Therapeutic Area > Gastroenterology (0.68)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
