AITopics | Performance Analysis

We tackle the problem of multi-class relational sequence learning using relevant patterns discovered from a set of labelled sequences. To deal with this problem, each relational sequence is first mapped into a feature vector using the result of a feature construction method. Since the efficacy of sequence learning algorithms strongly depends on the features used to represent the sequences, the second step is to find an optimal subset of the constructed features leading to high classification accuracy. This feature selection task has been solved by adopting a wrapper approach that uses a stochastic local search algorithm embedding a naive Bayes classifier. The performance of the proposed method applied to a real-world dataset shows an improvement when compared to other established methods, such as hidden Markov models, Fisher kernels and conditional random fields for relational sequences.
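
The wrapper approach described in this abstract can be illustrated with a minimal sketch, assuming a generic stochastic local search over feature subsets rather than the authors' exact algorithm. The names `local_search_select` and `evaluate` are hypothetical. In the paper's setting `evaluate` would presumably return the accuracy of a naive Bayes classifier trained on the chosen features; the usage lines below substitute a toy scoring function instead.

```python
import random

def local_search_select(n_features, evaluate, iterations=500, seed=0):
    """Stochastic local search over feature subsets (illustrative wrapper).

    evaluate: hypothetical callable mapping a frozenset of feature indices
    to a score, e.g. the cross-validated accuracy of a classifier.
    """
    rng = random.Random(seed)
    current = frozenset()
    current_score = evaluate(current)
    for _ in range(iterations):
        # Neighbour: flip the membership of one randomly chosen feature.
        f = rng.randrange(n_features)
        candidate = current ^ {f}
        candidate_score = evaluate(candidate)
        # Keep the neighbour only if it does not lower the score.
        if candidate_score >= current_score:
            current, current_score = candidate, candidate_score
    return current, current_score

# Toy stand-in for classifier accuracy: only features 0, 2 and 5 help.
useful = {0, 2, 5}

def toy_score(subset):
    return len(subset & useful) - len(subset - useful)

selected, score = local_search_select(6, toy_score)
```

Passing the scoring function as an argument keeps the search independent of the classifier, which is the defining trait of a wrapper method.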

artificial intelligence, machine learning, sequence, (16 more...)

arXiv.org Artificial Intelligence

1006.5188

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Middle East > Malta > Port Region > Southern Harbour District > Floriana (0.05)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Computing p-values of LiNGAM outputs via Multiscale Bootstrap

Komatsu, Yusuke, Shimizu, Shohei, Shimodaira, Hidetoshi

arXiv.org Machine Learning, Jun-22-2010

Structural equation models and Bayesian networks have been widely used to study causal relationships between continuous variables. Recently, a non-Gaussian method called LiNGAM was proposed to discover such causal models and has been extended in various directions. An important problem with LiNGAM is that, as with any statistical method, the results are affected by the random sampling of the data. Thus, some analysis of the statistical reliability or confidence level should be conducted. A common method to evaluate a confidence level is the bootstrap. However, a confidence level computed by the ordinary bootstrap method is known to be biased as a probability value ($p$-value) of hypothesis testing. In this paper, we propose a new procedure that applies an advanced bootstrap method called multiscale bootstrap to compute confidence levels, i.e., $p$-values, of LiNGAM outputs. The multiscale bootstrap method gives unbiased $p$-values with asymptotically much higher accuracy. Experiments on artificial data demonstrate the utility of our approach.
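
As a rough illustration of the ordinary bootstrap confidence level that this abstract takes as its starting point, the sketch below counts how often a property of interest reappears in resampled data. It does not implement LiNGAM or the multiscale correction. The function name `bootstrap_probability` and the predicate `holds` are hypothetical; the `n_resample` parameter only exposes the resample size, which multiscale bootstrap is generally described as varying across several scales before extrapolating.

```python
import random

def bootstrap_probability(data, holds, n_resample=None, replicates=1000, seed=0):
    """Fraction of bootstrap resamples for which holds(sample) is True.

    data: list of observations.
    holds: hypothetical predicate on a resampled list, e.g. "the estimated
           model contains a given edge".
    n_resample: resample size; defaults to len(data) (ordinary bootstrap).
    """
    rng = random.Random(seed)
    m = len(data) if n_resample is None else n_resample
    hits = 0
    for _ in range(replicates):
        # Draw m observations with replacement.
        sample = [rng.choice(data) for _ in range(m)]
        if holds(sample):
            hits += 1
    return hits / replicates

# Usage: how often is the resampled mean positive?
observations = [0.8, -0.2, 1.1, 0.4, -0.5, 0.9, 0.3]
bp = bootstrap_probability(observations, lambda s: sum(s) / len(s) > 0)
```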

artificial intelligence, bayesian inference, bootstrap, (17 more...)

arXiv.org Machine Learning

0909.2904

Country:

Asia > Japan > Honshū (0.15)
North America > United States (0.14)

Genre: Research Report > Experimental Study (0.68)

Industry: Energy > Oil & Gas (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)

Detecting Anomalous Process Behaviour using Second Generation Artificial Immune Systems

Twycross, Jamie, Aickelin, Uwe, Whitbrook, Amanda

arXiv.org Artificial Intelligence, Jun-18-2010

Artificial Immune Systems have been successfully applied to a number of problem domains including fault tolerance and data mining, but have been shown to scale poorly when applied to computer intrusion detection despite the fact that the biological immune system is a very effective anomaly detector. This may be because AIS algorithms have previously been based on the adaptive immune system and biologically-naive models. This paper focuses on describing and testing a more complex and biologically-authentic AIS model, inspired by the interactions between the innate and adaptive immune systems. Its performance on a realistic process anomaly detection problem is shown to be better than standard AIS methods (negative-selection), policy-based anomaly detection methods (systrace), and an alternative innate AIS approach (the DCA). In addition, it is shown that runtime information can be used in combination with system call information to enhance detection capability.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1006.3654

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
North America > United States > New Mexico (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)

An Immuno-Inspired Approach to Misbehavior Detection in Ad Hoc Wireless Networks

Drozda, Martin, Schildt, Sebastian, Schaust, Sven, Szczerbicka, Helena

arXiv.org Artificial Intelligence, Jun-17-2010

We propose and evaluate an immuno-inspired approach to misbehavior detection in ad hoc wireless networks. Node misbehavior can be the result of an intrusion, or a software or hardware failure. Our approach is motivated by co-stimulatory signals present in the biological immune system. The results show that co-stimulation in ad hoc wireless networks can both substantially improve energy efficiency of detection and, at the same time, help achieve low false positive rates. The energy efficiency improvement is almost two orders of magnitude compared to misbehavior detection based on watchdogs. We provide a characterization of the trade-offs between detection approaches executed by a single node and by several nodes in cooperation. Additionally, we investigate several feature sets for misbehavior detection. These feature sets impose different requirements on the detection system, most notably from the energy efficiency point of view.

artificial intelligence, data packet, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1001.3113

Country:

North America > United States > New Mexico (0.04)
Europe > Germany > Lower Saxony > Hanover (0.04)

Genre: Research Report > New Finding (0.88)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.89)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

The DCA:SOMe Comparison: A comparative study between two biologically-inspired algorithms

Greensmith, Julie, Feyereisl, Jan, Aickelin, Uwe

arXiv.org Artificial Intelligence, Jun-8-2010

The Dendritic Cell Algorithm (DCA) is an immune-inspired algorithm, developed for the purpose of anomaly detection. The algorithm performs multi-sensor data fusion and correlation which results in a 'context aware' detection system. Previous applications of the DCA have included the detection of potentially malicious port scanning activity, where it has produced high rates of true positives and low rates of false positives. In this work we aim to compare the performance of the DCA and of a Self-Organizing Map (SOM) when applied to the detection of SYN port scans, through experimental analysis. A SOM is an ideal candidate for comparison as it shares similarities with the DCA in terms of the data fusion method employed. It is shown that the results of the two systems are comparable, and both produce false positives for the same processes. This shows that the DCA can produce anomaly detection results to the same standard as an established technique.

data mining, evolutionary algorithm, machine learning, (20 more...)

arXiv.org Artificial Intelligence

1006.1518

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
(6 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.92)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.96)
(2 more...)

Rasch-based high-dimensionality data reduction and class prediction with applications to microarray gene expression data

Kastrin, Andrej, Peterlin, Borut

arXiv.org Artificial Intelligence, Jun-5-2010

Class prediction is an important application of microarray gene expression data analysis. The high dimensionality of microarray data, where the number of genes (variables) is very large compared to the number of samples (observations), makes the application of many prediction techniques (e.g., logistic regression, discriminant analysis) difficult. An efficient way to solve this problem is to use statistical dimension reduction techniques. Increasingly used in psychology-related applications, the Rasch model (RM) provides an appealing framework for handling high-dimensional microarray data. In this paper, we study the potential of RM-based modeling in dimensionality reduction with binarized microarray gene expression data and investigate its prediction accuracy in the context of class prediction using linear discriminant analysis. Two different publicly available microarray data sets are used to illustrate a general framework of the approach. Performance of the proposed method is assessed by a re-randomization scheme using principal component analysis (PCA) as a benchmark method. Our results show that RM-based dimension reduction is as effective as PCA-based dimension reduction. The method is general and can be applied to other high-dimensional data problems.

artificial intelligence, bioinformatics, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1006.103

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.47)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Chi-square-based scoring function for categorization of MEDLINE citations

Kastrin, Andrej, Peterlin, Borut, Hristovski, Dimitar

arXiv.org Machine Learning, Jun-5-2010

Objectives: Text categorization has been used in biomedical informatics for identifying documents containing relevant topics of interest. We developed a simple method that uses a chi-square-based scoring function to determine the likelihood that a MEDLINE citation contains a genetics-relevant topic. Methods: Our procedure requires construction of a genetic and a nongenetic domain document corpus. We used MeSH descriptors assigned to MEDLINE citations for this categorization task. We compared frequencies of MeSH descriptors between the two corpora by applying the chi-square test. A MeSH descriptor was considered to be a positive indicator if its relative observed frequency in the genetic domain corpus was greater than its relative observed frequency in the nongenetic domain corpus. The output of the proposed method is a list of scores for all the citations, with the highest score given to those citations containing MeSH descriptors typical for the genetic domain. Results: Validation was done on a set of 734 manually annotated MEDLINE citations. It achieved predictive accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method by comparing it to three machine learning algorithms (support vector machines, decision trees, naïve Bayes). Although the differences were not statistically significant, the results showed that our chi-square scoring performs as well as the compared machine learning algorithms. Conclusions: We suggest that the chi-square scoring is an effective solution to help categorize MEDLINE citations. The algorithm is implemented in the BITOLA literature-based discovery support system as a preprocessor for the gene symbol disambiguation process.
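
The descriptor comparison in the Methods can be sketched as follows, assuming a plain Pearson chi-square on a 2x2 table of descriptor counts. The abstract does not state how descriptor statistics are combined into a citation score, so the summation in `score_citation` is an assumption made for illustration only, and all function names and counts here are hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom

def descriptor_weight(gen_with, gen_total, non_with, non_total):
    """Signed chi-square for one descriptor.

    Positive when the descriptor's relative frequency is higher in the
    genetic corpus (a positive indicator), negative otherwise.
    """
    chi2 = chi_square_2x2(gen_with, gen_total - gen_with,
                          non_with, non_total - non_with)
    positive = gen_with / gen_total > non_with / non_total
    return chi2 if positive else -chi2

def score_citation(descriptors, weights):
    # Assumption: a citation's score is the sum of its descriptors' weights;
    # descriptors without a weight contribute nothing.
    return sum(weights.get(d, 0.0) for d in descriptors)

# Usage with made-up counts: descriptor seen in 30 of 100 genetic
# citations and 10 of 100 nongenetic citations.
w = descriptor_weight(30, 100, 10, 100)
```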

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1006.1029

Country:

North America > United States (1.00)
Europe > North Macedonia > Pelagonia Statistical Region > Bitola Municipality > Bitola (0.25)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Reconstruction of Causal Networks by Set Covering

Fyson, Nick, De Bie, Tijl, Cristianini, Nello

arXiv.org Machine Learning, Jun-4-2010

We present a method for the reconstruction of networks, based on the order of nodes visited by a stochastic branching process. Our algorithm reconstructs a network of minimal size that ensures consistency with the data. Crucially, we show that global consistency with the data can be achieved through purely local considerations, inferring the neighbourhood of each node in turn. The optimisation problem solved for each individual node can be reduced to a Set Covering Problem, which is known to be NP-hard but can be approximated well in practice. We then extend our approach to account for noisy data, based on the Minimum Description Length principle. We demonstrate our algorithms on synthetic data, generated by an SIR-like epidemiological model.
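
The Set Covering Problem mentioned in this abstract is commonly approximated with a greedy heuristic. The following is a minimal sketch of that generic heuristic, not the authors' network reconstruction algorithm; the function name `greedy_set_cover` and the toy data are hypothetical.

```python
def greedy_set_cover(universe, subsets):
    """Return indices of subsets picked by the greedy set-cover heuristic."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements.
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]))
        if not uncovered & subsets[best]:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen

# Usage: cover {1, ..., 5} with as few of the candidate sets as the
# heuristic can manage.
candidates = [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}]
cover = greedy_set_cover({1, 2, 3, 4, 5}, candidates)
```

The greedy choice is not guaranteed to be minimal, which is consistent with the abstract's remark that the problem is NP-hard but approximable in practice.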

artificial intelligence, machine learning, node, (17 more...)

arXiv.org Machine Learning

1006.0849

Country: Europe (0.46)

Genre: Research Report (0.82)

Industry: Health & Medicine > Epidemiology (0.34)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.43)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory > Minimum Complexity Machines (0.35)

A Survey of Paraphrasing and Textual Entailment Methods

Androutsopoulos, Ion, Malakasiotis, Prodromos

arXiv.org Artificial Intelligence, May-30-2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1613/jair.2985

0912.3747

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(35 more...)

Genre:

Overview (0.69)
Research Report (0.63)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Law > Litigation (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

A Survey of Paraphrasing and Textual Entailment Methods

Androutsopoulos, I., Malakasiotis, P.

Journal of Artificial Intelligence Research, May-28-2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

expression, proc, translation, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.2985

AI Access Foundation

10651

Country: