AITopics | Performance Analysis

As an immune-inspired algorithm, the Dendritic Cell Algorithm (DCA) produces promising performance in the field of anomaly detection. This paper presents the application of the DCA to a standard data set, the KDD 99 data set. The results of different implementation versions of the DCA, including the antigen multiplier and moving time windows, are reported. The real-valued Negative Selection Algorithm (NSA) using constant-sized detectors and the C4.5 decision tree algorithm are used to conduct a baseline comparison. The results suggest that the DCA is applicable to the KDD 99 data set, and that the antigen multiplier and moving time windows have the same effect on the DCA for this particular data set. The real-valued NSA with constant-sized detectors is not applicable to the data set, and the C4.5 decision tree algorithm provides a benchmark of the classification performance for this data set.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1003.0319

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.96)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Security Analysis of Online Centroid Anomaly Detection

Kloft, Marius, Laskov, Pavel

arXiv.org Machine Learning, Feb-27-2010

Security issues are crucial in a number of machine learning applications, especially in scenarios dealing with human activity rather than natural phenomena (e.g., information ranking, spam detection, malware detection, etc.). It is to be expected in such cases that learning algorithms will have to deal with manipulated data aimed at hampering decision making. Although some previous work addressed the handling of malicious data in the context of supervised learning, very little is known about the behavior of anomaly detection methods in such scenarios. In this contribution we analyze the performance of a particular method -- online centroid anomaly detection -- in the presence of adversarial noise. Our analysis addresses the following security-related issues: formalization of learning and attack processes, derivation of an optimal attack, and analysis of its efficiency and constraints. We derive bounds on the effectiveness of a poisoning attack against centroid anomaly detection under different conditions: bounded and unbounded percentage of traffic, and bounded false positive rate. Our bounds show that whereas a poisoning attack can be effectively staged in the unconstrained case, it can be made arbitrarily difficult (a strict upper bound on the attacker's gain) if external constraints are properly used. Our experimental evaluation, carried out on real HTTP and exploit traces, confirms the tightness of our theoretical bounds and the practicality of our protection mechanisms.
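
The setting analyzed in this abstract can be illustrated with a minimal sketch. It assumes a detector that keeps a running mean of accepted points and flags anything farther than a fixed radius; the class name, the Euclidean distance, and the rule of learning only from accepted points are illustrative assumptions, not the authors' formulation.

```python
import math


class OnlineCentroidDetector:
    """Hypothetical detector: flags points far from a running centroid."""

    def __init__(self, dim, radius):
        self.centroid = [0.0] * dim
        self.radius = radius
        self.count = 0

    def distance(self, x):
        # Euclidean distance from x to the current centroid.
        return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, self.centroid)))

    def is_anomaly(self, x):
        return self.distance(x) > self.radius

    def update(self, x):
        """Learn x only if it is accepted; return True when the centroid moved."""
        if self.count > 0 and self.is_anomaly(x):
            return False
        self.count += 1
        step = 1.0 / self.count  # incremental mean update
        self.centroid = [ci + step * (xi - ci) for xi, ci in zip(x, self.centroid)]
        return True


detector = OnlineCentroidDetector(dim=2, radius=1.0)
detector.update([0.0, 0.0])
detector.update([0.4, 0.0])
rejected = not detector.update([5.0, 5.0])  # far outside the radius
```

Because accepted points move the centroid, an attacker who keeps submitting points just inside the radius can drag it step by step, which is the kind of poisoning the abstract bounds.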

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

1003.0078

Country:

Europe (0.92)
North America > United States > California > Alameda County (0.28)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Detecting Botnets Through Log Correlation

Al-Hammadi, Yousof, Aickelin, Uwe

arXiv.org Artificial Intelligence, Jan-15-2010

Botnets, which consist of thousands of compromised machines, pose significant threats to other systems by launching Distributed Denial of Service (DDoS) attacks, keylogging, and backdoors. In response to these threats, new effective techniques are needed to detect the presence of botnets. In this paper, we use an interception technique to monitor Windows Application Programming Interface (API) function calls made by communication applications and store these calls with their arguments in log files. Our algorithm detects botnets by monitoring abnormal activity, correlating the changes in log file sizes from different hosts. Recently, an explosive growth of coordinated attacks has been noticed [1][6].
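
One way to picture the correlation step is sketched below. It assumes each host's log size is sampled at the same fixed intervals and uses the Pearson coefficient on the size changes; both the sampling scheme and the choice of coefficient are assumptions for illustration, since the abstract does not specify them.

```python
import math


def size_deltas(sizes):
    """Change in log file size between consecutive samples."""
    return [later - earlier for earlier, later in zip(sizes, sizes[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical log sizes (bytes) sampled at the same times on two hosts.
host_a = [100, 110, 150, 155, 200]
host_b = [50, 70, 150, 160, 250]
score = pearson(size_deltas(host_a), size_deltas(host_b))
```

A score near 1 means the two logs grow in step, which under these assumptions would be treated as a sign of coordinated activity worth investigating.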

artificial intelligence, log file size, machine learning, (13 more...)

arXiv.org Artificial Intelligence

1001.2665

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Genre: Research Report (0.66)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.99)

ICD 10 Based Medical Expert System Using Fuzzy Temporal Logic

Chinniah, P., Muttan, Dr. S.

arXiv.org Artificial Intelligence, Jan-12-2010

Expert opinion is necessary in medical decision making, since there are wide variations in clinical practice. Moreover, the growing need to assess and improve the quality of health care has brought to light the possibility of developing and implementing clinical practice guidelines based on expert opinions. Although a colleague's opinion helps in accessing information about real cases, which is another important source of information, an important goal when dealing with real medical cases is to have simultaneous access to the expert's opinion about the same indications of the case being treated. The growth of information in each medical field, due to the emergence of new discoveries, treatments, medicines and technologies, leads to a frequent need to consult the medical literature, in particular specialized reviews and journals. Given the huge volume of this information, classified and targeted access is necessary. In medicine, imprecision and uncertainty play a large role in the process of diagnosing disease, which has most frequently been the focus of these applications. With the increased volume of information available to physicians from new medical technologies, the process of classifying different sets of symptoms under a single name and determining the appropriate therapeutic actions becomes increasingly difficult.

artificial intelligence, expert system, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1001.1979

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > India > Tamil Nadu > Chennai (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > Austria (0.04)

Genre: Research Report (0.84)

Industry:

Health & Medicine > Diagnostic Medicine (0.94)
Health & Medicine > Health Care Technology (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.49)

Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification

Downey, Doug, Etzioni, Oren

Neural Information Processing Systems, Dec-31-2009

Is accurate classification possible in the absence of hand-labeled data? This paper introduces the Monotonic Feature (MF) abstraction--where the probability of class membership increases monotonically with the MF's value. The paper proves that when an MF is given, PAC learning is possible with no hand-labeled data under certain assumptions. We argue that MFs arise naturally in a broad range of textual classification applications. On the classic "20 Newsgroups" data set, a learner given an MF and unlabeled data achieves classification accuracy equal to that of a state-of-the-art semi-supervised learner relying on 160 hand-labeled examples. Even when MFs are not given as input, their presence or absence can be determined from a small amount of hand-labeled data, which yields a new semi-supervised learning method that reduces error by 15% on the 20 Newsgroups data.

assumption, classifier, monotonic feature, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York (0.05)
North America > United States > Illinois > Cook County > Evanston (0.04)
Europe > France > Pays de la Loire > Loire-Atlantique > Nantes (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
(4 more...)


Dirichlet-Bernoulli Alignment: A Generative Model for Multi-Class Multi-Label Multi-Instance Corpora

Yang, Shuang-hong, Zha, Hongyuan, Hu, Bao-gang

Neural Information Processing Systems, Dec-31-2009

We propose Dirichlet-Bernoulli Alignment (DBA), a generative model for corpora in which each pattern (e.g., a document) contains a set of instances (e.g., paragraphs in the document) and belongs to multiple classes. By casting predefined classes as latent Dirichlet variables (i.e., instance level labels), and modeling the multi-label of each pattern as Bernoulli variables conditioned on the weighted empirical average of topic assignments, DBA automatically aligns the latent topics discovered from data to human-defined classes. DBA is useful for both pattern classification and instance disambiguation, which are tested on text classification and named entity disambiguation for web search queries respectively.

classification, dba, dirichlet-bernoulli alignment, (16 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.05)
North America > United States (0.04)
Asia > China (0.04)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Bootstrapping from Game Tree Search

Veness, Joel, Silver, David, Blair, Alan, Uther, William

Neural Information Processing Systems, Dec-31-2009

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in a chess program, Meep, using a linear heuristic function. After initialising its weight vector to small random values, Meep was able to learn high quality weights from self-play alone. When tested online against human opponents, Meep played at a master level, the best performance of any chess program with a heuristic learned entirely from self-play.
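
A rough sketch of the first idea, updating every node in the search tree toward its search value, is given below. It uses plain minimax instead of alpha-beta, a dictionary-based toy tree, and a squared-error gradient step on a linear heuristic; these are simplifying assumptions and not the Meep implementation.

```python
def heuristic(weights, features):
    """Linear evaluation: dot product of weights and node features."""
    return sum(w * f for w, f in zip(weights, features))


def minimax(node, weights, maximizing, values):
    """Return the minimax value of node and record (node, value) for every node."""
    if not node["children"]:
        value = heuristic(weights, node["features"])
    else:
        child_values = [
            minimax(child, weights, not maximizing, values)
            for child in node["children"]
        ]
        value = max(child_values) if maximizing else min(child_values)
    values.append((node, value))
    return value


def update_all_nodes(root, weights, learning_rate):
    """Move the heuristic at every searched node toward that node's search value."""
    values = []
    minimax(root, weights, True, values)
    new_weights = list(weights)
    for node, target in values:
        error = target - heuristic(weights, node["features"])
        for i, feature in enumerate(node["features"]):
            new_weights[i] += learning_rate * error * feature
    return new_weights


# Hypothetical one-feature tree: a root with two leaf children.
tree = {
    "features": [1.0],
    "children": [
        {"features": [2.0], "children": []},
        {"features": [3.0], "children": []},
    ],
}
updated = update_all_nodes(tree, weights=[1.0], learning_rate=0.1)
```

In this toy tree only the root contributes an error, since leaf values are the heuristic itself; in a deeper search every interior node would supply its own training signal.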

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada > Alberta (0.28)
Oceania > Australia > New South Wales > Sydney (0.14)

Genre: Research Report > New Finding (0.69)

Industry: Leisure & Entertainment > Games > Chess (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Games (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.42)

Overlaying classifiers: a practical approach for optimal ranking

Clémençon, Stéphan J., Vayatis, Nicolas

Neural Information Processing Systems, Dec-31-2009

ROC curves are one of the most widely used displays for evaluating the performance of scoring functions. In this paper, we propose a statistical method for directly optimizing the ROC curve. The target is known to be the regression function up to an increasing transformation, and this boils down to recovering the level sets of the latter. We propose to use classifiers obtained by empirical risk minimization of a weighted classification error and then to construct a scoring rule by overlaying these classifiers. We show the consistency and rate of convergence of this procedure to the optimal ROC curve in terms of the supremum norm and, as a byproduct of the analysis, we derive an empirical estimate of the optimal ROC curve.

artificial intelligence, machine learning, roc curve, (16 more...)

Neural Information Processing Systems

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

AUC optimization and the two-sample problem

Vayatis, Nicolas, Depecker, Marine, Clémençon, Stéphan J.

Neural Information Processing Systems, Dec-31-2009

The purpose of the paper is to explore the connection between multivariate homogeneity tests and AUC optimization. The latter problem has recently received much attention in the statistical learning literature. From the elementary observation that, in the two-sample problem setup, the null assumption corresponds to the situation where the area under the optimal ROC curve is equal to 1/2, we propose a two-stage testing method based on data splitting. A nearly optimal scoring function in the AUC sense is first learnt from one of the two half-samples. Data from the remaining half-sample are then projected onto the real line and eventually ranked according to the scoring function computed at the first stage. The last step amounts to performing a standard Mann-Whitney Wilcoxon test in the one-dimensional framework. We show that the learning step of the procedure does not affect the consistency of the test as well as its properties in terms of power, provided the ranking produced is accurate enough in the AUC sense. The results of a numerical experiment are eventually displayed in order to show the efficiency of the method.
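
The link between AUC and the Mann-Whitney statistic that the abstract relies on can be made concrete. The sketch below computes the empirical AUC as the fraction of correctly ordered (positive, negative) score pairs, which equals the Mann-Whitney U statistic divided by the number of pairs; function names are illustrative, and the learning stage of the two-stage procedure is not shown.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Empirical AUC: share of (pos, neg) pairs ranked correctly, ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mann_whitney_u(pos_scores, neg_scores):
    """U statistic for the first sample, recovered as AUC times the number of pairs."""
    pairs = len(pos_scores) * len(neg_scores)
    return auc_from_scores(pos_scores, neg_scores) * pairs


# Hypothetical one-dimensional scores for the two held-out samples.
separated = auc_from_scores([3, 4], [1, 2])
identical = auc_from_scores([1, 2], [1, 2])
```

When both samples come from the same distribution, the empirical AUC should be close to 1/2, so a value far from 1/2 is evidence against the null hypothesis.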

artificial intelligence, auc, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Industry: Education (0.34)

Technology: