AITopics | Decision Tree Learning

Decision Tree Learning

Learning to Classify with Branching Tests: "A decision tree takes as input an object or situation described by a set of properties, and outputs a yes/no decision. Decision trees therefore represent Boolean functions. Functions with a larger range of outputs can also be represented...."
– Artificial Intelligence: A Modern Approach. By Stuart Russell & Peter Norvig. 2002. Section 18.3; page 531.
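
As a minimal sketch of the idea in the quotation above, a decision tree over Boolean properties can be written as nested tests that end in a yes/no leaf. The attribute names (`hungry`, `raining`) and the classes `Leaf` and `Node` are hypothetical, chosen only for illustration; they do not come from the textbook.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Leaf:
    decision: bool  # the yes/no output at the end of a branch


@dataclass
class Node:
    attribute: str      # property tested at this node
    if_true: "Tree"     # subtree followed when the property holds
    if_false: "Tree"    # subtree followed otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, example: Dict[str, bool]) -> bool:
    """Follow one branching test per level until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if example[tree.attribute] else tree.if_false
    return tree.decision


# Hypothetical tree encoding the Boolean function: hungry AND NOT raining.
TREE = Node(
    attribute="hungry",
    if_true=Node(attribute="raining", if_true=Leaf(False), if_false=Leaf(True)),
    if_false=Leaf(False),
)

if __name__ == "__main__":
    print(classify(TREE, {"hungry": True, "raining": False}))
```

Each root-to-leaf path corresponds to a conjunction of tests, so the tree as a whole represents a Boolean function of the input properties.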

Application of Data Mining to Network Intrusion Detection: Classifier Selection Model

Nguyen, Huy, Choi, Deokjai

arXiv.org Artificial Intelligence | Jul-7-2010

As network attacks have increased in number and severity over the past few years, the intrusion detection system (IDS) is increasingly becoming a critical component in securing the network. Due to the large volumes of security audit data as well as the complex and dynamic properties of intrusion behaviors, optimizing the performance of IDS has become an important open problem that is receiving more and more attention from the research community. The uncertainty about whether certain algorithms perform better for certain attack classes constitutes the motivation for the work reported herein. In this paper, we evaluate the performance of a comprehensive set of classifier algorithms using the KDD99 dataset. Based on the evaluation results, the best algorithms for each attack category are chosen and two classifier algorithm selection models are proposed. The comparison of simulation results indicates that a noticeable performance improvement and real-time intrusion detection can be achieved when the proposed models are applied to detect different kinds of network attacks.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1007.1268

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Minnesota (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Canada > Quebec (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
(5 more...)

Graph-Valued Regression

Liu, Han, Chen, Xi, Lafferty, John, Wasserman, Larry

arXiv.org Machine Learning | Jun-20-2010

Undirected graphical models encode in a graph $G$ the dependency structure of a random vector $Y$. In many applications, it is of interest to model $Y$ given another random vector $X$ as input. We refer to the problem of estimating the graph $G(x)$ of $Y$ conditioned on $X=x$ as "graph-valued regression." In this paper, we propose a semiparametric method for estimating $G(x)$ that builds a tree on the $X$ space just as in CART (classification and regression trees), but at each leaf of the tree estimates a graph. We call the method "Graph-optimized CART," or Go-CART. We study the theoretical properties of Go-CART using dyadic partitioning trees, establishing oracle inequalities on risk minimization and tree partition consistency. We also demonstrate the application of Go-CART to a meteorological dataset, showing how graph-valued regression can provide a useful tool for analyzing complex data.

go-cart, graph, partition, (15 more...)

arXiv.org Machine Learning

1006.3972

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > South Dakota (0.04)
North America > United States > Nebraska (0.04)
(5 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data

Shah, Mohak, Marchand, Mario, Corbeil, Jacques

arXiv.org Artificial Intelligence | May-4-2010

One of the objectives of designing feature selection learning algorithms is to obtain classifiers that depend on a small number of attributes and have verifiable future performance guarantees. There are few, if any, approaches that successfully address the two goals simultaneously. Performance guarantees become crucial for tasks such as microarray data analysis, where very small sample sizes limit empirical evaluation. To the best of our knowledge, algorithms that give theoretical bounds on future performance have not been proposed so far in the context of the classification of gene expression data. In this work, we investigate the premise of learning a conjunction (or disjunction) of decision stumps in Occam's Razor, Sample Compression, and PAC-Bayes learning settings for identifying a small subset of attributes that can be used to perform reliable classification tasks. We apply the proposed approaches to gene identification from DNA microarray data and compare our results to those of well-known successful approaches proposed for the task. We show that our algorithm not only finds hypotheses with a much smaller number of genes while giving competitive classification accuracy but also has tight risk guarantees on future performance, unlike other approaches. The proposed approaches are general and extensible in terms of both designing novel algorithms and application to other domains.

bioinformatics, classifier, machine learning, (20 more...)

arXiv.org Artificial Intelligence

1005.053

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.68)
Health & Medicine > Therapeutic Area > Hematology (0.68)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
(2 more...)

Mining Road Traffic Accident Data to Improve Safety: Role of Road-Related Factors on Accident Severity in Ethiopia

Beshah, Tibebe (Addis Ababa University) | Hill, Shawndra (University of Pennsylvania)

AAAI Conferences | Mar-22-2010

Road traffic accidents (RTAs) are a major public health concern, resulting in an estimated 1.2 million deaths and 50 million injuries worldwide each year. In the developing world, RTAs are among the leading causes of death and injury; Ethiopia in particular experiences the highest rate of such accidents. Thus, methods to reduce accident severity are of great interest to traffic agencies and the public at large. In this work, we applied data mining technologies to link recorded road characteristics to accident severity in Ethiopia, and developed a set of rules that could be used by the Ethiopian Traffic Agency to improve safety.

accident, artificial intelligence, machine learning, (13 more...)

AAAI Conferences

2010 AAAI Spring Symposium Series

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.07)
Asia > Thailand > Bangkok > Bangkok (0.05)
(5 more...)

Genre: Research Report > New Finding (0.47)

Industry:

Transportation > Ground > Road (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Further Exploration of the Dendritic Cell Algorithm: Antigen Multiplier and Time Windows

Gu, Feng, Greensmith, Julie, Aickelin, Uwe

arXiv.org Artificial Intelligence | Mar-1-2010

As an immune-inspired algorithm, the Dendritic Cell Algorithm (DCA) produces promising performance in the field of anomaly detection. This paper presents the application of the DCA to a standard data set, the KDD 99 data set. The results of different implementation versions of the DCA, including the antigen multiplier and moving time windows, are reported. The real-valued Negative Selection Algorithm (NSA) using constant-sized detectors and the C4.5 decision tree algorithm are used to conduct a baseline comparison. The results suggest that the DCA is applicable to the KDD 99 data set, and that the antigen multiplier and moving time windows have the same effect on the DCA for this particular data set. The real-valued NSA with constant-sized detectors is not applicable to the data set, and the C4.5 decision tree algorithm provides a benchmark of the classification performance for this data set.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1003.0319

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Information Technology > Security & Privacy (0.96)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

On the Efficient Minimization of Classification Calibrated Surrogates

Nock, Richard, Nielsen, Frank

Neural Information Processing Systems | Dec-31-2009

Bartlett et al. (2006) recently proved that a ground condition for convex surrogates, classification calibration, ties up the minimization of the surrogates and classification risks, and left as an important problem the algorithmic questions about the minimization of these surrogates. In this paper, we propose an algorithm which provably minimizes any strictly convex and differentiable classification calibrated surrogate (a set whose losses span the exponential, logistic and squared losses), with boosting-type guaranteed convergence rates under a weak learning assumption. A particular subclass of these surrogates, which we call balanced convex surrogates, has a key rationale that ties it to maximum likelihood estimation, zero-sum games and the set of losses that satisfy some of the most common requirements for losses in supervised learning. We report experiments on more than 50 readily available domains with 11 flavors of the algorithm, which shed light on new surrogates and on the potential of data-dependent strategies to tune surrogates.

artificial intelligence, bayesian inference, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)

Using Fuzzy Decision Trees and Information Visualization to Study the Effects of Cultural Diversity on Team Planning and Communication

Liu, Yan (Wright State University) | Warren, Rik (Wright-Patterson Air Force Base)

AAAI Conferences | Dec-9-2009

Virtual teams that span multiple geographic and cultural boundaries have become commonplace in numerous organizations due to the competitive advantages they provide in human resources, products, financial means, knowledge sharing and many others. However, the promises of multinational and multicultural (MNMC) distributed teams are accompanied by a number of challenges. Many research studies have suggested that one of the most challenging barriers to the effective implementation of MNMC distributed teams is culture. In this study, data collected from the experiment conducted by the NATO RTO Human Factors and Medicine Panel Research Task Group (HFM-138/RTG) on “Adaptability in Multinational Coalitions” have been analyzed to study the effects of cultural diversity on team planning and communication. Fuzzy decision trees have been derived to model the effects, and information visualization techniques are used to facilitate understanding of the derived classification patterns. Results of the research suggest that there is no single, straightforward conclusion on how cultural diversity affects team planning and communication. Different dimensions of cultural values interact in influencing team behaviors. However, diversities in power distance and masculinity seem to play more influential roles than others.

comm, communication, team planning, (17 more...)

AAAI Conferences

Third International Conference on Computational Cultural Dynamics

Country:

South America > Argentina (0.04)
North America > United States > Ohio > Montgomery County > Dayton (0.04)
North America > United States > Maryland (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Government > Military (0.51)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)

A Massive Local Rules Search Approach to the Classification Problem

Malyshkin, Vladislav, Bakhramov, Ray, Gorodetsky, Andrey

arXiv.org Artificial Intelligence | Dec-1-2009

An approach to the classification problem of machine learning, based on building local classification rules, is developed. The local rules are considered as projections of the global classification rules onto the event we want to classify. A massive global optimization algorithm is used to optimize the quality criterion. The algorithm, which has polynomial complexity in the typical case, is used to find all high-quality local rules. Another distinctive feature of the algorithm is the integration of attribute-level selection (for ordered attributes) with rule search and an original strategy for resolving conflicting rules. The algorithm is practical; it was tested on a number of data sets from the UCI repository, and a comparison with other prediction techniques is presented.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

cs/0609007

Country:

Europe (1.00)
North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

A Theory of Probabilistic Boosting, Decision Trees and Matryoshki

Grossmann, Etienne

arXiv.org Artificial Intelligence | Dec-1-2009

We present a theory of boosting probabilistic classifiers. We place ourselves in the situation of a user who only provides a stopping parameter and a probabilistic weak learner/classifier and compare three types of boosting algorithms: probabilistic Adaboost, decision tree, and tree of trees of ... of trees, which we call matryoshka. "Nested tree," "embedded tree" and "recursive tree" are also appropriate names for this algorithm, which is one of our contributions. Our other contribution is the theoretical analysis of the algorithms, in which we give training error bounds. This analysis suggests that the matryoshka leverages probabilistic weak classifiers more efficiently than simple decision trees.

artificial intelligence, classifier, machine learning, (13 more...)

arXiv.org Artificial Intelligence

cs/0607110

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Group-based Query Learning for rapid diagnosis in time-critical situations

Bellala, Gowtham, Bhavnani, Suresh, Scott, Clayton

arXiv.org Machine Learning | Nov-24-2009

In query learning, the goal is to identify an unknown object while minimizing the number of "yes or no" questions (queries) posed about that object. We consider three extensions of this fundamental problem that are motivated by practical considerations in real-world, time-critical identification tasks such as emergency response. First, we consider the problem where the objects are partitioned into groups, and the goal is to identify only the group to which the object belongs. Second, we address the situation where the queries are partitioned into groups, and an algorithm may suggest a group of queries to a human user, who then selects the actual query. Third, we consider the problem of query learning in the presence of persistent query noise, and relate it to group identification. To address these problems we show that a standard algorithm for query learning, known as the splitting algorithm or generalized binary search, may be viewed as a generalization of Shannon-Fano coding. We then extend this result to the group-based settings, leading to new algorithms. The performance of our algorithms is demonstrated on simulated data and on a database used by first responders for toxic chemical identification.
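
The splitting algorithm (generalized binary search) mentioned in this abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes a uniform prior over objects and noise-free answers, and the function name, the toy `OBJECTS` table and the query names are hypothetical. At each step it greedily asks the query whose yes/no answer splits the remaining candidates most evenly.

```python
from typing import Callable, Dict, List, Set, Tuple


def generalized_binary_search(
    objects: Dict[str, Dict[str, bool]],
    queries: List[str],
    oracle: Callable[[str], bool],
) -> Tuple[Set[str], List[str]]:
    """Greedy splitting: repeatedly ask the most balanced remaining query.

    objects maps each object name to its (known) answer for every query;
    oracle(query) returns the answer for the unknown target object.
    Assumes a uniform prior over objects and noise-free answers.
    """
    candidates = set(objects)
    remaining = list(queries)
    asked: List[str] = []

    def imbalance(q: str) -> int:
        # 0 means a perfect half/half split of the current candidates.
        yes = sum(objects[o][q] for o in candidates)
        return abs(2 * yes - len(candidates))

    while len(candidates) > 1 and remaining:
        best = min(remaining, key=imbalance)
        if imbalance(best) == len(candidates):
            break  # no remaining query separates the candidates
        remaining.remove(best)
        answer = oracle(best)
        asked.append(best)
        candidates = {o for o in candidates if objects[o][best] == answer}
    return candidates, asked


# Hypothetical toy data: four objects described by three yes/no queries.
OBJECTS = {
    "a": {"q1": True, "q2": True, "q3": True},
    "b": {"q1": True, "q2": False, "q3": True},
    "c": {"q1": False, "q2": True, "q3": True},
    "d": {"q1": False, "q2": False, "q3": False},
}
QUERIES = ["q1", "q2", "q3"]

if __name__ == "__main__":
    target = "c"
    found, asked = generalized_binary_search(
        OBJECTS, QUERIES, lambda q: OBJECTS[target][q]
    )
    print(found, asked)
```

With a non-uniform prior, the count of "yes" candidates in `imbalance` would be replaced by their total prior mass; the group-based and noisy variants described in the abstract are not covered by this sketch.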

artificial intelligence, machine learning, optimization problem, (18 more...)

arXiv.org Machine Learning

0911.4511

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > Middle East > Israel (0.04)
Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)

Genre: Research Report (0.63)

Industry:

Health & Medicine (0.68)
Materials > Chemicals (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)