Goto

Collaborating Authors

 Decision Tree Learning


A Comparison of Different Machine Transliteration Models

arXiv.org Artificial Intelligence

Machine transliteration is a method for automatically converting words in one language into phonetically equivalent ones in another language. Machine transliteration plays an important role in natural language applications such as information retrieval and machine translation, especially for handling proper nouns and technical terms. Four machine transliteration models -- grapheme-based transliteration model, phoneme-based transliteration model, hybrid transliteration model, and correspondence-based transliteration model -- have been proposed by several researchers. To date, however, there has been little research on a framework in which multiple transliteration models can operate simultaneously. Furthermore, there has been no comparison of the four models within the same framework and using the same data. We addressed these problems by 1) modeling the four models within the same framework, 2) comparing them under the same conditions, and 3) developing a way to improve machine transliteration through this comparison. Our comparison showed that the hybrid and correspondence-based models were the most effective and that the four models can be used in a complementary manner to improve machine transliteration performance.


Dimension Reduction Using Rule Ensemble Machine Learning Methods: A Numerical Study of Three Ensemble Methods

arXiv.org Machine Learning

Ensemble methods for supervised machine learning have become popular due to their ability to accurately predict class labels with groups of simple, lightweight "base learners." While ensembles offer computationally efficient models that have good predictive capability they tend to be large and offer little insight into the patterns or structure in a dataset. We consider an ensemble technique that returns a model of ranked rules. The model accurately predicts class labels and has the advantage of indicating which parameter constraints are most useful for predicting those labels. An example of the rule ensemble method successfully ranking rules and selecting attributes is given with a dataset containing images of potential supernovas where the number of necessary features is reduced from 39 to 21. We also compare the rule ensemble method on a set of multi-class problems with boosting and bagging, which are two well known ensemble techniques that use decision trees as base learners, but do not have a rule ranking scheme.


A theory of multiclass boosting

arXiv.org Machine Learning

Boosting combines weak classifiers to form highly accurate predictors. Although the case of binary classification is well understood, in the multiclass setting, the "correct" requirements on the weak classifier, or the notion of the most efficient boosting algorithms are missing. In this paper, we create a broad and general framework, within which we make precise and identify the optimal requirements on the weak-classifier, as well as design the most effective, in a certain sense, boosting algorithms that assume such requirements.


Temporal Decision Trees: Model-based Diagnosis of Dynamic Systems On-Board

arXiv.org Artificial Intelligence

The automatic generation of decision trees based on off-line reasoning on models of a domain is a reasonable compromise between the advantages of using a model-based approach in technical domains and the constraints imposed by embedded applications. In this paper we extend the approach to deal with temporal information. We introduce a notion of temporal decision tree, which is designed to make use of relevant information as long as it is acquired, and we present an algorithm for compiling such trees from a model-based reasoning system.


Random forest models of the retention constants in the thin layer chromatography

arXiv.org Artificial Intelligence

In the current study we examine an application of the machine learning methods to model the retention constants in the thin layer chromatography (TLC). This problem can be described with hundreds or even thousands of descriptors relevant to various molecular properties, most of them redundant and not relevant for the retention constant prediction. Hence we employed feature selection to significantly reduce the number of attributes. Additionally we have tested application of the bagging procedure to the feature selection. The random forest regression models were built using selected variables. The resulting models have better correlation with the experimental data than the reference models obtained with linear regression. The cross-validation confirms robustness of the models.


Inducing Interpretable Voting Classifiers without Trading Accuracy for Simplicity: Theoretical Results, Approximation Algorithms

arXiv.org Artificial Intelligence

Recent advances in the study of voting classification algorithms have brought empirical and theoretical results clearly showing the discrimination power of ensemble classifiers. It has been previously argued that the search of this classification power in the design of the algorithms has marginalized the need to obtain interpretable classifiers. Therefore, the question of whether one might have to dispense with interpretability in order to keep classification strength is being raised in a growing number of machine learning or data mining papers. The purpose of this paper is to study both theoretically and empirically the problem. First, we provide numerous results giving insight into the hardness of the simplicity-accuracy tradeoff for voting classifiers. Then we provide an efficient "top-down and prune" induction heuristic, WIDC, mainly derived from recent results on the weak learning and boosting frameworks. It is to our knowledge the first attempt to build a voting classifier as a base formula using the weak learning framework (the one which was previously highly successful for decision tree induction), and not the strong learning framework (as usual for such classifiers with boosting-like approaches). While it uses a well-known induction scheme previously successful in other classes of concept representations, thus making it easy to implement and compare, WIDC also relies on recent or new results we give about particular cases of boosting known as partition boosting and ranking loss boosting. Experimental results on thirty-one domains, most of which readily available, tend to display the ability of WIDC to produce small, accurate, and interpretable decision committees.


An Analysis of Reduced Error Pruning

arXiv.org Artificial Intelligence

Top-down induction of decision trees has been observed to suffer from the inadequate functioning of the pruning phase. In particular, it is known that the size of the resulting tree grows linearly with the sample size, even though the accuracy of the tree does not improve. Reduced Error Pruning is an algorithm that has been used as a representative technique in attempts to explain the problems of decision tree learning. In this paper we present analyses of Reduced Error Pruning in three different settings. First we study the basic algorithmic properties of the method, properties that hold independent of the input decision tree and pruning examples. Then we examine a situation that intuitively should lead to the subtree under consideration to be replaced by a leaf node, one in which the class label and attribute values of the pruning examples are independent of each other. This analysis is conducted under two different assumptions. The general analysis shows that the pruning probability of a node fitting pure noise is bounded by a function that decreases exponentially as the size of the tree grows. In a specific analysis we assume that the examples are distributed uniformly to the tree. This assumption lets us approximate the number of subtrees that are pruned because they do not receive any pruning examples. This paper clarifies the different variants of the Reduced Error Pruning algorithm, brings new insight to its algorithmic properties, analyses the algorithm with less imposed assumptions than before, and includes the previously overlooked empty subtrees to the analysis.


Issues in Stacked Generalization

arXiv.org Artificial Intelligence

Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a `black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.


Learning about Machine Learning: An Extended Assignment to Classify Twitter Accounts

AAAI Conferences

We describe a four-week series of assignments in an undergraduate AI course at a liberal arts college developing a supervised learning solution to the problem of classifying Twitter accounts as either a person account or a non-person account (e.g. organization or spambot). This problem employs real data in an ongoing research project by the first author, yet is accessible to students with limited programming expertise.The students were able to experience a complete cycle of creating a machine learning solution: exploring raw data,creating a training set, engineering features, comparing different classifiers, evaluating the results, and performing erroranalysis. We received positive feedback from the students and intend to refine the assignment and make it available (together with the created training data) for use by the research community.


Using Decision Trees to Find Patterns in an Ophthalmology Dataset

AAAI Conferences

We present research in decision tree analysis that studies a data set and finds new patterns that were not obvious using statistical methods. Our method is applied to a database of accommodative esotropic patients. Accommodative esotropia is an eye disease that when left untreated leads to blindness. Patients whose muscles deteriorate often need corrective surgery, since less invasive methods of treatment tend to fail in these patients. Using a learn and prune methodology, decision tree analysis of 354 accommodative esotropic patients led to the discovery of two conjunctive variables that predicted deterioration in the initial year of treatment better than what was previously determined using standard statistical methods.