AITopics

doi: 10.1613/jair.279

10157

Industry: Health & Medicine > Therapeutic Area (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Zhao, Ying, Schwartz, Richard M., Sroka, Jason J., Makhoul, John

Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Neural Information Processing Systems, Dec-31-1995

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.

1 Introduction

Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems.

decision tree, hierarchical mixture, hmm system, (10 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Industry: Government > Military (0.36)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Zhao, Ying, Schwartz, Richard M., Sroka, Jason J., Makhoul, John

Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Neural Information Processing Systems, Dec-31-1995

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.

1 Introduction

Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems. We present here a new CD-HMM system which has similar properties and possesses the same advantages as decision trees, but has the additional important advantage of having automatically trainable "soft" decision boundaries.

2 Hierarchical Mixtures of Experts

The method of Hierarchical Mixtures of Experts (HME) developed recently by Jordan [1] breaks a large scale task into many small ones by partitioning the input space into a nested set of regions, then building a simple but specific model (local expert) in each region.

**MIT, Cambridge MA 02139
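The "soft" decision idea can be illustrated with a small forward pass. This is a minimal sketch, not the system described in the paper: it assumes a two-level hierarchy, linear softmax gates, and linear softmax experts that output class probabilities, and it omits EM training entirely. The names `hme_predict`, `top_gate`, `sub_gates`, and `experts`, and all weight values, are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hme_predict(x, top_gate, sub_gates, experts):
    """Blend expert outputs using soft gate weights at two levels.

    top_gate:  one weight vector per branch (assumed linear gate)
    sub_gates: sub_gates[i] holds one weight vector per expert in branch i
    experts:   experts[i][j] holds one weight vector per class
    """
    g_top = softmax([dot(w, x) for w in top_gate])
    n_classes = len(experts[0][0])
    mixed = [0.0] * n_classes
    for i, g_i in enumerate(g_top):
        g_sub = softmax([dot(w, x) for w in sub_gates[i]])
        for j, g_ij in enumerate(g_sub):
            probs = softmax([dot(w, x) for w in experts[i][j]])
            for k, p in enumerate(probs):
                # Every expert contributes, weighted by its path probability.
                mixed[k] += g_i * g_ij * p
    return mixed

# Hypothetical parameters: 2 branches, 2 experts per branch, 2 classes.
x = [1.0, 0.5]
top_gate = [[0.4, -0.2], [-0.3, 0.1]]
sub_gates = [[[0.1, 0.0], [0.0, 0.2]], [[-0.5, 0.3], [0.2, 0.2]]]
experts = [
    [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [-0.5, 0.2]]],
    [[[0.0, 0.3], [0.3, 0.0]], [[-1.0, 1.0], [1.0, -1.0]]],
]
posterior = hme_predict(x, top_gate, sub_gates, experts)
print(posterior)
```

A hard decision tree would route `x` down a single path; here every path receives a nonzero weight, which is what makes all parameters differentiable and trainable together.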

decision tree, hierarchical mixture, hmm system, (10 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.45)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.25)

Industry: Government > Military (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Journal of Artificial Intelligence Research, Mar-1-1995

Cost-Sensitive Classification: Empirical Evaluation of a Hybrid Genetic Decision Tree Induction Algorithm

Turney, P. D.

This paper introduces ICET, a new algorithm for cost-sensitive classification. ICET uses a genetic algorithm to evolve a population of biases for a decision tree induction algorithm. The fitness function of the genetic algorithm is the average cost of classification when using the decision tree, including both the costs of tests (features, measurements) and the costs of classification errors. ICET is compared here with three other algorithms for cost-sensitive classification - EG2, CS-ID3, and IDX - and also with C4.5, which classifies without regard to cost. The five algorithms are evaluated empirically on five real-world medical datasets. Three sets of experiments are performed. The first set examines the baseline performance of the five algorithms on the five datasets and establishes that ICET performs significantly better than its competitors. The second set tests the robustness of ICET under a variety of conditions and shows that ICET maintains its advantage. The third set looks at ICET's search in bias space and discovers a way to improve the search.
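The fitness measure described in this abstract, average classification cost including both test costs and error costs, can be sketched as follows. This is an illustrative simplification rather than ICET itself: the tree representation, the helper names `classify_with_cost` and `average_cost`, and all costs and cases are hypothetical, and each test is assumed to be paid for once per case.

```python
def classify_with_cost(tree, case, test_costs):
    """Walk the tree for one case; return (predicted label, cost of tests used)."""
    paid = set()
    cost = 0.0
    node = tree
    while isinstance(node, dict):
        feature = node["test"]
        if feature not in paid:  # a test is charged only once per case
            cost += test_costs[feature]
            paid.add(feature)
        node = node["branches"][case[feature]]
    return node, cost

def average_cost(tree, cases, labels, test_costs, error_costs):
    """Mean cost per case: tests performed plus any misclassification penalty."""
    total = 0.0
    for case, label in zip(cases, labels):
        predicted, cost = classify_with_cost(tree, case, test_costs)
        if predicted != label:
            cost += error_costs[(label, predicted)]
        total += cost
    return total / len(cases)

# Hypothetical medical example: a cheap test first, an expensive one only if needed.
tree = {
    "test": "ecg",
    "branches": {
        "normal": "healthy",
        "abnormal": {
            "test": "angiogram",
            "branches": {"clear": "healthy", "blocked": "sick"},
        },
    },
}
test_costs = {"ecg": 5.0, "angiogram": 100.0}
error_costs = {("sick", "healthy"): 500.0, ("healthy", "sick"): 50.0}
cases = [
    {"ecg": "normal", "angiogram": "clear"},
    {"ecg": "abnormal", "angiogram": "blocked"},
    {"ecg": "normal", "angiogram": "blocked"},
]
labels = ["healthy", "sick", "sick"]

fitness = average_cost(tree, cases, labels, test_costs, error_costs)
print(fitness)
```

A genetic search of the kind the abstract describes would treat a lower value of this quantity as a fitter bias setting for the tree inducer.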

algorithm, decision tree, icet, (13 more...)

doi: 10.1613/jair.120

10129

Country:

North America > Canada > Ontario > National Capital Region > Ottawa (0.14)
North America > United States > Michigan > Wayne County > Detroit (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(7 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)
Health & Medicine > Pharmaceuticals & Biotechnology (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Dietterich, T. G., Bakiri, G.

Solving Multiclass Learning Problems via Error-Correcting Output Codes

Journal of Artificial Intelligence Research, Jan-1-1995

Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying collections of training examples of the form [x_i, f(x_i)]. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that, like the other methods, the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
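The decoding step behind an error-correcting output code can be sketched in a few lines. This is a minimal illustration under stated assumptions: the four class names and the 7-bit codebook are chosen for the example (every pair of codewords differs in 4 bits), and the binary learners that would produce each bit are not shown. `CODEBOOK`, `hamming`, and `decode` are hypothetical names.

```python
# Each class is assigned a codeword; each bit position corresponds to one
# binary learner. Any two codewords below differ in exactly 4 positions,
# so a single wrong bit can still be decoded correctly.
CODEBOOK = {
    "a": (1, 1, 1, 1, 1, 1, 1),
    "b": (0, 0, 0, 0, 1, 1, 1),
    "c": (0, 0, 1, 1, 0, 0, 1),
    "d": (0, 1, 0, 1, 0, 1, 0),
}

def hamming(u, v):
    """Number of positions where two bit tuples differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

def decode(bits):
    """Return the class whose codeword is nearest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))

# Suppose the seven binary learners output the codeword for "c" with the
# first bit flipped by mistake.
noisy = (1, 0, 1, 1, 0, 0, 1)
print(decode(noisy))
```

With a one-bit-per-class encoding, the same single error could leave the prediction ambiguous; the extra bits are what buy the tolerance.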

error-correcting output code, isolet letter nettalk performance relative, multiclass learning problem, (4 more...)

doi: 10.1613/jair.105

10127

Industry: Education > Focused Education > Special Education (0.80)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.99)

Journal of Artificial Intelligence Research, Aug-1-1994

A System for Induction of Oblique Decision Trees

Murthy, S. K., Kasif, S., Salzberg, S.

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.
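The difference between an oblique split and an axis-parallel one can be shown on a toy dataset. This sketch only evaluates a given hyperplane; it does not implement OC1's hill-climbing or randomization. The use of Gini impurity, the helper names, and the data points are assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    if not labels:
        return 0.0
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_impurity(points, labels, weights, bias):
    """Weighted Gini impurity after splitting on sign(weights . x + bias)."""
    left, right = [], []
    for x, y in zip(points, labels):
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        (right if score > 0 else left).append(y)
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

# Hypothetical data: the class is 1 when x1 + x2 > 1, a diagonal boundary.
points = [(0.2, 0.2), (0.8, 0.1), (0.1, 0.8), (0.6, 0.6), (0.9, 0.3), (0.3, 0.9)]
labels = [0, 0, 0, 1, 1, 1]

# Oblique split: the hyperplane x1 + x2 - 1 = 0 uses both attributes.
oblique = split_impurity(points, labels, weights=(1.0, 1.0), bias=-1.0)

# Axis-parallel split: x1 > 0.5 uses a single attribute.
axis_parallel = split_impurity(points, labels, weights=(1.0, 0.0), bias=-0.5)

print(oblique, axis_parallel)
```

On this data one oblique test separates the classes, while an axis-parallel tree would need several tests to approximate the same diagonal boundary, which is the kind of size advantage the abstract refers to.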

algorithm, decision tree, hyperplane, (16 more...)

doi: 10.1613/jair.63

10121

Country:

North America > United States > Maryland > Baltimore (0.14)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(12 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Murphy, P. M., Pazzani, M. J.

Exploring the Decision Forest: An Empirical Investigation of Occam's Razor in Decision Tree Induction

Journal of Artificial Intelligence Research, Mar-1-1994

We report on a series of experiments in which all decision trees consistent with the training data are constructed. These experiments were run to gain an understanding of the properties of the set of consistent decision trees and the factors that affect the accuracy of individual trees. In particular, we investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data. The experiments were performed on a massively parallel Maspar computer. The results of the experiments on several artificial and two real world problems indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees.

error number, node cardinality error number, prob, (11 more...)

doi: 10.1613/jair.41

10116

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Journal of Artificial Intelligence Research, Feb-1-1994

Learning the Past Tense of English Verbs: The Symbolic Pattern Associator vs. Connectionist Models

Ling, C. X.

Learning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms.

connectionist model, english verb, full connection, (1 more...)

doi: 10.1613/jair.39

10114

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Schlimmer, J. C., Hermens, L. A.

Software Agents: Completing Patterns and Constructing User Interfaces

Journal of Artificial Intelligence Research, Nov-1-1993

To support the goal of allowing users to record and retrieve information, this paper describes an interactive note-taking system for pen-based computers with two distinctive features. First, it actively predicts what the user is going to write. Second, it automatically constructs a custom, button-box user interface on request. The system is an example of a learning-apprentice software agent. A machine learning component characterizes the syntax and semantics of the user's information. A performance system uses this learned information to generate completion strings and construct a user interface. Description of Online Appendix: People like to record information. Doing this on paper is initially efficient, but lacks flexibility. Recording information on a computer is less efficient but more powerful. In our new note-taking software, the user records information directly on a computer. Behind the interface, an agent acts for the user. To help, it provides defaults and constructs a custom user interface. The demonstration is a QuickTime movie of the note-taking agent in action. The file is a binhexed self-extracting archive. Macintosh utilities for binhex are available from mac.archive.umich.edu. QuickTime is available from ftp.apple.com in the dts/mac/sys.soft/quicktime.

fsm, software, transition, (15 more...)

doi: 10.1613/jair.25

10110

Country:

North America > United States > California > Santa Clara County > San Jose (0.04)
North America > United States > Washington > Whitman County > Pullman (0.04)
North America > United States > Texas > Tarrant County > Arlington (0.04)
(7 more...)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.46)