Machine Learning



Clustering using Monte Carlo Cross-Validation

AAAI Conferences

In this paper a new cross-validated likelihood criterion is investigated for determining cluster structure. A practical clustering algorithm based on Monte Carlo cross-validation (MCCV) is introduced.
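The idea of scoring candidate cluster counts by cross-validated likelihood can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a one-dimensional Gaussian mixture fitted by plain EM, and the function names (`fit_gmm_1d`, `mccv_scores`), the quantile-based initialisation, the variance floor, and the default `test_fraction=0.5` are all choices made here for illustration.

```python
import math
import random

def component_logs(x, weights, means, variances):
    """Log of weight * Gaussian density at x, for each mixture component."""
    return [math.log(w) - 0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for w, m, v in zip(weights, means, variances)]

def log_density(x, weights, means, variances):
    """Log of the mixture density at x, computed with log-sum-exp."""
    logs = component_logs(x, weights, means, variances)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def fit_gmm_1d(xs, k, iters=30, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM (requires len(xs) >= k)."""
    data = sorted(xs)
    n = len(data)
    # Assumed initialisation: means of k contiguous chunks of the sorted data.
    chunks = [data[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand = sum(data) / n
    var0 = max(sum((x - grand) ** 2 for x in data) / n, var_floor)
    weights, variances = [1.0 / k] * k, [var0] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            logs = component_logs(x, weights, means, variances)
            top = max(logs)
            exps = [math.exp(l - top) for l in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-9)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data))
            variances[j] = max(spread / nj, var_floor)
    return weights, means, variances

def mccv_scores(xs, k_values, n_splits=20, test_fraction=0.5, seed=0):
    """Mean held-out log-likelihood per point, averaged over random splits."""
    rng = random.Random(seed)
    n_test = int(len(xs) * test_fraction)
    totals = {k: 0.0 for k in k_values}
    for _ in range(n_splits):
        shuffled = list(xs)
        rng.shuffle(shuffled)  # a fresh random train/test partition each time
        test, train = shuffled[:n_test], shuffled[n_test:]
        for k in k_values:
            params = fit_gmm_1d(train, k)
            totals[k] += sum(log_density(x, *params) for x in test) / len(test)
    return {k: totals[k] / n_splits for k in k_values}
```

Calling `mccv_scores(data, [1, 2, 3])` returns one averaged held-out score per candidate number of clusters; under the assumptions above, a higher score suggests a better-supported cluster count.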


Rethinking the Learning of Belief Network Probabilities

AAAI Conferences

Belief networks have been accepted as a tool for knowledge discovery in databases for several years now, and have been a growing focus of machine learning research for the past decade. Several uses have been demonstrated in the literature in domains as distinct as document retrieval, medical diagnosis, and telecommunications (D'Ambrosio 1994; Ezawa & Norton 1995; Park, Han, & Choi 1995). A common need across all of these application domains is for robust, flexible and powerful methods for the automatic induction of belief networks.


Error-Based and Entropy-Based Discretization of Continuous Features

AAAI Conferences

Our study includes both an extensive empirical comparison and an analysis of scenarios where error minimization may be an inappropriate discretization criterion.
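The two criteria named in the title can be contrasted with a small sketch. It is an illustration under stated assumptions, not the procedure evaluated in the paper: it considers only a single binary cut on one feature, and the helper names (`misclassification_cost`, `entropy_cost`, `best_cut`) are hypothetical.

```python
import math
from collections import Counter

def misclassification_cost(labels):
    """Error-based cost: number of items outside the majority class."""
    return len(labels) - max(Counter(labels).values()) if labels else 0

def entropy_cost(labels):
    """Entropy-based cost: class entropy in bits, weighted by item count."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum(c * math.log2(c / n) for c in Counter(labels).values())

def best_cut(values, labels, cost):
    """Return the midpoint cut that minimises cost(left) + cost(right)."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal feature values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = cost(left) + cost(right)
        if best is None or total < best[0]:
            best = (total, cut)
    return None if best is None else best[1]
```

On cleanly separable data both costs select the same cut. They can differ when the majority class is the same on both sides of every candidate cut, because the error count is then identical for all cuts while the entropy still varies.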



Learning from biased data using mixture models

AAAI Conferences

Databases sometimes contain a nonrandom sample from the population of interest. This complicates the use of extracted knowledge for predictive purposes.


Linear-Time Rule Induction

AAAI Conferences

Very large datasets pose special problems for machine learning algorithms.



Knowledge Discovery and Data Mining: Towards a Unifying Framework

AAAI Conferences

Dependency Modeling: finding a model which describes significant dependencies between variables (e.g., learning of belief networks).

Change and Deviation Detection: discovering the most significant changes in the data from previously measured or normative values.

5.2 The Components of Data Mining Algorithms

Having outlined the general methods of data mining, the next step is to construct specific algorithms to implement these methods. One can identify three primary components in any data mining algorithm: model representation, model evaluation, and search. This reductionist view is not necessarily complete or fully encompassing: rather, it is a convenient way to express the key concepts of data mining algorithms in a relatively unified and compact manner. Cheeseman (1990) outlines a similar structure.

Model Representation: the language used to describe discoverable patterns.


Detecting Early Indicator Cars in an Automotive Database: A Multi-Strategy Approach

AAAI Conferences

No company has so far achieved the ultimate goal of zero faults in manufacturing. Even high-quality products occasionally show problems that must be handled as warranty cases. In this paper, we report work done during the development of an early warning system for a large quality information database in the automotive industry. We present a multi-strategy approach to flexible prediction of upcoming quality problems. We used existing techniques and combined them in a novel way to solve a concrete application problem. The basic idea is to identify subpopulations that, at an early point in time, behave like the whole population at a later time. Such subpopulations act as early indicators for future developments.