Goto

Collaborating Authors

 Optimization


Unsupervised On-line Learning of Decision Trees for Hierarchical Data Analysis

Neural Information Processing Systems

An adaptive online algorithm is proposed to estimate hierarchical data structures for non-stationary data sources. The approach is based on the principle of minimum cross entropy to derive a decision tree for data clustering and it employs a metalearning idea (learning to learn) to adapt to changes in data characteristics. Its efficiency is demonstrated by grouping non-stationary artifical data and by hierarchical segmentation of LANDSAT images. 1 Introduction Unsupervised learning addresses the problem to detect structure inherent in unlabeled andunclassified data. N. The encoding usually is represented by an assignment matrix M (Mia), where Mia 1 if and only if Xi belongs to cluster L: 1 MiaV (Xi, Ya) measures the quality of a data partition, Le., optimal assignments and prototypes (M,y)OPt argminM,y1i (M,Y) minimize the inhomogeneity of clusters w.r.t. a given distance measure V. For reasons of simplicity we restrict the presentation to the ' sum-of-squared-error criterion V(x, y) To facilitate this minimization a deterministic annealing approach was proposed in [5] which maps the discrete optimization problem, i.e. how to determine the data assignments, viathe Maximum Entropy Principle [2] to a continuous parameter es- Unsupervised Online Learning ofDecision Trees for Data Analysis 515 timation problem.


Selecting Weighting Factors in Logarithmic Opinion Pools

Neural Information Processing Systems

A simple linear averaging of the outputs of several networks as e.g. in bagging [3], seems to follow naturally from a bias/variance decomposition of the sum-squared error. The sum-squared error of the average model is a quadratic function of the weighting factors assigned to the networks in the ensemble [7], suggesting a quadratic programming algorithm for finding the "optimal" weighting factors. If we interpret the output of a network as a probability statement, the sum-squared error corresponds to minus the loglikelihood or the Kullback-Leibler divergence, and linear averaging of the outputs tologarithmic averaging of the probability statements: the logarithmic opinion pool. The crux of this paper is that this whole story about model averaging, bias/variancedecompositions, and quadratic programming to find the optimal weighting factors, is not specific for the sumsquared error,but applies to the combination of probability statements of any kind in a logarithmic opinion pool, as long as the Kullback-Leibler divergence plays the role of the error measure. As examples we treat model averaging for classification models under a cross-entropy error measure and models for estimating variances.


A Convergence Proof for the Softassign Quadratic Assignment Algorithm

Neural Information Processing Systems

The softassign quadratic assignment algorithm has recently emerged as an effective strategy for a variety of optimization problems in pattern recognition and combinatorial optimization. While the effectiveness of the algorithm was demonstrated in thousands of simulations, there was no known proof of convergence. Here, we provide a proof of convergence for the most general form of the algorithm.


Triangulation by Continuous Embedding

Neural Information Processing Systems

When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continous nonlinear optimization task we hope to obtain a good triangulation with respect to the aformentioned cost. This paper presents two ways of embedding the triangulation problem into continuous domain and shows that they perform well compared to the best known heuristic.


MIMIC: Finding Optima by Estimating Probability Densities

Neural Information Processing Systems

In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely related and should not be explored independently. Similarly, experience may establish that a subset of parameters must take on particular values. Any search of the cost landscape should take advantage of these relationships. We present MIMIC, a framework in which we analyze the global structure of the optimization landscape. A novel and efficient algorithm for the estimation of this structure is derived. We use knowledge of this structure to guide a randomized search through the solution space and, in turn, to refine our estimate ofthe structure.


Clustering via Concave Minimization

Neural Information Processing Systems

There are many approaches to this problem, including statistical [9], machine learning [7], integer and mathematical programming [18,1]. In this paper we concentrate on a simple concave minimization formulation of the problem that leads to a finite and fast algorithm.


Genetic Algorithms and Explicit Search Statistics

Neural Information Processing Systems

The genetic algorithm (GA) is a heuristic search procedure based on mechanisms abstracted from population genetics. In a previous paper [Baluja & Caruana, 1995], we showed that much simpler algorithms, such as hillcIimbing and Population Based Incremental Learning (PBIL), perform comparably to GAs on an optimization problem custom designed to benefit from the GA's operators. This paper extends these results in two directions. First, in a large-scale empirical comparison of problems that have been reported in GA literature, we show that on many problems, simpler algorithms can perform significantly better than GAs. Second, we describe when crossover is useful, and show how it can be incorporated into PBIL. 1 IMPLICIT VS.


A Convergence Proof for the Softassign Quadratic Assignment Algorithm

Neural Information Processing Systems

The softassign quadratic assignment algorithm has recently emerged as an effective strategy for a variety of optimization problems in pattern recognition and combinatorial optimization. While the effectiveness of the algorithm was demonstrated in thousands of simulations, there was no known proof of convergence. Here, we provide a proof of convergence for the most general form of the algorithm.


Triangulation by Continuous Embedding

Neural Information Processing Systems

When triangulating a belief network we aim to obtain a junction tree of minimum state space. According to (Rose, 1970), searching for the optimal triangulation can be cast as a search over all the permutations of the graph's vertices. Our approach is to embed the discrete set of permutations in a convex continuous domain D. By suitably extending the cost function over D and solving the continous nonlinear optimization task we hope to obtain a good triangulation with respect to the aformentioned cost. This paper presents two ways of embedding the triangulation problem into continuous domain and shows that they perform well compared to the best known heuristic.


MIMIC: Finding Optima by Estimating Probability Densities

Neural Information Processing Systems

In many optimization problems, the structure of solutions reflects complex relationships between the different input parameters. For example, experience may tell us that certain parameters are closely related and should not be explored independently. Similarly, experience may establish that a subset of parameters must take on particular values. Any search of the cost landscape should take advantage of these relationships. We present MIMIC, a framework in which we analyze the global structure of the optimization landscape. A novel and efficient algorithm for the estimation of this structure is derived. We use knowledge of this structure to guide a randomized search through the solution space and, in turn, to refine our estimate ofthe structure.