Country
Matching Free Trees with Replicator Equations
Motivated by our recent work on rooted tree matching, in this paper we provide a solution to the problem of matching two free (i.e., unrooted) trees by constructing an association graph whose maximal cliques are in one-to-one correspondence with maximal common subtrees. We then solve the problem using simple replicator dynamics from evolutionary game theory. Experiments on hundreds of uniformly random trees are presented. The results are impressive: despite the inherent inability of these simple dynamics to escape from local optima, they always returned a globally optimal solution.
On Discriminative vs. Generative Classifiers: A comparison of logistic regression and naive Bayes
Ng, Andrew Y., Jordan, Michael I.
Discriminative classifiers model the posterior p(ylx)directly, or learn a direct map from inputs x to the class labels. There are several compelling reasons for using discriminative rather than generative classifiers, oneof which, succinctly articulated by Vapnik [6], is that "one should solve the [classification] problem directly and never solve a more general problem as an intermediate step [such as modeling p(xly)]." Indeed, leaving aside computational issues and matters such as handling missing data, the prevailing consensus seems to be that discriminative classifiers are almost always to be preferred to generative ones. Anotherpiece of prevailing folk wisdom is that the number of examples needed to fit a model is often roughly linear in the number of free parameters of a model. This has its theoretical basis in the observation that for "many" models, the VC dimension is roughly linear or at most some low-order polynomial in the number of parameters (see, e.g., [1, 3]), and it is known that sample complexity in the discriminative setting is linear in the VC dimension [6]. In this paper, we study empirically and theoretically the extent to which these beliefs are true. A parametric family of probabilistic models p(x, y) can be fit either to optimize the joint likelihood of the inputs and the labels, or fit to optimize the conditional likelihood p(ylx), or even fit to minimize the 0-1 training error obtained by thresholding p(ylx) to make predictions.
An Efficient, Exact Algorithm for Solving Tree-Structured Graphical Games
Littman, Michael L., Kearns, Michael J., Singh, Satinder P.
The algorithm is the first to compute equilibria both efficiently and exactly for a nontrivial class of graphical games. 1 Introduction Seeking to replicate the representational and computational benefits that graphical modelshave provided to probabilistic inference, several recent works have introduced graph-theoretic frameworks for the study of multi-agent systems (LaMura 2000; Koller and Milch 2001; Kearns et al. 2001). In the simplest of these formalisms, each vertex represents a single agent, and the edges represent pairwise interaction between agents. As with many familiar network models, the macroscopic behavior of a large system is thus implicitly described by its local interactions, andthe computational challenge is to extract the global states of interest. Classical game theory is typically used to model multi-agent interactions, and the global states of interest are thus the so-called Nash equilibria, in which no agent has a unilateral incentive to deviate. In a recent paper (Kearns et al. 2001), we introduced such a graphical formalism for multi-agent game theory, and provided two algorithms for computing Nash equilibria whenthe underlying graph is a tree (or is sufficiently sparse).
(Not) Bounding the True Error
We present a new approach to bounding the true error rate of a continuous valued classifier based upon PAC-Bayes bounds. The method first constructs adistribution over classifiers by determining how sensitive each parameter in the model is to noise. The true error rate of the stochastic classifier found with the sensitivity analysis can then be tightly bounded using a PAC-Bayes bound.
Minimax Probability Machine
Lanckriet, Gert, Ghaoui, Laurent E., Bhattacharyya, Chiranjib, Jordan, Michael I.
When constructing a classifier, the probability of correct classification offuture data points should be maximized. In the current paper this desideratum is translated in a very direct way into an optimization problem, which is solved using methods from convex optimization.We also show how to exploit Mercer kernels in this setting to obtain nonlinear decision boundaries. A worst-case bound on the probability of misclassification of future data is obtained explicitly. 1 Introduction Consider the problem of choosing a linear discriminant by minimizing the probabilities thatdata vectors fall on the wrong side of the boundary. One way to attempt to achieve this is via a generative approach in which one makes distributional assumptions aboutthe class-conditional densities and thereby estimates and controls the relevant probabilities. The need to make distributional assumptions, however, casts doubt on the generality and validity of such an approach, and in discriminative solutionsto classification problems it is common to attempt to dispense with class-conditional densities entirely.
A Dynamic HMM for On-line Segmentation of Sequential Data
Kohlmorgen, Jens, Lemm, Steven
We propose a novel method for the analysis of sequential data that exhibits an inherent mode switching. In particular, the data might be a non-stationary time series from a dynamical system that switches between multiple operating modes. Unlike other approaches, ourmethod processes the data incrementally and without any training of internal parameters. We use an HMM with a dynamically changingnumber of states and an online variant of the Viterbi algorithm that performs an unsupervised segmentation and classification of the data on-the-fly, i.e. the method is able to process incomingdata in real-time. The main idea of the approach is to track and segment changes of the probability density of the data in a sliding window on the incoming data stream.
Online Learning with Kernels
Kivinen, Jyrki, Smola, Alex J., Williamson, Robert C.
We consider online learning in a Reproducing Kernel Hilbert Space. Our method is computationally efficient and leads to simple algorithms. In particular we derive update equations for classification, regression, and novelty detection. The inclusion of the -trick allows us to give a robust parameterization.
Active Information Retrieval
Jaakkola, Tommi, Siegelmann, Hava T.
In classical large information retrieval systems, the system responds to a user initiated query with a list of results ranked by relevance. The users may further refine their query as needed. This process may result in a lengthy correspondence without conclusion. We propose an alternative active learning approach, where the system respondsto the initial user's query by successively probing the user for distinctions at multiple levels of abstraction. The system's initiated queries are optimized for speedy recovery and the user is permitted to respond with multiple selections or may reject the query. The information is in each case unambiguously incorporated by the system and the subsequent queries are adjusted to minimize the need for further exchange. The system's initiated queries are subject to resource constraints pertaining to the amount of information thatcan be presented to the user per iteration.