Singer, Yoram
Learning to Order Things
Cohen, William W., Schapire, Robert E., Singer, Yoram
Most previous work in inductive learning has concentrated on learning to classify. However, there are many applications in which it is desirable to order rather than classify instances. An example might be a personalized email filter that gives a priority ordering to unread mail. Here we will consider the problem of learning how to construct such orderings, given feedback in the form of preference judgments, i.e., statements that one instance should be ranked ahead of another. Such orderings could be constructed based on a learned classifier or regression model, and in fact often are.
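A minimal sketch of the feedback format described in this abstract, assuming a linear scoring function trained with a perceptron-style update on (preferred, non-preferred) feature-vector pairs and then used to order instances; this is an illustration of learning from preference judgments, not the paper's combination-and-ordering algorithm, and all names below are hypothetical.

```python
# Sketch: learn a linear score from pairwise preference judgments, then rank.
import numpy as np

def train_from_preferences(preferences, dim, epochs=10, lr=0.1):
    """preferences: list of (x_preferred, x_other) feature-vector pairs."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x_pref, x_other in preferences:
            # Perceptron-style update whenever the preferred instance is not
            # scored strictly higher than the other one.
            if w @ x_pref <= w @ x_other:
                w += lr * (np.asarray(x_pref) - np.asarray(x_other))
    return w

def order_instances(instances, w):
    """Return instances sorted from highest to lowest learned score."""
    return sorted(instances, key=lambda x: -(w @ np.asarray(x)))

# Toy usage: a single judgment preferring mail with a larger first feature.
prefs = [(np.array([1.0, 0.0]), np.array([0.0, 1.0]))]
w = train_from_preferences(prefs, dim=2)
ranked = order_instances([np.array([0.2, 0.9]), np.array([0.8, 0.1])], w)
```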
Shared Context Probabilistic Transducers
Bengio, Yoshua, Bengio, Samy, Isabelle, Jean-François, Singer, Yoram
Recently, a model for supervised learning of probabilistic transducers represented by suffix trees was introduced. However, this algorithm tends to build very large trees, requiring very large amounts of computer memory. In this paper, we propose a new, more compact transducer model in which one shares the parameters of distributions associated with contexts yielding similar conditional output distributions. We illustrate the advantages of the proposed algorithm with comparative experiments on inducing a noun phrase recognizer.
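A hedged sketch of the parameter-sharing idea only: contexts (input suffixes) whose empirical conditional output distributions are close are mapped to a single representative, so they share one distribution. The L1 distance, threshold, and greedy grouping below are illustrative choices, not the paper's construction.

```python
# Sketch: group contexts with similar conditional output distributions.
from collections import Counter, defaultdict

def empirical_dists(pairs, max_context=2):
    """pairs: list of (input_sequence, output_symbol); returns context -> Counter."""
    dists = defaultdict(Counter)
    for seq, out in pairs:
        for k in range(1, max_context + 1):
            if len(seq) >= k:
                dists[tuple(seq[-k:])][out] += 1
    return dists

def l1_distance(c1, c2):
    """L1 distance between two normalized count vectors."""
    n1, n2 = sum(c1.values()), sum(c2.values())
    return sum(abs(c1[k] / n1 - c2[k] / n2) for k in set(c1) | set(c2))

def share_contexts(dists, threshold=0.2):
    """Greedily map each context to an earlier representative if close enough."""
    shared, reps = {}, []
    for ctx, dist in dists.items():
        for rep in reps:
            if l1_distance(dist, dists[rep]) < threshold:
                shared[ctx] = rep
                break
        else:
            reps.append(ctx)
            shared[ctx] = ctx
    return shared

# Toy usage on (context words, tag) pairs.
pairs = [(("the", "big"), "NP"), (("a", "big"), "NP"), (("ran", "very"), "ADV")]
mapping = share_contexts(empirical_dists(pairs))
```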
Training Algorithms for Hidden Markov Models using Entropy Based Distance Functions
Singer, Yoram, Warmuth, Manfred K.
By adapting a framework used for supervised learning, we construct iterative algorithms that maximize the likelihood of the observations while also attempting to stay "close" to the current estimated parameters. We use a bound on the relative entropy between the two HMMs as a distance measure between them. The result is a family of new iterative training algorithms similar to the EM (Baum-Welch) algorithm for training HMMs. The proposed algorithms consist of a step similar to the expectation step of Baum-Welch and a new update of the parameters that replaces the maximization (re-estimation) step. The algorithm takes only negligibly more time per iteration, and an approximated version uses the same expectation step as Baum-Welch.
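A hedged sketch of an entropy-regularized, multiplicative update for one probability row of an HMM, given expected counts from an E-step. The exponentiated-gradient form below captures the idea of staying close in relative entropy to the current parameters while increasing likelihood, but it is not claimed to be the paper's exact update; all names are illustrative.

```python
# Sketch: multiplicative update of one transition/emission row of an HMM.
import numpy as np

def entropic_update(theta_old, expected_counts, eta=0.1):
    """theta_old: current probability row; expected_counts: counts from an E-step."""
    theta_old = np.asarray(theta_old, dtype=float)
    counts = np.asarray(expected_counts, dtype=float)
    # Gradient of the expected log-likelihood with respect to this row.
    grad = counts / np.clip(theta_old, 1e-12, None)
    # Multiplicative (exponentiated-gradient) step, then renormalize so the
    # row remains a probability distribution; a small eta keeps the new row
    # close in relative entropy to the old one.
    theta_new = theta_old * np.exp(eta * grad)
    return theta_new / theta_new.sum()

# Small eta -> conservative change; larger eta -> a bigger step driven by
# the expected counts.
row = entropic_update([0.5, 0.3, 0.2], expected_counts=[10.0, 2.0, 1.0])
```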
Adaptive Mixture of Probabilistic Transducers
Singer, Yoram
We introduce and analyze a mixture model for supervised learning of probabilistic transducers. We devise an online learning algorithm that efficiently infers the structure and estimates the parameters of each model in the mixture. Theoretical analysis and comparative simulations indicate that the learning algorithm tracks the best model from an arbitrarily large (possibly infinite) pool of models. We also present an application of the model for inducing a noun phrase recognizer.
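A hedged sketch of online mixing over a pool of predictive models: each model's weight is multiplied by the probability it assigned to the observed output, so the mixture comes to track the best-performing model. The predict_proba interface is a stand-in, not the paper's API, and the structure-growing part of the algorithm is omitted.

```python
# Sketch: online multiplicative mixing of a fixed pool of predictors.
def mix_online(models, weights, stream):
    """models: objects exposing predict_proba(context, symbol); weights: prior weights."""
    weights = list(weights)
    for context, symbol in stream:
        probs = [m.predict_proba(context, symbol) for m in models]
        # Mixture prediction for the observed symbol, before updating.
        p_mix = sum(w * p for w, p in zip(weights, probs)) / sum(weights)
        # Multiplicative update: models that predicted well gain weight.
        weights = [w * p for w, p in zip(weights, probs)]
        z = sum(weights)
        weights = [w / z for w in weights]  # renormalize to avoid underflow
        yield p_mix, weights
```

When the initial weights are read as a prior over models, the multiplicative update is simply Bayes' rule on each observation, which is what lets the mixture track the best model in the pool.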
Decoding Cursive Scripts
Singer, Yoram, Tishby, Naftali
Online cursive handwriting recognition is currently one of the most intriguing challenges in pattern recognition. This study presents a novel approach to this problem which is composed of two complementary phases. The first is dynamic encoding of the writing trajectory into a compact sequence of discrete motor control symbols. In this compact representation we largely remove the redundancy of the script, while preserving most of its intelligible components. In the second phase, these control sequences are used to train adaptive probabilistic acyclic automata (PAA) for the important ingredients of the writing trajectories, e.g.
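A hedged sketch of the first phase only: quantizing a pen trajectory into a short sequence of discrete direction symbols. The 8-direction chain code and the collapsing of consecutive repeats are stand-ins for the paper's motor-control alphabet and redundancy removal, not its actual encoder.

```python
# Sketch: encode a pen trajectory as a compact sequence of direction symbols.
import math

def encode_trajectory(points):
    """points: list of (x, y) pen positions sampled along a stroke."""
    symbols = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)
        sym = int(round(4 * angle / math.pi)) % 8   # quantize to 8 directions
        if not symbols or symbols[-1] != sym:       # drop consecutive duplicates
            symbols.append(sym)
    return symbols

# A roughly rightward then upward stroke becomes a two-symbol code.
print(encode_trajectory([(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]))
```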
The Power of Amnesia
Ron, Dana, Singer, Yoram, Tishby, Naftali
We propose a learning algorithm for a variable memory length Markov process. Human communication, whether given as text, handwriting, or speech, has multiple characteristic time scales. On short scales it is characterized mostly by the dynamics that generate the process, whereas on large scales, more syntactic and semantic information is carried. For that reason the conventionally used fixed memory Markov models cannot effectively capture the complexity of such structures. On the other hand, using long-memory models uniformly is not practical even for a memory length as short as four.
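A hedged sketch of prediction with a variable memory length model: the next-symbol distribution is looked up using the longest suffix of the history that appears in a learned context dictionary, so memory is spent only where it matters. Growing the contexts (the prediction-suffix-tree learning itself) is omitted; the dictionary is assumed to be given, and all names are illustrative.

```python
# Sketch: next-symbol prediction with variable-length contexts.
def predict(contexts, history, max_memory=4):
    """contexts: dict mapping context tuples to next-symbol distributions."""
    for k in range(min(max_memory, len(history)), 0, -1):
        suffix = tuple(history[-k:])
        if suffix in contexts:
            return contexts[suffix]          # longest matching context wins
    return contexts.get((), {})              # fall back to the empty context

# Toy usage with hand-built contexts over a two-letter alphabet.
contexts = {(): {"a": 0.5, "b": 0.5},
            ("a",): {"a": 0.2, "b": 0.8},
            ("b", "a"): {"a": 0.9, "b": 0.1}}
print(predict(contexts, ["b", "a"]))         # -> {"a": 0.9, "b": 0.1}
```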