Coulomb Classifiers: Generalizing Support Vector Machines via an Analogy to Electrostatic Systems
Hochreiter, Sepp, Mozer, Michael C., Obermayer, Klaus
We introduce a family of classifiers based on a physical analogy to an electrostatic system of charged conductors. The family, called Coulomb classifiers, includes the two best-known support-vector machines (SVMs), the ν-SVM and the C-SVM. In the electrostatics analogy, a training example corresponds to a charged conductor at a given location in space, the classification function corresponds to the electrostatic potential function, and the training objective function corresponds to the Coulomb energy. The electrostatic framework not only provides a novel interpretation of existing algorithms and their interrelationships, but also suggests a variety of new methods for SVMs, including kernels that bridge the gap between polynomial and radial-basis functions, objective functions that do not require positive-definite kernels, and regularization techniques that allow for the construction of an optimal classifier in Minkowski space. Based on the framework, we propose novel SVMs and perform simulation studies to show that they are comparable or superior to standard SVMs. The experiments include classification tasks on data that are represented in terms of their pairwise proximities, where a Coulomb classifier outperformed standard SVMs.
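For background on the gap the abstract refers to, the two standard kernel families look like this. This is a generic sketch of the textbook polynomial and radial-basis-function (RBF) kernels, not the paper's bridging kernel; the parameter values are arbitrary.

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3, c=1.0):
    # k(x, y) = (x . y + c)^degree
    return (X @ Y.T + c) ** degree

def rbf_kernel(X, Y, gamma=0.5):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K_poly = polynomial_kernel(X, X)  # grows with the inner product
K_rbf = rbf_kernel(X, X)          # decays with squared distance
```

Both produce symmetric Gram matrices; the RBF kernel is 1 on the diagonal, while the polynomial kernel depends on the norms of the inputs.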
A Rational Analysis of Cognitive Control in a Speeded Discrimination Task
Mozer, Michael C., Colagrosso, Michael D., Huber, David E.
We are interested in the mechanisms by which individuals monitor and adjust their performance of simple cognitive tasks. We model a speeded discrimination task in which individuals are asked to classify a sequence of stimuli (Jones & Braver, 2001). Response conflict arises when one stimulus class is infrequent relative to another, resulting in more errors and slower reaction times for the infrequent class. How do control processes modulate behavior based on the relative class frequencies? We explain performance from a rational perspective that casts the goal of individuals as minimizing a cost that depends both on error rate and reaction time. With two additional assumptions of rationality--that class prior probabilities are accurately estimated and that inference is optimal subject to limitations on rate of information transmission--we obtain a good fit to overall RT and error data, as well as trial-by-trial variations in performance.
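As a toy illustration of why optimal inference predicts slower, more error-prone responses to the infrequent class (hypothetical numbers, not the authors' model): identical sensory evidence yields a lower posterior when the class prior is low, so a fixed decision criterion is reached later or not at all.

```python
import numpy as np

def posterior(prior, log_lr):
    # Posterior probability of a class after evidence with
    # log-likelihood ratio log_lr in its favor, starting from `prior`.
    odds = (prior / (1 - prior)) * np.exp(log_lr)
    return odds / (1 + odds)

evidence = 1.0                        # same evidence for both stimuli
p_frequent = posterior(0.8, evidence) # frequent class (prior 0.8)
p_rare = posterior(0.2, evidence)     # infrequent class (prior 0.2)
# p_rare < p_frequent: the rare class needs more evidence to cross
# any fixed response criterion.
```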
Prodding the ROC Curve: Constrained Optimization of Classifier Performance
Mozer, Michael C., Dodier, Robert, Colagrosso, Michael D., Guerra-Salcedo, Cesar, Wolniewicz, Richard
When designing a two-alternative classifier, one ordinarily aims to maximize the classifier's ability to discriminate between members of the two classes. We describe a situation in a real-world business application of machine-learning prediction in which an additional constraint is placed on the nature of the solution: that the classifier achieve a specified correct acceptance or correct rejection rate (i.e., that it achieve a fixed accuracy on members of one class or the other). Our domain is predicting churn in the telecommunications industry. Churn refers to customers who switch from one service provider to another. We propose four algorithms for training a classifier subject to this domain constraint, and present results showing that each algorithm yields a reliable improvement in performance.
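A crude way to satisfy such a constraint at decision time, rather than during training, is to pick the score threshold that yields the required correct-acceptance rate on held-out positives. This sketch (toy scores, not one of the paper's four training algorithms) shows the idea:

```python
import numpy as np

def threshold_for_tpr(scores_pos, target_tpr):
    # Choose a threshold so that at least `target_tpr` of the
    # positive (e.g., churning) examples score at or above it.
    scores_pos = np.sort(scores_pos)[::-1]          # descending
    k = int(np.ceil(target_tpr * len(scores_pos)))  # positives to accept
    return scores_pos[k - 1]

scores_pos = np.array([0.9, 0.8, 0.7, 0.4, 0.2])   # classifier scores, churners
scores_neg = np.array([0.6, 0.5, 0.3, 0.1])        # classifier scores, non-churners

t = threshold_for_tpr(scores_pos, 0.8)             # require 80% correct acceptance
tpr = np.mean(scores_pos >= t)
fpr = np.mean(scores_neg >= t)
```

The false-positive rate is then whatever the classifier delivers at that threshold; the paper's contribution is training so that this residual error rate is as small as possible.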
The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving
Grimes, David B., Mozer, Michael C.
Although connectionist models have provided insights into the nature of perception and motor control, connectionist accounts of higher cognition seldom go beyond an implementation of traditional symbol-processing theories. We describe a connectionist constraint satisfaction model of how people solve anagram problems. The model exploits statistics of English orthography, but also addresses the interplay of subsymbolic and symbolic computation by a mechanism that extracts approximate symbolic representations (partial orderings of letters) from subsymbolic structures and injects the extracted representation back into the model to assist in the solution of the anagram. We show the computational benefit of this extraction-injection process and discuss its relationship to conscious mental processes and working memory. We also account for experimental data concerning the difficulty of anagram solution based on the orthographic structure of the anagram string and the target word.
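A minimal sketch of the orthographic-statistics idea, using a hand-made bigram table rather than the paper's learned constraint network: letter orderings that respect English bigram regularities score higher, so they dominate the search.

```python
from itertools import permutations

# Hypothetical log-probabilities for a few English bigrams;
# unlisted bigrams get a low default score.
BIGRAM_LOGP = {"th": -1.0, "ha": -0.5, "at": -0.8, "he": -1.2}
DEFAULT = -5.0

def orthographic_score(letters):
    # Score a letter ordering by summing its bigram log-probabilities.
    return sum(BIGRAM_LOGP.get(a + b, DEFAULT) for a, b in zip(letters, letters[1:]))

# Rank all orderings of the anagram "tah"; orthographically regular
# orderings such as "hat" come out on top.
ranked = sorted(permutations("tah"), key=orthographic_score, reverse=True)
best = "".join(ranked[0])
```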
Beyond Maximum Likelihood and Density Estimation: A Sample-Based Criterion for Unsupervised Learning of Complex Models
Hochreiter, Sepp, Mozer, Michael C.
Two well-known classes of unsupervised procedures that can be cast in this manner are generative and recoding models. In a generative unsupervised framework, the environment generates training examples--which we will refer to as observations--by sampling from one distribution; the other distribution is embodied in the model. Examples of generative frameworks are mixtures of Gaussians (MoG) [2], factor analysis [4], and Boltzmann machines [8]. In the recoding unsupervised framework, the model transforms points from an observation space to an output space, and the output distribution is compared either to a reference distribution or to a distribution derived from the output distribution. An example is independent component analysis (ICA) [11], a method that discovers a representation of vector-valued observations in which the statistical dependence among the vector elements in the output space is minimized.
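The two frameworks can be contrasted in a few lines (toy numbers, standard formulas rather than the paper's criterion): a generative model assigns a density to observations, while a recoding model maps observations to an output space with some target distributional property.

```python
import numpy as np

# Generative framework: the model embodies a distribution over observations.
# Log-density of a point under a 1-D mixture of Gaussians.
def mog_logpdf(x, means, sigmas, weights):
    comps = [w * np.exp(-(x - m) ** 2 / (2 * s ** 2)) / (s * np.sqrt(2 * np.pi))
             for m, s, w in zip(means, sigmas, weights)]
    return np.log(sum(comps))

# Recoding framework: the model transforms observations to an output space.
# Whitening is a simple recoding: decorrelate and rescale to unit variance.
def whiten(X):
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / len(Xc)
    vals, vecs = np.linalg.eigh(cov)
    return (Xc @ vecs) / np.sqrt(vals)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.5], [0.0, 1.0]])
Z = whiten(X)
# After recoding, the output covariance is the identity (up to
# numerical precision), a fixed reference distribution property.
```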