Asia
Gaussian and Wishart Hyperkernels
We propose a new method for constructing hyperkenels and define two promising special cases that can be computed in closed form. These we call the Gaussian and Wishart hyperkernels. The former is especially attractive in that it has an interpretable regularization scheme reminiscent of that of the Gaussian RBF kernel. We discuss how kernel learning can be used not just for improving the performance of classification and regression methods, but also as a stand-alone algorithm for dimensionality reduction and relational or metric learning.
Predicting spike times from subthreshold dynamics of a neuron
Kobayashi, Ryota, Shinomoto, Shigeru
It has been established that a neuron reproduces highly precise spike response to identical fluctuating input currents. We wish to accurately predict the firing times of a given neuron for any input current. For this purpose we adopt a model that mimics the dynamics of the membrane potential, and then take a cue from its dynamics for predicting the spike occurrence for a novel input current. It is found that the prediction is significantly improved by observing the state space of the membrane potential and its time derivative(s) in advance of a possible spike, in comparison to simply thresholding an instantaneous value of the estimated potential.
Hierarchical Dirichlet Processes with Random Effects
Data sets involving multiple groups with shared characteristics frequently arise in practice. In this paper we extend hierarchical Dirichlet processes to model such data. Each group is assumed to be generated from a template mixture model with group level variability in both the mixing proportions and the component parameters. Variabilities in mixing proportions across groups are handled using hierarchical Dirichlet processes, also allowing for automatic determination of the number of components. In addition, each group is allowed to have its own component parameters coming from a prior described by a template mixture model. This group-level variability in the component parameters is handled using a random effects model. We present a Markov Chain Monte Carlo (MCMC) sampling algorithm to estimate model parameters and demonstrate the method by applying it to the problem of modeling spatial brain activation patterns across multiple images collected via functional magnetic resonance imaging (fMRI).
Combining causal and similarity-based reasoning
Kemp, Charles, Shafto, Patrick, Berke, Allison, Tenenbaum, Joshua B.
Everyday inductive reasoning draws on many kinds of knowledge, including knowledge about relationships between properties and knowledge about relationships between objects. Previous accounts of inductive reasoning generally focus on just one kind of knowledge: models of causal reasoning often focus on relationships between properties, and models of similarity-based reasoning often focus on similarity relationships between objects. We present a Bayesian model of inductive reasoning that incorporates both kinds of knowledge, and show that it accounts well for human inferences about the properties of biological species.
A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems
Kawahara, Yoshinobu, Yairi, Takehisa, Machida, Kazuo
In this paper, we present a subspace method for learning nonlinear dynamical systems based on stochastic realization, in which state vectors are chosen using kernel canonical correlation analysis, and then state-space systems are identified through regression with the state vectors. We construct the theoretical underpinning and derive a concrete algorithm for nonlinear identification. The obtained algorithm needs no iterative optimization procedure and can be implemented on the basis of fast and reliable numerical schemes. The simulation result shows that our algorithm can express dynamics with a high degree of accuracy.
A Humanlike Predictor of Facial Attractiveness
Kagian, Amit, Dror, Gideon, Leyvand, Tommer, Cohen-or, Daniel, Ruppin, Eytan
This work presents a method for estimating human facial attractiveness, based on supervised learning techniques. Numerous facial features that describe facial geometry, color and texture, combined with an average human attractiveness score for each facial image, are used to train various predictors. Facial attractiveness ratings produced by the final predictor are found to be highly correlated with human ratings, markedly improving previous machine learning achievements. Simulated psychophysical experiments with virtually manipulated images reveal preferences in the machine's judgments which are remarkably similar to those of humans. These experiments shed new light on existing theories of facial attractiveness such as the averageness, smoothness and symmetry hypotheses. It is intriguing to find that a machine trained explicitly to capture an operational performance criteria such as attractiveness rating, implicitly captures basic human psychophysical biases characterizing the perception of facial attractiveness in general.
Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models
Johnson, Mark, Griffiths, Thomas L., Goldwater, Sharon
This paper introduces adaptor grammars, a class of probabilistic models of language that generalize probabilistic context-free grammars (PCFGs). Adaptor grammars augment the probabilistic rules of PCFGs with "adaptors" that can induce dependencies among successive uses. With a particular choice of adaptor, based on the Pitman-Yor process, nonparametric Bayesian models of language using Dirichlet processes and hierarchical Dirichlet processes can be written as simple grammars. We present a general-purpose inference algorithm for adaptor grammars, making it easy to define and use such models, and illustrate how several existing nonparametric Bayesian models can be expressed within this framework.
Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
Jenssen, Robert, Eltoft, Torbjørn, Girolami, Mark, Erdogmus, Deniz
We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi's entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.
Correcting Sample Selection Bias by Unlabeled Data
Huang, Jiayuan, Gretton, Arthur, Borgwardt, Karsten, Schölkopf, Bernhard, Smola, Alex J.
We consider the scenario where training and test data are drawn from different distributions, commonly referred to as sample selection bias. Most algorithms for this setting try to first recover sampling distributions and then make appropriate corrections based on the distribution estimate. We present a nonparametric method which directly produces resampling weights without distribution estimation. Our method works by matching distributions between training and testing sets in feature space. Experimental results demonstrate that our method works well in practice.
Single Channel Speech Separation Using Factorial Dynamics
Hershey, John R., Kristjansson, Trausti, Rennie, Steven, Olsen, Peder A.
Human listeners have the extraordinary ability to hear and recognize speech even when more than one person is talking. Their machine counterparts have historically been unable to compete with this ability, until now. We present a modelbased system that performs on par with humans in the task of separating speech of two talkers from a single-channel recording.