Regression
Convex Neural Networks
Bengio, Yoshua, Roux, Nicolas L., Vincent, Pascal, Delalleau, Olivier, Marcotte, Patrice
Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting a hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors.
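The incremental procedure described in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: it assumes squared loss, sign-threshold hidden units, a random search over candidate hyperplanes in place of an exact weighted-error minimizer, and an unregularized least-squares refit of the output weights. The name `fit_incremental` and all of its parameters are hypothetical.

```python
import numpy as np

def fit_incremental(X, y, n_units=5, n_candidates=200, seed=0):
    """Greedily add sign-threshold hidden units one at a time (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    H = np.ones((n, 1))                       # hidden-layer outputs; column 0 is a bias
    coef = np.linalg.lstsq(H, y, rcond=None)[0]
    units, history = [], []
    for _ in range(n_units):
        residual = y - H @ coef               # negative gradient of squared loss
        # Assumption: random candidate hyperplanes stand in for an exact search.
        W = rng.normal(size=(n_candidates, d))
        b = rng.normal(size=n_candidates)
        S = np.sign(X @ W.T + b)              # candidate unit outputs, shape (n, n_candidates)
        # Maximizing |sum_i r_i h(x_i)| is the same as minimizing the error of h
        # (or of -h) on targets sign(r_i) with example weights |r_i|.
        k = int(np.argmax(np.abs(residual @ S)))
        units.append((W[k], b[k]))
        H = np.column_stack([H, S[:, k]])
        coef = np.linalg.lstsq(H, y, rcond=None)[0]   # refit all output weights
        history.append(float(np.mean((y - H @ coef) ** 2)))
    return units, coef, history
```

Because the output weights are refit by least squares on a growing set of columns, the training error recorded in `history` cannot increase from one insertion to the next.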
Large-scale biophysical parameter estimation in single neurons via constrained linear regression
Ahrens, Misha, Paninski, Liam, Huys, Quentin J.
Our understanding of the input-output function of single cells has been substantially advanced by biophysically accurate multi-compartmental models. The large number of parameters needing hand tuning in these models has, however, somewhat hampered their applicability and interpretability. Here we propose a simple and well-founded method for automatic estimation of many of these key parameters: 1) the spatial distribution of channel densities on the cell's membrane; 2) the spatiotemporal pattern of synaptic input; 3) the channels' reversal potentials; 4) the intercompartmental conductances; and 5) the noise level in each compartment. We assume experimental access to: a) the spatiotemporal voltage signal in the dendrite (or some contiguous subpart thereof, e.g.
Noise and the two-thirds power Law
Maoz, Uri, Portugaly, Elon, Flash, Tamar, Weiss, Yair
The two-thirds power law, an empirical law stating an inverse nonlinear relationship between the tangential hand speed and the curvature of its trajectory during curved motion, is widely acknowledged to be an invariant of upper-limb movement. It has also been shown to exist in eye motion and locomotion, and was even demonstrated in motion perception and prediction. This ubiquity has fostered various attempts to uncover the origins of this empirical relationship. In these it was generally attributed either to smoothness in hand- or joint-space or to the result of mechanisms that damp noise inherent in the motor system to produce the smooth trajectories evident in healthy human motion. We show here that white Gaussian noise also obeys this power-law. Analysis of signal and noise combinations shows that trajectories that were synthetically created not to comply with the power-law are transformed to power-law compliant ones after combination with low levels of noise. Furthermore, there exist colored noise types that drive non-power-law trajectories to power-law compliance and are not affected by smoothing. These results suggest caution when running experiments aimed at verifying the power-law or assuming its underlying existence without proper analysis of the noise. Our results could also suggest that the power-law might be derived not from smoothness or smoothness-inducing mechanisms operating on the noise inherent in our motor system but rather from the correlated noise which is inherent in this motor system.
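For reference, the law is commonly written as v = K * kappa^(-1/3), where v is tangential speed and kappa is path curvature, so a log-log regression of speed on curvature should give a slope near -1/3. The sketch below shows one way such an exponent might be estimated from a sampled planar trajectory. It is not the authors' analysis: it assumes uniform sampling and plain finite differences, and the function name `power_law_exponent` is hypothetical.

```python
import numpy as np

def power_law_exponent(x, y, dt):
    """Estimate beta in v = K * kappa**beta for a uniformly sampled planar path."""
    vx, vy = np.gradient(x, dt), np.gradient(y, dt)      # velocity
    ax, ay = np.gradient(vx, dt), np.gradient(vy, dt)    # acceleration
    speed = np.hypot(vx, vy)
    curvature = np.abs(vx * ay - vy * ax) / speed ** 3
    # Drop two samples at each end, where one-sided differences are less accurate.
    inner = slice(2, -2)
    slope, _ = np.polyfit(np.log(curvature[inner]), np.log(speed[inner]), 1)
    return slope

# Usage: an ellipse traced at a constant angular rate satisfies the law exactly,
# so the estimate should be close to -1/3.
t = np.linspace(0.0, 2.0 * np.pi, 2000)
beta = power_law_exponent(3.0 * np.cos(t), np.sin(t), t[1] - t[0])
```

The estimator assumes speed and curvature are never zero along the path; straight segments or pauses would need to be masked out before taking logarithms.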
Learning Multiple Related Tasks using Latent Independent Component Analysis
Zhang, Jian, Ghahramani, Zoubin, Yang, Yiming
We propose a probabilistic model based on Independent Component Analysis for learning multiple related tasks. In our model the task parameters are assumed to be generated from independent sources which account for the relatedness of the tasks. We use Laplace distributions to model hidden sources, which makes it possible to identify the hidden, independent components instead of just modeling correlations. Furthermore, our model enjoys a sparsity property which makes it both parsimonious and robust. We also propose efficient algorithms for both the empirical Bayes method and point estimation. Our experimental results on two multi-label text classification data sets show that the proposed approach is promising.
Selecting Landmark Points for Sparse Manifold Learning
Silva, Jorge, Marques, Jorge, Lemos, João
There has been a surge of interest in learning nonlinear manifold models to approximate high-dimensional data. Both for computational complexity reasons and for generalization capability, sparsity is a desired feature in such models. This usually means dimensionality reduction, which naturally implies estimating the intrinsic dimension, but it can also mean selecting a subset of the data to use as landmarks, which is especially important because many existing algorithms have quadratic complexity in the number of observations.
Variational EM Algorithms for Non-Gaussian Latent Variable Models
Palmer, Jason, Kreutz-Delgado, Kenneth, Rao, Bhaskar D., Wipf, David P.
We consider criteria for variational representations of non-Gaussian latent variables, and derive variational EM algorithms in general form. We establish a general equivalence among convex bounding methods, evidence based methods, and ensemble learning/Variational Bayes methods, which has previously been demonstrated only for particular cases.
Worst-Case Bounds for Gaussian Process Models
Kakade, Sham M., Seeger, Matthias W., Foster, Dean P.
We present a competitive analysis of some nonparametric Bayesian algorithms in a worst-case online learning setting, where no probabilistic assumptions about the generation of the data are made. We consider models which use a Gaussian process prior (over the space of all functions) and provide bounds on the regret (under the log loss) for commonly used non-parametric Bayesian algorithms -- including Gaussian regression and logistic regression -- which show how these algorithms can perform favorably under rather general conditions.