Overview
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The submission describes a convex deep learning formulation that leverages a number of key ideas. First, a training objective is proposed that explicitly includes the outputs of hidden layers as variables to be inferred via optimization. These are linked to linear responses via a loss function, and the net objective is the sum of these loss functions across the layers, plus some regularization terms. Next, a number of changes of variables are performed in order to reparameterize the objective into a convex form, heavily leveraging the representer theorem and the idea of value regularization. We are left with a convex objective in terms of three different matrices (per layer) to optimize. In particular, one of these matrices is a nonparametric 'normalized output kernel' matrix, which takes the place of optimizing over the hidden layer outputs directly; however, this leads to a transductive method where we must simultaneously solve the optimization for training and test inputs.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
NIPS (Neural Information Processing Systems), 8-11 December 2014, Montreal, Canada. Paper ID: 1407. Title: "On Communication Cost of Distributed Statistical Estimation and Dimensionality". This paper investigates the communication cost of distributed estimation for a d-dimensional spherical Gaussian distribution with unknown mean and identity covariance, where the joint distribution is assumed to be a product distribution over the coordinates. The authors generalize previous work on the one-dimensional case in [4] by proposing upper and lower bounds for d-dimensional data under two communication settings, interactive and simultaneous, for achieving the minimax squared loss. The results establish the tradeoffs between dimensionality and communication cost for distributed estimation. In addition, improved bounds are derived when the unknown mean is s-sparse.
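To make the setting in this review concrete, here is a minimal Python sketch of a simultaneous protocol for Gaussian mean estimation. It is an illustration under stated assumptions, not the paper's scheme: each of `m` machines sends its full-precision local sample mean in one round, so the sketch does not model the number of bits communicated, which is what the paper's bounds concern. The function names and parameters are hypothetical.

```python
import random

def local_mean(theta, n, rng):
    """One machine: draw n samples from N(theta, I_d) and return their mean."""
    d = len(theta)
    sums = [0.0] * d
    for _ in range(n):
        for j in range(d):
            # Coordinates are independent with unit variance (product distribution).
            sums[j] += rng.gauss(theta[j], 1.0)
    return [s / n for s in sums]

def simultaneous_estimate(theta, m, n, seed=0):
    """Each of m machines sends one message (its local mean); the center averages them."""
    rng = random.Random(seed)
    messages = [local_mean(theta, n, rng) for _ in range(m)]
    d = len(theta)
    return [sum(msg[j] for msg in messages) / m for j in range(d)]

def squared_loss(estimate, theta):
    """Squared Euclidean distance between the estimate and the true mean."""
    return sum((a - b) ** 2 for a, b in zip(estimate, theta))

if __name__ == "__main__":
    theta = [1.0, -2.0, 0.5]
    est = simultaneous_estimate(theta, m=10, n=100)
    print(est, squared_loss(est, theta))
```

With unrestricted messages the expected squared loss of this average is d/(mn); the question studied in the paper is how much communication is needed to approach that rate.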
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper addresses the problem of robustly estimating the low-dimensional subspace of contaminated observations when the observations are inherently coherent. Performance that degrades with increasing data coherence is a standard theoretical bottleneck of previous RPCA methods. This paper, however, circumvents the problem in a clever manner. Since such cluster structure is rather common in realistic data, solving this issue is certainly meaningful.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Summary: The paper extends the classical BCM learning rule to utilize information from spike triplets. It is shown that the update rule can learn selectivity for mixture distributions (by converging to the class means). Quality: By employing tensor notation, the paper shows that the BCM rule can be generalized to use information from more than a pair of spikes (spike triplets are used for the examples). While the model has fewer parameters than previous learning algorithms based on spike triplets or quadruplets and the method can be shown to have stable points as class means of mixture distributions, the lack of experimental comparisons with other models makes it hard to gauge the incremental contribution of the model.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper presents a new Gibbs sampler algorithm for FHMMs. The idea is to add an auxiliary variable, U, to the state of the Gibbs sampler. The value of U restricts the set of possible values that the hidden state X can take at the next step of the Gibbs sampler. As the number of possible values for X_i is small for each time point i, we can update X given U (and the data) using FFBS. I think this is an original and clever approach to an important class of problems.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper connects two previously described methods of multitask feature learning, one where regularization is applied within and across tasks separately, and one where the regularization is performed on the jointly learned parameter matrix. This paper proves that under specific parameter settings, the two formulations are equivalent. This paper is fairly clear, but I would have liked one final statement of the full form of problem 1 using problem 2's parameters (or vice versa) rather than the current statement of parameter equivalence. However, the proofs are nicely presented with sketches of the steps described beforehand to help the reader stitch together the parts.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The authors present a novel approach to learning to rank. In contrast to traditional approaches, the idea is to focus on the number of positive instances that are ranked before the first negative one. Following a large-margin approach leads to primal and dual representations. Compared to similar approaches, the complexity is only linear in the number of instances.
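The quantity this review refers to, the number of positive instances ranked before the first negative one, can be sketched in a few lines of Python. This illustrates only the evaluation measure, not the authors' large-margin training method; the function name and the convention that ties count as ranked below the negative are assumptions made here.

```python
from typing import Sequence

def positives_before_first_negative(scores: Sequence[float],
                                    labels: Sequence[int]) -> int:
    """Count positives scored strictly above the highest-scoring negative.

    labels > 0 mark positive instances; everything else is negative.
    Both passes over the data are linear in the number of instances.
    """
    negative_scores = [s for s, y in zip(scores, labels) if y <= 0]
    if not negative_scores:
        # No negatives at all: every positive is ranked "before the first negative".
        return sum(1 for y in labels if y > 0)
    top_negative = max(negative_scores)
    # Assumed tie rule: a positive tied with the top negative is not counted.
    return sum(1 for s, y in zip(scores, labels) if y > 0 and s > top_negative)

if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    labels = [1, 1, 0, 1, 0]
    print(positives_before_first_negative(scores, labels))
```

Because only the maximum negative score matters, no sorting or pairwise comparison is needed to compute the measure.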
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Review for "Exponential Concentration of a Density Functional Estimator". This paper derives an exponential concentration inequality for a plug-in estimator of a class of integral functionals of one or more continuous probability densities, which includes entropy, divergence, mutual information, and others. From the concentration inequality and an analysis of the bias, mean squared error convergence rates of the estimator are derived. It is then shown how the concentration inequality can be used to find bounds on the error of an estimator for conditional mutual information. This work could be significant in that the results can be applied to a large class of integral functionals of probability densities.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
In this article, the authors propose a framework for performing model comparison of Bayesian models on behavioral data. To do so, they summarize the Bayesian Decision Theory framework, pinpoint areas of non-identifiability, and outline the types of constraints that can be used to make each term in the Bayesian framework identifiable. They then make assumptions to constrain each term in the Bayesian framework, explore how differentiable parameter values are in their model, and apply the technique to two studies that use Bayesian decision theory to explain behavioral responses: time interval estimation and motion perception. Issues of identifiability of internal representations and processes have been prominent within cognitive science and psychology for decades.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
Summary: This paper proposes a nuclear norm penalized estimator for the matrix completion problem in which the observations take a finite (discrete) number of values. Through both theoretical analysis and numerical experiments, the authors verify that the proposed approach is effective. I understand that there are cases where the observations are discrete and that we may need a dedicated algorithm for them, but recommendation systems may not be a good example. Although most recommender system datasets allow a finite number of possible ratings (usually 1 to 5 stars), the output does not need to be finite.
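For readers unfamiliar with the term, a nuclear norm penalized estimator for matrix completion generically takes the form below. This is a standard textbook template, not the paper's exact objective: the loss \ell, the observed index set \Omega, and the tuning parameter \lambda are placeholders assumed here, and for discrete observations \ell would typically be a negative log-likelihood.

```latex
\hat{X} \in \operatorname*{arg\,min}_{X \in \mathbb{R}^{m_1 \times m_2}}
  \; \frac{1}{|\Omega|} \sum_{(i,j) \in \Omega} \ell\bigl(Y_{ij}, X_{ij}\bigr)
  \; + \; \lambda \,\lVert X \rVert_{*},
\qquad
\lVert X \rVert_{*} = \sum_{k} \sigma_k(X)
```

Here \sigma_k(X) are the singular values of X, so the penalty acts as a convex surrogate for the rank.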