Overview
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper introduces a system that creates a classifier at test time for a given test image using a set of training examples in a specific neighborhood of the test image. The authors method proposes how to identify the neighborhood with the most informative training examples for the given test image. The authors indicate that their method is well suited to fine-grained image classification tasks. The authors' key observation is that while training images that are nearest to the test image may be informative, the most informative set of training images must be found by estimating the informativeness of all members of a neighborhood set together. The main contribution of this paper is a method for estimating the most informative neighborhood set in an online fashion for each new test image.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper presents a spectral clustering algorithm for the sparse degree regime (degree=\theta(1)) of the stochastic blockmodel. The authors propose an alternative to the non-backtracking operator and point out that this has similar properties as the non-backtracking operator, which has been shown to be useful in the sparse regime. But the proposed data matrix is smaller than the non backtracking operator, and symmetric, therefore making eigenvalue computation easier and more accurate. The sparse regime is indeed the hardest in terms of showing concentration of empirical eigenvalues, or performance of a clustering algorithm in the stochastic blockmodel.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper defines temporal coverage functions, a generalization of coverage functions with dependence. It introduces a method to learn them from previous time history. It also includes experimental validation on real data sets. The paper is well written, and clear. It is also very technical, and was not an easy read for me.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes an algorithm for online combinatorial optimization. In this online learning problem, the action space is combinatorially large and can be represented in a d-dimensional Euclidean space such that the loss in each time step is a linear function of the action. It would greatly improve the paper if there was a thorough comparison between the new algorithm and Online Stochastic Mirror Descent (OSMD by Audibert et al., [3] in the current paper) both in terms of how the algorithms work and in terms of regret bounds. In the current form of the paper, I am not sure if the new algorithm is significantly different from OSMD or if it improves its bounds.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper addresses the problem of learning with label proportions, wherein one only knows the proportions of the labels of bags of samples that are positive (or more generally that belong to one class). It makes very substantive contributions by: a. analysis that shows that a general class of loss functions allow efficient learning via the mean operator without requiring homogeneity (as was the case with previous literature) b. Developing fast learning algorithms for estimating the mean operator c. allow the use of standard binary classifier learning algorithms to solve the LLP problem via reduction. It is also very well written, though it is quite dense because of the sheer volume of novel material that the authors try to cover in a short NIPS paper.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. Summary: This paper studies the Principal Component Analysis (PCA) for large tensors of arbitrary order k under a single-spike model. Solving tensor PCA exactly is in general NP hard. Given a completely observed rank-one symmetric tensor, this paper provides conditions under which one can reliably estimate the unknown unit vector. Specifically, the paper gives conditions on signal-to-noise ratio under several scenarios which allow one to estimate the solution reliably. For the maximum-likelihood estimator (MLE), the authors show that in an ideal case with unbounded computational resources, the MLE is successful with high probability if the signal-to-noise ration is above sqrt(k.log(k))(1+o(1))
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper studies a planted partition model for random m-uniform hypergraphs, and proves the consistency of a natural generalization of spectral clustering. The hypergraph adjacency tensor is (mode-1) flattened to a matrix, from which a normalized Laplacian matrix is formed and the standard spectral partitioning is then applied. The striking feature of the analysis is that the rate of convergence improves as m increases, provided that the number of partitions is small. Some experiments on both synthetic and application derived data are reported, and the proposed method is shown to be relatively effective, especially given its simplicity. The model is well-motivated by applications in computer vision and likely elsewhere.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper defines a joint generative model of an image and its annotated text which is used to learn a bit vector representation for large scale image retrieval. An Indian Buffet Process is used to learn the length of the bit vector. The method is compared favourably to several widely used techniques. Quality ======= It is good to see a retrieval paper constructed around a well-defined probabilistic model.