Convolutional Neural Networks are great at identifying all the information that makes an image distinct. When we train a deep neural network in Caffe to classify images, we specify a multilayered neural network with different types of layers like convolution, rectified linear unit, softmax loss, and so on. The last layer is the output layer that gives us the output tag with the corresponding confidence value. But sometimes it's useful for us to extract the feature vectors from various layers and use it for other purposes. Let's see how to do it in Python Caffe, shall we?

Nassar, Marcel, Wang, Xin, Tumer, Evren

We introduce a novel encoder-decoder architecture to embed functional processes into latent vector spaces. This embedding can then be decoded to sample the encoded functions over any arbitrary domain. This autoencoder generalizes the recently introduced Conditional Neural Process (CNP) model of random processes. Our architecture employs the latest advances in graph neural networks to process irregularly sampled functions. Thus, we refer to our model as Conditional Graph Neural Process (CGNP). Graph neural networks can effectively exploit `local' structures of the metric spaces over which the functions/processes are defined. The contributions of this paper are twofold: (i) a novel graph-based encoder-decoder architecture for functional and process embeddings, and (ii) a demonstration of the importance of using the structure of metric spaces for this type of representations.

Herbrich, Ralf, Graepel, Thore

We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement of the so far tightest margin bound by Shawe-Taylor et al. [8] and scales logarithmically in the inverse margin. Even in the case of less training examples than input dimensions sufficiently large margins lead to nontrivial bound values and - for maximum margins - to a vanishing complexity term.Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend to use SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets. 1 Introduction Linear classifiers are exceedingly popular in the machine learning community due to their straightforward applicability and high flexibility which has recently been boosted by the so-called kernel methods [13]. A natural and popular framework for the theoretical analysis of classifiers is the PAC (probably approximately correct) framework[11] which is closely related to Vapnik's work on the generalisation error [12]. For binary classifiers it turned out that the growth function is an appropriate measureof "complexity" and can tightly be upper bounded by the VC (Vapnik-Chervonenkis) dimension [14].

Tansey, Wesley, Padilla, Oscar Hernan Madrid, Suggala, Arun Sai, Ravikumar, Pradeep

We present Vector-Space Markov Random Fields (VS-MRFs), a novel class of undirected graphical models where each variable can belong to an arbitrary vector space. VS-MRFs generalize a recent line of work on scalar-valued, uni-parameter exponential family and mixed graphical models, thereby greatly broadening the class of exponential families available (e.g., allowing multinomial and Dirichlet distributions). Specifically, VS-MRFs are the joint graphical model distributions where the node-conditional distributions belong to generic exponential families with general vector space domains. We also present a sparsistent $M$-estimator for learning our class of MRFs that recovers the correct set of edges with high probability. We validate our approach via a set of synthetic data experiments as well as a real-world case study of over four million foods from the popular diet tracking app MyFitnessPal. Our results demonstrate that our algorithm performs well empirically and that VS-MRFs are capable of capturing and highlighting interesting structure in complex, real-world data. All code for our algorithm is open source and publicly available.

Last time we investigated the k-nearest-neighbors algorithm and the underlying idea that one can learn a classification rule by copying the known classification of nearby data points. This required that we view our data as sitting inside a metric space; that is, we imposed a kind of geometric structure on our data. One glaring problem is that there may be no reasonable way to do this. While we mentioned scaling issues and provided a number of possible metrics in our primer, a more common problem is that the data simply isn't numeric. For instance, a poll of US citizens might ask the respondent to select which of a number of issues he cares most about. There could be 50 choices, and there is no reasonable way to assign these numerical values so that all are equidistant in the resulting metric space. Another issue is that the quality of the data could be bad. For instance, there may be missing values for some attributes (e.g., a respondent may neglect to answer one or more questions).