Supervised Learning
Functional programming for deep learning – Towards Data Science – Medium
Before I started my most recent job at ThinkTopic, the concepts of "functional programming" and "machine learning" belonged to two different worlds entirely. One was a programming paradigm surging in popularity as the world turned towards simplicity, composability, and immutability to maintain complex scaling applications; the other was a tool to teach computers to autocomplete doodles and make music. The more I worked with the two, the more I began realizing that the overlap is both practical and theoretical. Firstly, machine learning is not a stand-alone endeavor; it needs to be rapidly incorporated into complex scaling applications in industry. Secondly, machine learning -- and deep learning in particular -- is functional by design.
L.A. County median home price breaks record set during last decade's housing boom
In summer 2007, the Los Angeles County median home price hit an all-time high of $550,000. It soon plunged as the housing bubble burst and the national economy crashed. Now the median, the point where half the homes sold for more and half for less, has finally passed the heights of 10 years ago -- the result of an improving economy, historically low mortgage rates and a shortage of listings. According to a report released Wednesday from real estate firm CoreLogic, the county's median price in May rose 6.8% from a year earlier to reach $560,500 as sales jumped 4.8%. When adjusted for inflation, May's median remains 11% below the 2007 high, though the nominal record comes amid fresh concerns over the high cost of housing in California.
Learning from networked examples
Wang, Yuyi, Ramon, Jan, Guo, Zheng-Chu
Many machine learning algorithms are based on the assumption that training examples are drawn independently. However, this assumption does not hold anymore when learning from a networked sample because two or more training examples may share some common objects, and hence share the features of these shared objects. We show that the classic approach of ignoring this problem potentially can have a harmful effect on the accuracy of statistics, and then consider alternatives. One of these is to only use independent examples, discarding other information. However, this is clearly suboptimal. We analyze sample error bounds in this networked setting, providing significantly improved results. An important component of our approach is formed by efficient sample weighting schemes, which leads to novel concentration inequalities.
Poincar\'e Embeddings for Learning Hierarchical Representations
Nickel, Maximilian, Kiela, Douwe
Representation learning has become an invaluable approach for learning from symbolic data such as text and graphs. However, while complex symbolic datasets often exhibit a latent hierarchical structure, state-of-the-art methods typically learn embeddings in Euclidean vector spaces, which do not account for this property. For this purpose, we introduce a new approach for learning hierarchical representations of symbolic data by embedding them into hyperbolic space -- or more precisely into an n-dimensional Poincar\'e ball. Due to the underlying hyperbolic geometry, this allows us to learn parsimonious representations of symbolic data by simultaneously capturing hierarchy and similarity. We introduce an efficient algorithm to learn the embeddings based on Riemannian optimization and show experimentally that Poincar\'e embeddings outperform Euclidean embeddings significantly on data with latent hierarchies, both in terms of representation capacity and in terms of generalization ability.
#NuggsforCarter: Teen reaches all-time retweet record
Carter Wilkerson, 16, of Reno took on a Wendy's challenge to get 18 million retweets for free chicken nuggets for a year. On Tuesday, May 9, 2017, he beat the record set by Ellen DeGeneres. He will get the free nuggets. Carter Wilkerson, 16, of Reno, reached the top of the Twitter game Tuesday morning with the most retweets ever (3.441 million as of 11:24 a.m. And he'll get his chicken nuggets, plus $100,000 to a national charity.
Scalable and Flexible Multiview MAX-VAR Canonical Correlation Analysis
Fu, Xiao, Huang, Kejun, Hong, Mingyi, Sidiropoulos, Nicholas D., So, Anthony Man-Cho
Generalized canonical correlation analysis (GCCA) aims at finding latent low-dimensional common structure from multiple views (feature vectors in different domains) of the same entities. Unlike principal component analysis (PCA) that handles a single view, (G)CCA is able to integrate information from different feature spaces. Here we focus on MAX-VAR GCCA, a popular formulation which has recently gained renewed interest in multilingual processing and speech modeling. The classic MAX-VAR GCCA problem can be solved optimally via eigen-decomposition of a matrix that compounds the (whitened) correlation matrices of the views; but this solution has serious scalability issues, and is not directly amenable to incorporating pertinent structural constraints such as non-negativity and sparsity on the canonical components. We posit regularized MAX-VAR GCCA as a non-convex optimization problem and propose an alternating optimization (AO)-based algorithm to handle it. Our algorithm alternates between {\em inexact} solutions of a regularized least squares subproblem and a manifold-constrained non-convex subproblem, thereby achieving substantial memory and computational savings. An important benefit of our design is that it can easily handle structure-promoting regularization. We show that the algorithm globally converges to a critical point at a sublinear rate, and approaches a global optimal solution at a linear rate when no regularization is considered. Judiciously designed simulations and large-scale word embedding tasks are employed to showcase the effectiveness of the proposed algorithm.
Conditional Similarity Networks
Veit, Andreas, Belongie, Serge, Karaletsos, Theofanis
What makes images similar? To measure the similarity between images, they are typically embedded in a feature-vector space, in which their distance preserve the relative dissimilarity. However, when learning such similarity embeddings the simplifying assumption is commonly made that images are only compared to one unique measure of similarity. A main reason for this is that contradicting notions of similarities cannot be captured in a single space. To address this shortcoming, we propose Conditional Similarity Networks (CSNs) that learn embeddings differentiated into semantically distinct subspaces that capture the different notions of similarities. CSNs jointly learn a disentangled embedding where features for different similarities are encoded in separate dimensions as well as masks that select and reweight relevant dimensions to induce a subspace that encodes a specific similarity notion. We show that our approach learns interpretable image representations with visually relevant semantic subspaces. Further, when evaluating on triplet questions from multiple similarity notions our model even outperforms the accuracy obtained by training individual specialized networks for each notion separately.
Nonconvex One-bit Single-label Multi-label Learning
Qiu, Shuang, Luo, Tingjin, Ye, Jieping, Lin, Ming
We study an extreme scenario in multi-label learning where each training instance is endowed with a single one-bit label out of multiple labels. We formulate this problem as a non-trivial special case of one-bit rank-one matrix sensing and develop an efficient non-convex algorithm based on alternating power iteration. The proposed algorithm is able to recover the underlying low-rank matrix model with linear convergence. For a rank-$k$ model with $d_1$ features and $d_2$ classes, the proposed algorithm achieves $O(\epsilon)$ recovery error after retrieving $O(k^{1.5}d_1 d_2/\epsilon)$ one-bit labels within $O(kd)$ memory. Our bound is nearly optimal in the order of $O(1/\epsilon)$. This significantly improves the state-of-the-art sampling complexity of one-bit multi-label learning. We perform experiments to verify our theory and evaluate the performance of the proposed algorithm.
Using Neural Network Formalism to Solve Multiple-Instance Problems
Many objects in the real world are difficult to describe by a single numerical vector of a fixed length, whereas describing them by a set of vectors is more natural. Therefore, Multiple instance learning (MIL) techniques have been constantly gaining on importance throughout last years. MIL formalism represents each object (sample) by a set (bag) of feature vectors (instances) of fixed length where knowledge about objects (e.g., class label) is available on bag level but not necessarily on instance level. Many standard tools including supervised classifiers have been already adapted to MIL setting since the problem got formalized in late nineties. In this work we propose a neural network (NN) based formalism that intuitively bridges the gap between MIL problem definition and the vast existing knowledge-base of standard models and classifiers. We show that the proposed NN formalism is effectively optimizable by a modified back-propagation algorithm and can reveal unknown patterns inside bags. Comparison to eight types of classifiers from the prior art on a set of 14 publicly available benchmark datasets confirms the advantages and accuracy of the proposed solution.
Belief Propagation in Conditional RBMs for Structured Prediction
Restricted Boltzmann machines~(RBMs) and conditional RBMs~(CRBMs) are popular models for a wide range of applications. In previous work, learning on such models has been dominated by contrastive divergence~(CD) and its variants. Belief propagation~(BP) algorithms are believed to be slow for structured prediction on conditional RBMs~(e.g., Mnih et al. [2011]), and not as good as CD when applied in learning~(e.g., Larochelle et al. [2012]). In this work, we present a matrix-based implementation of belief propagation algorithms on CRBMs, which is easily scalable to tens of thousands of visible and hidden units. We demonstrate that, in both maximum likelihood and max-margin learning, training conditional RBMs with BP as the inference routine can provide significantly better results than current state-of-the-art CD methods on structured prediction problems. We also include practical guidelines on training CRBMs with BP, and some insights on the interaction of learning and inference algorithms for CRBMs.