Supervised Learning


A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

#artificialintelligence

This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building semantic feature representations, based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings, and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
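
The recipe is concrete enough to sketch: fit a linear map from context-average vectors to pretrained vectors by least squares, then apply that map to the averaged context of an unseen word. The snippet below is a minimal illustration under that reading of the abstract; the helper names and the plain least-squares fit are assumptions, not the authors' code.

```python
import numpy as np

def context_average(contexts, vectors):
    """Average the pretrained vectors of every token in `contexts`
    (a list of token lists), skipping out-of-vocabulary tokens."""
    words = [w for ctx in contexts for w in ctx if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

def fit_transform_matrix(train_words, train_contexts, vectors):
    """Least-squares fit of A so that A @ context_average(w) ~ vectors[w]."""
    U = np.stack([context_average(train_contexts[w], vectors) for w in train_words])
    V = np.stack([vectors[w] for w in train_words])
    X, *_ = np.linalg.lstsq(U, V, rcond=None)   # solves U @ X ~ V
    return X.T                                  # so A @ u ~ v for each word

def embed_new_word(contexts, A, vectors):
    """Induce a vector for an unseen word from as little as one usage example."""
    return A @ context_average(contexts, vectors)
```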


Facebook is staring down a record-setting $5 billion fine

USATODAY - Tech Top Stories

Facebook may be close to putting a Federal Trade Commission investigation behind it. But it faces a variety of other probes in Europe and the U.S., some of which could present it with even bigger headaches. The $5 billion fine from the FTC, which Facebook has been expecting, is by far the largest the agency has levied on a technology company. When Facebook reported its first-quarter earnings back in April, the company confirmed what many had long suspected: the Federal Trade Commission (FTC) was nearing the end of an investigation into the company following last year's Cambridge Analytica scandal. Numerous media reports suggested that the FTC was considering a record-setting fine to make an example of the social media platform.


Cloud TPU Pods break AI training records Google Cloud Blog

#artificialintelligence

Google Cloud's AI-optimized infrastructure makes it possible for businesses to train state-of-the-art machine learning models faster, at greater scale, and at lower cost. These advantages enabled Google Cloud Platform (GCP) to set three new performance records in the latest round of the MLPerf benchmark competition, the industry-wide standard for measuring ML performance. All three record-setting results ran on Cloud TPU v3 Pods, the latest generation of supercomputers that Google has built specifically for machine learning. These results showcased the speed of Cloud TPU Pods: each of the winning runs used less than two minutes of compute time. With these latest MLPerf benchmark results, Google Cloud is the first public cloud provider to outperform on-premises systems when running large-scale, industry-standard ML training workloads of Transformer, Single Shot Detector (SSD), and ResNet-50.


Persistent homology detects curvature

arXiv.org Machine Learning

In topological data analysis, persistent homology is used to study the "shape of data". Persistent homology computations are completely characterized by a set of intervals called a bar code. It is often said that the long intervals represent the "topological signal" and the short intervals represent "noise". We give evidence to dispute this thesis, showing that the short intervals encode geometric information. Specifically, we prove that persistent homology detects the curvature of disks from which points have been sampled. We describe a general computational framework for solving inverse problems using the average persistence landscape, a continuous mapping from metric spaces with a probability measure to a Hilbert space. In the present application, the average persistence landscapes of points sampled from disks of constant curvature result in a path in this Hilbert space which may be learned using standard tools from statistics and machine learning.
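
The pipeline described here can be sketched end to end: sample points from a disk, compute a persistence diagram, convert it to a persistence landscape, and average landscapes over repeated samples to get a feature vector. The sketch below assumes the third-party `ripser` package for diagram computation; the flat-disk sampler and single-landscape averaging are simplifications of the paper's construction, not its code.

```python
import numpy as np
from ripser import ripser  # pip install ripser

def sample_euclidean_disk(n, radius=1.0, rng=None):
    """Uniform sample from a flat (zero-curvature) disk."""
    rng = np.random.default_rng(rng)
    r = radius * np.sqrt(rng.random(n))
    theta = 2 * np.pi * rng.random(n)
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def landscape(diagram, k, grid):
    """k-th persistence landscape evaluated on `grid`: each (birth, death)
    bar contributes a tent function, and lambda_k(t) is the k-th largest
    tent value at t."""
    tents = np.maximum(
        0.0,
        np.minimum(grid[None, :] - diagram[:, :1], diagram[:, 1:] - grid[None, :]),
    )
    tents = -np.sort(-tents, axis=0)            # descending over bars
    return tents[k] if k < len(tents) else np.zeros_like(grid)

grid = np.linspace(0.0, 0.5, 200)
samples = []
for _ in range(20):                             # crude stand-in for the
    X = sample_euclidean_disk(200)              # average persistence landscape
    dgm = ripser(X)['dgms'][1]                  # degree-1 persistence diagram
    dgm = dgm[np.isfinite(dgm[:, 1])]           # drop infinite bars
    samples.append(landscape(dgm, k=0, grid=grid))
avg_landscape = np.mean(samples, axis=0)        # feature vector for a regressor
```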


Structured Output Learning with Conditional Generative Flows

arXiv.org Machine Learning

Traditional structured prediction models try to learn the conditional likelihood, i.e., p(y|x), to capture the relationship between the structured output y and the input features x. For many models, computing the likelihood is intractable. These models are therefore hard to train, requiring the use of surrogate objectives or variational inference to approximate the likelihood. In this paper, we propose conditional Glow (c-Glow), a conditional generative flow for structured output learning. C-Glow benefits from the ability of flow-based models to compute p(y|x) exactly and efficiently. Learning with c-Glow does not require a surrogate objective or performing inference during training. Once trained, we can directly and efficiently generate conditional samples to do structured prediction. We evaluate this approach on different structured prediction tasks and find c-Glow's structured outputs comparable in quality with state-of-the-art deep structured prediction approaches.
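
Why flows give an exact p(y|x) is worth a toy illustration: a conditional affine transform of a Gaussian base has a closed-form conditional density via the change-of-variables formula. The one-layer sketch below is my own simplification, not c-Glow itself; the conditioners `mu` and `log_scale` stand in for its deep networks.

```python
import numpy as np

def conditional_log_prob(y, x, mu, log_scale):
    """log p(y|x) for a one-layer conditional affine flow:
    y = mu(x) + exp(log_scale(x)) * z with z ~ N(0, I)."""
    s = log_scale(x)
    z = (y - mu(x)) * np.exp(-s)                  # invert the flow
    log_det = -np.sum(s)                          # log |det dz/dy|
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2.0 * np.pi)
    return log_base + log_det

# Trivial conditioners, purely for demonstration:
mu = lambda x: 0.5 * x
log_scale = lambda x: np.zeros_like(x)
print(conditional_log_prob(np.zeros(3), np.zeros(3), mu, log_scale))
```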


Online Learning to Rank with Features

arXiv.org Machine Learning

We introduce a new model for online ranking in which the click probability factors into an examination function and an attractiveness function, and the attractiveness function is linear in a feature vector and an unknown parameter. Only relatively mild assumptions are made on the examination function. A novel algorithm for this setup is analysed, showing that the dependence on the number of items is replaced by a dependence on the dimension, allowing the new algorithm to handle a large number of items. When reduced to the orthogonal case, the regret of the algorithm improves on the state of the art.
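
A rough sketch of the model (not the paper's algorithm): if P(click) = examination(position) x <theta, features>, then dividing observed clicks by the examination probability yields regression targets for the unknown parameter, which regularized least squares can estimate. Everything below, including treating the examination function as known, is an illustrative assumption.

```python
import numpy as np

class LinearRanker:
    def __init__(self, dim, reg=1.0):
        self.A = reg * np.eye(dim)   # regularized Gram matrix
        self.b = np.zeros(dim)

    def update(self, features, clicked, exam_prob):
        # Divide out the examination probability (assumed known here,
        # unlike in the paper) so the target estimates attractiveness.
        target = clicked / exam_prob
        self.A += np.outer(features, features)
        self.b += target * features

    def theta(self):
        return np.linalg.solve(self.A, self.b)   # estimate of the parameter

    def rank(self, item_features):
        # Order items by estimated attractiveness (no optimism bonus here).
        scores = item_features @ self.theta()
        return np.argsort(-scores)
```

Because the estimate lives in the d-dimensional feature space rather than over individual items, its statistical cost scales with the dimension, which is the intuition behind the regret bound claimed above.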


Learning Mahalanobis Metric Spaces via Geometric Approximation Algorithms

arXiv.org Machine Learning

Learning Mahalanobis metric spaces is an important problem that has found numerous applications. Several algorithms have been designed for this problem, including Information Theoretic Metric Learning (ITML) by [Davis et al. 2007] and Large Margin Nearest Neighbor (LMNN) classification by [Weinberger and Saul 2009]. We consider a formulation of Mahalanobis metric learning as an optimization problem, where the objective is to minimize the number of violated similarity/dissimilarity constraints. We show that for any fixed ambient dimension, there exists a fully polynomial-time approximation scheme (FPTAS) with nearly-linear running time. This result is obtained using tools from the theory of linear programming in low dimensions. We also discuss improvements of the algorithm in practice, and present experimental results on synthetic and real-world data sets.
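
The optimization objective here is simple to state in code: given a candidate Mahalanobis matrix, count how many similarity/dissimilarity constraints it violates. The sketch below fixes illustrative thresholds u and l (assumptions of mine) and implements only the objective, not the paper's FPTAS.

```python
import numpy as np

def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance under a PSD matrix M."""
    d = x - y
    return d @ M @ d

def violations(M, similar_pairs, dissimilar_pairs, u=1.0, l=2.0):
    """Objective value: similar pairs should satisfy dist^2 <= u,
    dissimilar pairs dist^2 >= l; count constraints that fail."""
    v = sum(mahalanobis_sq(x, y, M) > u for x, y in similar_pairs)
    v += sum(mahalanobis_sq(x, y, M) < l for x, y in dissimilar_pairs)
    return v
```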


A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

Neural Information Processing Systems

In this paper we propose an approximated learning framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in the CRF's primal a variant of the log-partition function, known as soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for structured prediction problems using Fenchel duality based on a local entropy approximation that computes the exact gradients of the approximated problem and is guaranteed to converge. Unlike existing approaches, this allows us to learn graphical models with cycles and a very large number of parameters efficiently. We demonstrate the effectiveness of our approach in an image denoising task.
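
The soft-max/hinge relationship is easy to verify numerically: the temperature-scaled log-partition smoothly approaches the structured hinge loss as the temperature goes to zero. The toy scores and losses below are illustrative, not from the paper.

```python
import numpy as np

def structured_hinge(scores, losses, truth):
    # scores: model score per labeling; losses: task loss per labeling
    return np.max(scores + losses) - scores[truth]

def softmax_loss(scores, losses, truth, eps):
    # eps = 1 gives the loss-augmented CRF log-loss; eps -> 0 the hinge
    return eps * np.logaddexp.reduce((scores + losses) / eps) - scores[truth]

scores = np.array([2.0, 1.5, 0.3])
losses = np.array([0.0, 1.0, 1.0])   # zero loss for the true labeling
for eps in (1.0, 0.1, 0.01):
    print(eps, softmax_loss(scores, losses, truth=0, eps=eps))
print('hinge:', structured_hinge(scores, losses, truth=0))
```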


Efficient Model-free Reinforcement Learning in Metric Spaces

arXiv.org Machine Learning

Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Watkins, Dayan 92] have been widely used in practice and can achieve human-level performance in applications such as video games [Mnih et al. 15]. Recently, equipped with the idea of optimism in the face of uncertainty, Q-learning algorithms [Jin, Allen-Zhu, Bubeck, Jordan 18] have been proven to be sample efficient for discrete tabular Markov Decision Processes (MDPs), which have a finite number of states and actions. In this work, we present an efficient model-free Q-learning based algorithm for MDPs with a natural metric on the state-action space, hence extending efficient model-free Q-learning algorithms to continuous state-action spaces. Compared to previous model-based RL algorithms for metric spaces [Kakade, Kearns, Langford 03], our algorithm does not require access to a black-box planning oracle.
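
A rough sketch of the two ingredients named above, optimistic Q-learning plus a metric-based discretization of the state-action space, might look like the following. The uniform grid, the 1/N learning rate, and the bonus schedule are all simplifications of mine, not the paper's algorithm.

```python
import numpy as np
from collections import defaultdict

class MetricQLearner:
    def __init__(self, resolution=0.1, bonus=1.0, gamma=0.99):
        self.res, self.bonus, self.gamma = resolution, bonus, gamma
        self.Q = defaultdict(float)   # Q-values on grid cells of the metric net
        self.N = defaultdict(int)     # visit counts per cell

    def key(self, state, action):
        # Snap the (state, action) pair to its cell in a uniform net.
        return tuple(np.round(np.asarray((*state, action)) / self.res).astype(int))

    def q(self, state, action):
        k = self.key(state, action)
        # Count-based bonus implements optimism in the face of uncertainty.
        return self.Q[k] + self.bonus / np.sqrt(self.N[k] + 1)

    def value(self, state, candidate_actions):
        # Maximize over a finite set of candidate actions (an action net).
        return max(self.q(state, a) for a in candidate_actions)

    def update(self, state, action, reward, next_state, candidate_actions):
        k = self.key(state, action)
        self.N[k] += 1
        lr = 1.0 / self.N[k]          # simplified step size
        target = reward + self.gamma * self.value(next_state, candidate_actions)
        self.Q[k] += lr * (target - self.Q[k])
```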


Japanese teenager lops off Guinness World Record-setting locks ready for Reiwa and college

The Japan Times

KAGOSHIMA - An 18-year-old Japanese teenager, once recognized as having the longest hair in the world among 13-to-17-year-olds, had her first-ever haircut Tuesday before starting life at university. Keito Kawahara, who lives in Izumi, Kagoshima Prefecture, said she plans to donate the hair that was cut for medical wigs. Kawahara initially grew her hair to hide a scar on her head that developed as a result of medical treatment shortly after birth. She continued life without cutting her hair, which she braided every morning during high school. There were times when she thought about changing her hairstyle, but instead she focused on studying for university entrance examinations.