similarity


Comprehensive Guide to build Recommendation Engine from scratch

#artificialintelligence

In today's world, every customer is faced with multiple choices. For example, if I'm looking for a book to read without any specific idea of what I want, there's a wide range of possibilities for how my search might pan out. I might waste a lot of time browsing the internet and trawling through various sites hoping to strike gold. I might ask other people for recommendations. But if there were a site or app that could recommend books based on what I have read previously, that would be a massive help. Instead of wasting time on various sites, I could just log in and voila! Ten recommended books tailored to my taste. This is what recommendation engines do, and most businesses these days harness their power. From Amazon to Netflix, Google to Goodreads, recommendation engines are one of the most widely used applications of machine learning techniques. In this article, we will cover various types of recommendation engine algorithms and the fundamentals of creating them in Python. We will also look at the mathematics behind these algorithms. Finally, we will build our own recommendation engine using matrix factorization.
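
To make the matrix-factorization idea concrete, here is a minimal sketch (not the article's own code): it factorizes a small user-item rating matrix into user and item latent factors with stochastic gradient descent, then fills in the missing ratings. The toy ratings, factor dimension, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); values are illustrative.
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

n_users, n_items = R.shape
k = 2                       # number of latent factors (assumed)
lr, reg, epochs = 0.01, 0.02, 5000

rng = np.random.default_rng(0)
P = rng.normal(scale=0.1, size=(n_users, k))   # user factors
Q = rng.normal(scale=0.1, size=(n_items, k))   # item factors

for _ in range(epochs):
    for u, i in zip(*R.nonzero()):             # iterate over observed ratings only
        err = R[u, i] - P[u] @ Q[i]
        pu = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u]) # gradient step with L2 penalty
        Q[i] += lr * (err * pu - reg * Q[i])

print(np.round(P @ Q.T, 2))  # predicted ratings, including the unrated cells
```

The zero entries of `P @ Q.T` are now filled with predictions; recommending the top-scoring unrated items per user is the basic recommendation step.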


AI teaches itself to identify materials – and predict new ones too

#artificialintelligence

A deep learning neural network trained on 50,000 crystal structures of inorganic materials has acquired the ability to recognise chemical similarities and predict new materials. One way to find out whether two elements from the periodic table will form a crystalline material is the tried and trusted 'shake and bake' approach: mix them together at a range of different stoichiometries and hope for the best. Binary materials are thus very well covered in the scientific literature, but this method can't keep up with the vastly more complex combinatorial possibilities afforded by three or more elements. Predictions of which elements will combine in which ratios to form regular solids are therefore necessary, and they hold the promise of new materials with desirable or even unprecedented properties. Current prediction methods typically use evolutionary algorithms applied to random starting structures.


Unsupervised learning demystified – Cassie Kozyrkov – Medium

#artificialintelligence

Unsupervised learning sounds like a fancy way to say "let the kids learn on their own not to touch the hot oven", but it's actually a pattern-finding technique for mining inspiration from your data. Contrary to popular belief, it has nothing to do with machines running around without adult supervision, forming their own opinions about things.


Detecting image similarity using Spark, LSH and TensorFlow

#artificialintelligence

For a visual platform, the ability to learn from images to understand our content is important. To detect near-duplicate images we use the NearDup system, a Spark- and TensorFlow-based pipeline. At the core of the pipeline is a Spark implementation of batch LSH (locality-sensitive hashing) search and a TensorFlow-based classifier. Every day, the pipeline compares billions of items and incrementally updates clusters. In this post, we'll explain how we use this technology to better understand images and improve the accuracy and density of recommendations and search results across our production surfaces.
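
The batch-LSH step can be illustrated with random-hyperplane hashing, a standard LSH family for cosine similarity. The sketch below is an assumption about the general technique, not the production pipeline described above: embeddings that land in the same hash bucket become candidate near-duplicate pairs, which a downstream classifier would then verify.

```python
import numpy as np

rng = np.random.default_rng(42)

def lsh_signatures(vectors, n_bits=16):
    """Random-hyperplane LSH: each bit records the side of a random hyperplane."""
    planes = rng.normal(size=(n_bits, vectors.shape[1]))
    return (vectors @ planes.T > 0).astype(np.uint8)

# Toy "image embeddings" (in practice these come from a neural network).
emb = rng.normal(size=(1000, 128))
emb[1] = emb[0] + 0.01 * rng.normal(size=128)   # plant a near-duplicate of item 0

sig = lsh_signatures(emb)

# Items with identical signatures become candidate near-duplicate pairs;
# a classifier then verifies the candidates.
buckets = {}
for idx in range(len(sig)):
    buckets.setdefault(sig[idx].tobytes(), []).append(idx)
print([b for b in buckets.values() if len(b) > 1])
```

Because hashing is cheap and bucketing only compares items with matching signatures, this avoids the quadratic cost of comparing billions of items pairwise.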


Creativity in the Age of Machines: How AI-Powered Creatives Will Enable a More Beautiful World Adobe Blog

#artificialintelligence

As a researcher, I've always had a strong interest in robots. Today, I see a lot of similarities between how I imagined robots would work and what we expect our intelligent assistants to do. Both need effective computer vision, voice recognition, and synthesis -- but most importantly, interesting things to do and say. The present moment also reminds me of a theory about what triggered the Cambrian Explosion: once creatures developed working vision, evolution accelerated dramatically and a wide variety of new forms and behaviors emerged.


A neural network catalyzer for multi-dimensional similarity search

arXiv.org Machine Learning

This paper aims at learning a function mapping input vectors to an output space in a way that improves high-dimensional similarity search. As a proxy objective, we design and train a neural network that favors uniformity in the spherical output space, while preserving the neighborhood structure after the mapping. For this purpose, we propose a new regularizer derived from the Kozachenko-Leonenko differential entropy estimator and combine it with a locality-aware triplet loss. Our method operates as a catalyzer for traditional indexing methods such as locality sensitive hashing or iterative quantization, boosting the overall recall. Additionally, the network output distribution makes it possible to leverage structured quantizers with efficient algebraic encoding, in particular spherical lattice quantizers such as the Gosset lattice E8. Our experiments show that this approach is competitive with state-of-the-art methods such as optimized product quantization.
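
A minimal sketch of the entropy-based regularizer idea: the Kozachenko-Leonenko estimator ties differential entropy to nearest-neighbor distances, so penalizing small nearest-neighbor distances between normalized outputs encourages a uniform spread on the sphere. The batch size, dimensions, and epsilon below are illustrative assumptions, not the paper's training code.

```python
import numpy as np

def koleo_regularizer(z):
    """Uniformity penalty derived from nearest-neighbor distances.

    z: (n, d) batch of L2-normalized output embeddings.
    Returns -mean(log r_i), where r_i is the distance from z_i to its
    nearest neighbor; crowded points (small r_i) are penalized.
    """
    d2 = np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                 # exclude self-distances
    r = np.sqrt(d2.min(axis=1))
    return -np.mean(np.log(r + 1e-12))

rng = np.random.default_rng(0)
z = rng.normal(size=(64, 8))
z /= np.linalg.norm(z, axis=1, keepdims=True)    # project onto the sphere
print(koleo_regularizer(z))
```

In training, this term would be added to the locality-aware triplet loss, trading off uniformity of the output distribution against preservation of the input neighborhood structure.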


Probabilistic FastText for Multi-Sense Word Embeddings

arXiv.org Machine Learning

We introduce Probabilistic FastText, a new model for word embeddings that can capture multiple word senses, sub-word structure, and uncertainty information. In particular, we represent each word with a Gaussian mixture density, where the mean of a mixture component is given by the sum of n-grams. This representation allows the model to share statistical strength across sub-word structures (e.g. Latin roots), producing accurate representations of rare, misspelt, or even unseen words. Moreover, each component of the mixture can capture a different word sense. Probabilistic FastText outperforms both FastText, which has no probabilistic model, and dictionary-level probabilistic embeddings, which do not incorporate subword structures, on several word-similarity benchmarks, including English RareWord and foreign language datasets. We also achieve state-of-the-art performance on benchmarks that measure ability to discern different meanings. Thus, the proposed model is the first to achieve multi-sense representations while having enriched semantics on rare words.
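
To illustrate the sub-word idea, here is a small sketch of a FastText-style component mean: the mean of one mixture component is a sum of vectors for the word's character n-grams, so a misspelt or unseen word still receives a sensible representation. The hashing scheme, n-gram range, and dimensions are illustrative assumptions; in the real model the n-gram vectors are learned.

```python
import numpy as np

DIM, BUCKETS = 16, 2 ** 12
rng = np.random.default_rng(0)
ngram_table = rng.normal(scale=0.1, size=(BUCKETS, DIM))  # learned in practice

def char_ngrams(word, n_min=3, n_max=5):
    w = f"<{word}>"                          # boundary markers, as in FastText
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def component_mean(word):
    """Mean of one Gaussian mixture component: sum of n-gram vectors."""
    idx = [hash(g) % BUCKETS for g in char_ngrams(word)]
    return ngram_table[idx].sum(axis=0)

# A misspelt word shares most n-grams with the correct form,
# so their component means stay close.
a, b = component_mean("similarity"), component_mean("similarty")
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cos), 3))
```

The multi-sense part comes from keeping several such components per word, each with its own mean and variance, so different senses occupy different mixture components.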


Learn from Your Neighbor: Learning Multi-modal Mappings from Sparse Annotations

arXiv.org Machine Learning

Many structured prediction problems (particularly in vision and language domains) are ambiguous, with multiple outputs being correct for a single input - e.g. there are many ways of describing an image and multiple ways of translating a sentence. However, exhaustively annotating the applicability of all possible outputs is intractable due to exponentially large output spaces (e.g. all English sentences). In practice, these problems are cast as multi-class prediction, with the likelihood of only a sparse set of annotations being maximized - unfortunately penalizing the model for placing belief on plausible but unannotated outputs. We make and test the following hypothesis: for a given input, the annotations of its neighbors may serve as an additional supervisory signal. Specifically, we propose an objective that transfers supervision from neighboring examples. We first study the properties of our method in a controlled toy setup before reporting results on multi-label classification and two image-grounded sequence modeling tasks - captioning and question generation. We evaluate using standard task-specific metrics and measures of output diversity, finding consistent improvements over standard maximum likelihood training and other baselines.
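
As a hedged illustration of this hypothesis (not the paper's exact objective), the sketch below mixes the usual cross-entropy on an example's own annotation with a down-weighted cross-entropy on the annotations of its nearest neighbors in an input-feature space. The weight alpha, neighbor count k, and choice of feature space are illustrative assumptions.

```python
import numpy as np

def neighbor_transfer_loss(log_probs, labels, features, alpha=0.3, k=2):
    """Cross-entropy on own labels plus a weighted term on neighbors' labels.

    log_probs: (n, c) model log-probabilities per example.
    labels:    (n,)   sparse annotation (one class index per example).
    features:  (n, d) input features used to find neighbors.
    """
    n = len(labels)
    own = -log_probs[np.arange(n), labels].mean()

    # Nearest neighbors by Euclidean distance in feature space.
    d2 = np.sum((features[:, None] - features[None, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Treat neighbors' annotations as extra, down-weighted supervision.
    nbr = -log_probs[np.arange(n)[:, None], labels[nbrs]].mean()
    return (1 - alpha) * own + alpha * nbr

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 4))
logits = rng.normal(size=(6, 5))
logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
y = rng.integers(0, 5, size=6)
print(neighbor_transfer_loss(logp, y, feats))
```

The intent is that plausible outputs annotated only for a neighboring input are no longer penalized as hard as truly implausible ones.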


GraKeL: A Graph Kernel Library in Python

arXiv.org Machine Learning

The problem of accurately measuring the similarity between graphs is at the core of many applications in a variety of disciplines. Graph kernels have recently emerged as a promising approach to this problem. There are now many kernels, each focusing on different structural aspects of graphs. Here, we present GraKeL, a library that unifies several graph kernels into a common framework. The library is written in Python and is built on top of scikit-learn. It is simple to use and can be naturally combined with scikit-learn's modules to build a complete machine learning pipeline for tasks such as graph classification and clustering. The code is BSD licensed and is available at: https://github.com/ysig/GraKeL.
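
A short usage sketch based on the library's documented Graph/ShortestPath interface; the two toy graphs (adjacency matrices plus node-label dictionaries) are illustrative.

```python
from grakel import Graph
from grakel.kernels import ShortestPath

# Two tiny node-labeled graphs: adjacency matrix + node-label dict.
g1 = Graph([[0, 1, 1], [1, 0, 0], [1, 0, 0]], {0: 'O', 1: 'H', 2: 'H'})
g2 = Graph([[0, 1, 0], [1, 0, 1], [0, 1, 0]], {0: 'H', 1: 'O', 2: 'H'})

kernel = ShortestPath(normalize=True)
K = kernel.fit_transform([g1, g2])   # 2x2 graph-similarity (kernel) matrix
print(K)
```

The resulting kernel matrix can be fed directly to scikit-learn estimators that accept precomputed kernels, e.g. sklearn.svm.SVC(kernel="precomputed"), for graph classification.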


GuideR: a guided separate-and-conquer rule learning in classification, regression, and survival settings

arXiv.org Machine Learning

This article presents GuideR, a user-guided rule induction algorithm that overcomes a major limitation of existing methods: the lack of any way to introduce the user's preferences or domain knowledge into the rule learning process. Automatic selection of attributes and attribute ranges often leads to rules that contain no interesting information. We propose an induction algorithm that takes the user's requirements into account. Our method uses the sequential covering approach and is suitable for classification, regression, and survival analysis problems. The effectiveness of the algorithm in all these tasks has been verified experimentally, confirming guided rule induction to be a powerful data analysis tool.
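
A minimal sketch of the sequential-covering (separate-and-conquer) loop that GuideR builds on: repeatedly grow the best single rule, remove the examples it covers, and stop when no useful rule remains. The single-threshold rule form and precision-based quality measure are illustrative assumptions; GuideR additionally injects user preferences into rule growth, which this sketch omits.

```python
import numpy as np

def grow_rule(X, y):
    """Greedily pick the best single condition 'feature j >= t' (illustrative)."""
    best, best_prec = None, 0.0
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            covered = X[:, j] >= t
            if covered.sum() == 0:
                continue
            prec = y[covered].mean()            # precision of the candidate rule
            if prec > best_prec:
                best, best_prec = (j, t), prec
    return best

def separate_and_conquer(X, y, min_pos=1):
    rules = []
    X, y = X.copy(), y.copy()
    while y.sum() >= min_pos:                   # positives left to cover
        rule = grow_rule(X, y)
        if rule is None:
            break
        j, t = rule
        rules.append(rule)
        keep = ~(X[:, j] >= t)                  # "separate": drop covered examples
        X, y = X[keep], y[keep]                 # "conquer" the remainder
    return rules

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] > 0.5).astype(int)                 # synthetic target
print(separate_and_conquer(X, y))
```

User guidance would enter at the rule-growing step, e.g. by restricting or prioritizing the attributes and value ranges that candidate conditions may use.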