
Information Retrieval


'Revenge porn victim' sues Google, Yahoo! and Bing demanding they delete her name

Daily Mail - Science & tech

A New York City college student is taking internet giants Google, Yahoo! and Bing to court, demanding they delete her name from their search engines because she was the victim of revenge porn. According to the lawsuit, the 30-year-old woman broke up with her boyfriend last year, the New York Post reported. After the end of their three-month relationship, the man uploaded a video onto the internet of the two engaged in sexual acts. The video was secretly recorded by the ex-boyfriend without her knowledge, the lawsuit states. Because the woman is of West African descent, she has a unique four-letter last name, thus search results of her name are limited to the raunchy video.


How to build a search engine: Part 3

@machinelearnbot

Assuming the dataset is named "people_wiki.csv", executing this script will produce streaming logs as the data is indexed in elasticsearch. That's how easy it is! Let's spend the next few lines on what actually happened. We declare our elasticsearch object, configured for our local machine. Once that object is initialized, we use it to index all of our data.
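A minimal sketch of the indexing step described above, assuming the CSV has a header row and that each row becomes one document. The original script is not shown in this excerpt, so the helper `build_actions`, the index name `people`, and the column names are hypothetical. The sketch only builds action dictionaries in the shape that `elasticsearch.helpers.bulk` accepts, so it runs without a cluster; the actual client calls appear as comments.

```python
import csv
import io


def build_actions(csv_file, index_name):
    """Yield one bulk-index action per CSV row.

    Each action is a dict with `_index` and `_source`, the shape that
    `elasticsearch.helpers.bulk` accepts for index operations.
    """
    reader = csv.DictReader(csv_file)
    for row in reader:
        yield {"_index": index_name, "_source": dict(row)}


# Stand-in for open("people_wiki.csv"); the column names are assumptions.
sample = io.StringIO("name,text\nAda Lovelace,English mathematician\n")
actions = list(build_actions(sample, "people"))

# With a local cluster running, the actions could be sent like this:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://localhost:9200")
#   helpers.bulk(es, actions)
```

Sending documents in bulk rather than one request per row is usually the faster choice for a file of this kind, which is why the sketch prepares actions instead of calling the index API in a loop.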



How will Google's AI Improvements Change SEO for Marketers? - Marketing and Entrepreneurship

#artificialintelligence

If you prefer reading, here's a quick recap of the changes AI will bring to marketers according to these four industry influencers, plus some of my personal suggestions for what you should do in the face of these changes: According to Sam Mallikarjunan, Head of Growth of HubSpot Labs, visual content will have an increasing influence on SEO; as he says, "search engines are getting good at knowing what a video, audio clip, or image is actually about." Not only does Google favor YouTube videos in search results, it is also getting better at analyzing what visual content is about. Just as content writers had to learn to optimize headings and keywords, visual artists will have to start thinking about SEO when creating visual content like images and videos. SEO for videos, for example, means optimizing keyword targeting, descriptions, tags, video length, and more. Here's a great guide on optimizing videos for SEO from Brian Dean, if you want to learn more.


John Giannandrea, Head of Google Search Machine Learning

#artificialintelligence

We can all agree that being with Google that long and contributing so much to search is a remarkable accomplishment, and we congratulate Singhal as he steps into a new time in life, focusing on philanthropy. As new leadership often means momentous refocusing, SEO professionals wonder how earned search may change as Giannandrea assumes this position, and whether the change will generate ripples across the tech world as a whole. The future of how GoogleBot crawls and interprets web content looks promising under his leadership, as we observe how he impacts machine learning's future and how the Metaweb is woven. Amit went on to say that "search is stronger than ever, and will only get better in the hands of an outstanding set of senior leaders who are already running the show day-to-day. Our mission of empowering people with information and the impact it has had on this world cannot be overstated." John Giannandrea, who has been at the forefront of overseeing artificial intelligence work such as Google's RankBrain algorithm, has been employed at Google for six years and is currently the VP of engineering. As explained by Forbes in November 2015, RankBrain handled "a very large fraction" of the millions of queries that went through the search engine.


Maximization of Approximately Submodular Functions

Neural Information Processing Systems

We study the problem of maximizing a function that is approximately submodular under a cardinality constraint. Approximate submodularity implicitly appears in a wide range of applications, as in many cases errors in evaluation of a submodular function break submodularity. Say that $F$ is $\epsilon$-approximately submodular if there exists a submodular function $f$ such that $(1-\epsilon)f(S) \leq F(S) \leq (1+\epsilon)f(S)$ for all subsets $S$. We are interested in characterizing the query-complexity of maximizing $F$ subject to a cardinality constraint $k$ as a function of the error level $\epsilon > 0$. We provide both lower and upper bounds: for $\epsilon > n^{-1/2}$ we show an exponential query-complexity lower bound. In contrast, when $\epsilon < 1/k$ or under a stronger bounded curvature assumption, we give constant approximation algorithms.
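To make the definition above concrete, here is a small sketch of a noisy oracle that satisfies the multiplicative sandwich condition, queried by the standard greedy rule for cardinality-constrained maximization. The abstract does not say which algorithms the paper analyzes, so the greedy rule, the coverage function used as the underlying submodular $f$, and all names (`coverage`, `make_noisy_oracle`, `greedy`, `eps`) are assumptions chosen for illustration.

```python
import random


def coverage(sets, chosen):
    """Submodular f(S): number of items covered by the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def make_noisy_oracle(sets, eps, seed=0):
    """Return F with (1 - eps) f(S) <= F(S) <= (1 + eps) f(S).

    The multiplicative error is drawn once per subset and cached, so F is a
    fixed function rather than a fresh random draw on every query.
    """
    rng = random.Random(seed)
    cache = {}

    def F(chosen):
        key = frozenset(chosen)
        if key not in cache:
            cache[key] = coverage(sets, key) * (1 + rng.uniform(-eps, eps))
        return cache[key]

    return F


def greedy(F, n, k):
    """Pick k of n elements, each time adding the one that maximizes F."""
    chosen = []
    for _ in range(k):
        remaining = [i for i in range(n) if i not in chosen]
        best = max(remaining, key=lambda i: F(chosen + [i]))
        chosen.append(best)
    return chosen


# Toy instance (hypothetical data): five sets over the items 1..7.
sets = [{1, 2, 3}, {3, 4}, {5}, {1, 2, 3, 4}, {6, 7}]
eps = 0.01
F = make_noisy_oracle(sets, eps)
picked = greedy(F, n=len(sets), k=2)
```

With an error level this small, the noise cannot reorder the candidates on this instance, so greedy behaves as it would on the exact function. Larger values of `eps` can flip comparisons between near-tied candidates, which is the regime the lower bound in the abstract concerns.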


Flexible Models for Microclustering with Application to Entity Resolution

Neural Information Processing Systems

Most generative models for clustering implicitly assume that the number of data points in each cluster grows linearly with the total number of data points. Finite mixture models, Dirichlet process mixture models, and Pitman--Yor process mixture models make this assumption, as do all other infinitely exchangeable clustering models. However, for some applications, this assumption is inappropriate. For example, when performing entity resolution, the size of each cluster should be unrelated to the size of the data set, and each cluster should contain a negligible fraction of the total number of data points. These applications require models that yield clusters whose sizes grow sublinearly with the size of the data set. We address this requirement by defining the microclustering property and introducing a new class of models that can exhibit this property. We compare models within this class to two commonly used clustering models using four entity-resolution data sets.


An algorithm for L1 nearest neighbor search via monotonic embedding

Neural Information Processing Systems

Fast algorithms for nearest neighbor (NN) search have in large part focused on L2 distance. Here we develop an approach for L1 distance that begins with an explicit and exact embedding of the points into L2. We show how this embedding can efficiently be combined with random projection methods for L2 NN search, such as locality-sensitive hashing or random projection trees. We rigorously establish the correctness of the methodology and show by experimentation that it is competitive in practice with available alternatives.
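The idea of reducing L1 search to L2 search can be illustrated with a simple special case. The sketch below is not the paper's construction, which the abstract does not describe; it assumes points with non-negative integer coordinates bounded by a known `max_value` and uses a unary encoding, under which the L1 distance between two points equals the squared L2 distance between their embeddings. Because squaring is monotonic on non-negative numbers, the nearest neighbor is the same under both distances. All function names are hypothetical.

```python
def unary_embed(point, max_value):
    """Map a point with integer coordinates in [0, max_value] to a 0/1 vector.

    Each coordinate c becomes c ones followed by (max_value - c) zeros, so two
    embedded points differ in exactly |a - b| positions per coordinate.
    """
    out = []
    for c in point:
        out.extend([1] * c + [0] * (max_value - c))
    return out


def l1(p, q):
    """L1 (Manhattan) distance."""
    return sum(abs(a - b) for a, b in zip(p, q))


def squared_l2(p, q):
    """Squared L2 (Euclidean) distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(query, points, dist):
    """Index of the point closest to query under the given distance."""
    return min(range(len(points)), key=lambda i: dist(query, points[i]))


# Toy data (hypothetical): four 2-D points and one query.
points = [(0, 5), (3, 3), (7, 1), (2, 6)]
query = (4, 4)
M = 7  # assumed upper bound on every coordinate

embedded = [unary_embed(p, M) for p in points]
embedded_query = unary_embed(query, M)

nn_l1 = nearest(query, points, l1)
nn_l2 = nearest(embedded_query, embedded, squared_l2)
```

The brute-force `nearest` stands in for an L2 index here. In practice the embedded vectors would be handed to an L2 method such as locality-sensitive hashing, and the unary encoding's dimension grows with `max_value`, which is one reason a more economical embedding is desirable.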


Verification Based Solution for Structured MAB Problems

Neural Information Processing Systems

We consider the problem of finding the best arm in a stochastic Multi-armed Bandit (MAB) game and propose a general framework based on verification that applies to multiple well-motivated generalizations of the classic MAB problem. In these generalizations, additional structure is known in advance, causing the task of verifying the optimality of a candidate to be easier than discovering the best arm. Our results are focused on the scenario where the failure probability must be very low; we essentially show that in this high confidence regime, identifying the best arm is as easy as the task of verification. We demonstrate the effectiveness of our framework by applying it to, and matching or improving the state-of-the-art results in, the problems of: Linear bandits, Dueling bandits with the Condorcet assumption, Copeland dueling bandits, Unimodal bandits and Graphical bandits.


How to build a search engine: Part 1

@machinelearnbot

In this multi-part series, we will explore how to build a search engine. It will be quite powerful and industrial strength. The first part will focus on getting the right tools and getting the technology stack ready. We will build this search engine with an AngularJS front end and use elasticsearch as the computation back end. Most applications of today are data driven.