Information Retrieval
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
Lee, Yunseong, Scolari, Alberto, Chun, Byung-Gon, Santambrogio, Marco Domenico, Weimer, Markus, Interlandi, Matteo
Machine Learning models are often composed of pipelines of transformations. While this design allows single model components to be executed efficiently at training time, prediction serving has different requirements, such as low latency, high throughput, and graceful performance degradation under heavy load. Current prediction serving systems consider models as black boxes, whereby prediction-time-specific optimizations are ignored in favor of ease of deployment. In this paper, we present PRETZEL, a prediction serving system introducing a novel white box architecture enabling both end-to-end and multi-model optimizations. Using production-like model pipelines, our experiments show that PRETZEL is able to introduce performance improvements over different dimensions; compared to state-of-the-art approaches, PRETZEL on average reduces 99th percentile latency by 5.5x while reducing memory footprint by 25x and increasing throughput by 4.7x.
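As a toy illustration of why seeing inside pipelines matters (this is not PRETZEL's actual design or API), the sketch below assumes each pipeline stage can be identified by its name and parameters. Once a server can inspect pipelines rather than treating them as black boxes, identical stages used by several models can be stored once, which is one way a multi-model optimization can cut memory footprint. `Stage` and `WhiteBoxStore` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """Hypothetical pipeline stage: frozen, so equal stages hash identically."""
    name: str
    params: Tuple  # hashable state, e.g. a tokenizer mode or model weights


class WhiteBoxStore:
    """Toy registry that keeps one shared copy of each distinct stage."""

    def __init__(self) -> None:
        self._stages: Dict[Stage, Stage] = {}
        self.models: Dict[str, List[Stage]] = {}

    def register(self, model_name: str, pipeline: List[Stage]) -> None:
        # setdefault returns the stage stored first for an equal key, so a
        # later model reuses that object instead of keeping a duplicate.
        self.models[model_name] = [self._stages.setdefault(s, s) for s in pipeline]

    def distinct_stages(self) -> int:
        return len(self._stages)


tokenizer = Stage("tokenize", ("whitespace",))
ngrams = Stage("char_ngrams", (3,))
store = WhiteBoxStore()
store.register("model_a", [tokenizer, ngrams, Stage("linear", (0.1, 0.2))])
store.register("model_b", [Stage("tokenize", ("whitespace",)), ngrams, Stage("linear", (0.3, 0.4))])
```

Here the two three-stage pipelines need only four stored stages instead of six, because the tokenizer and n-gram stages are shared; a black-box server that sees each model as an opaque artifact cannot make that observation.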
Query Understanding, Divided into Three Parts - Daniel Tunkelang - Medium
Like Rome, query understanding can't be built in one day. Implementing holistic understanding, reductionist understanding, and resolution is a lot of work, and as a search team you can always find room to improve all of these. But if you're not already looking at query understanding in this framework -- or if you're not looking at query understanding at all -- I urge you to consider it. It won't reduce the challenges, but it will help you tackle them in stages.
Google leak reveals secret China plans for censored search engine, prompting protests from employees
Google is secretly planning to launch a censored version of its search engine in China within the next year, a leaked transcript seems to reveal. According to The Intercept, Google's search engine chief Ben Gomes held a meeting in July to discuss the progress of a new search engine, dubbed Project Dragonfly. The platform would blacklist words and phrases like "human rights," "Nobel Prize," and "student protest," in order to conform with China's strict censorship laws. "You have taken on something extremely important to the company," Mr Gomes told the Google employees, according to the transcript obtained by the publication. "I have to admit it has been a difficult journey. But I do think a very important and worthwhile one. And I wish ourselves the best of luck in actually reaching our destination as soon as possible."
Probabilistic Blocking with An Application to the Syrian Conflict
Steorts, Rebecca C., Shrivastava, Anshumali
Entity resolution seeks to merge databases so as to remove duplicate entries where unique identifiers are typically unknown. We review modern blocking approaches for entity resolution, focusing on those based upon locality sensitive hashing (LSH). First, we introduce $k$-means locality sensitive hashing (KLSH), which is based upon the information retrieval literature and clusters similar records into blocks using a vector-space representation and projections. Second, we introduce a subquadratic variant of LSH to the literature, known as Densified One Permutation Hashing (DOPH). Third, we propose a weighted variant of DOPH. We illustrate each method on an application to a subset of data from the ongoing Syrian conflict and discuss each method.
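To make the idea of LSH blocking concrete, here is a minimal sketch of classic banded MinHash LSH, not KLSH, DOPH, or weighted DOPH (DOPH is a cheaper way of obtaining MinHash-like signatures from a single permutation). It assumes records are short strings compared by character-bigram Jaccard similarity and uses seeded `blake2b` hashes in place of random permutations; the function names and the `bands`/`rows` defaults are illustrative choices. Records whose signatures agree on every row of at least one band fall into a common block, so only those pairs need a full comparison.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def shingles(text, k=2):
    """Character k-grams of a lower-cased record; short strings become one token."""
    t = text.lower()
    return {t[i:i + k] for i in range(max(1, len(t) - k + 1))}


def minhash_signature(tokens, num_hashes):
    """One MinHash value per seeded hash function (blake2b keyed by the seed)."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{tok}".encode(), digest_size=8).digest(), "big")
            for tok in tokens))
    return signature


def lsh_candidate_pairs(records, bands=20, rows=2, k=2):
    """Banded MinHash LSH: records that share any band land in the same block."""
    buckets = defaultdict(set)
    for rid, text in records.items():
        sig = minhash_signature(shingles(text, k), bands * rows)
        for b in range(bands):
            buckets[(b, tuple(sig[b * rows:(b + 1) * rows]))].add(rid)
    pairs = set()
    for ids in buckets.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs
```

On invented toy records such as `{"r1": "jon smith 1980", "r2": "john smith 1980", "r3": "maria garcia 1975"}`, near-duplicates like `r1` and `r2` are very likely to share a block, while dissimilar records rarely do. More rows per band make blocks stricter; more bands raise recall at the cost of more candidate pairs.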
POIReviewQA: A Semantically Enriched POI Retrieval and Question Answering Dataset
Mai, Gengchen, Janowicz, Krzysztof, He, Cheng, Liu, Sumang, Lao, Ni
Many services that perform information retrieval for Points of Interest (POI) utilize a Lucene-based setup with spatial filtering. While this type of system is easy to implement, it does not make use of semantics but relies on direct word matches between a query and reviews, leading to a loss in both precision and recall. To study the challenging task of semantically enriching POIs from unstructured data in order to support open-domain search and question answering (QA), we introduce a new dataset, POIReviewQA. It consists of 20k questions (e.g., "is this restaurant dog friendly?") for 1022 Yelp business types. For each question we sampled 10 reviews and annotated each sentence in the reviews with whether it answers the question and what the corresponding answer is. To test a system's ability to understand the text, we adopt an information retrieval evaluation that ranks all the review sentences for a question by the likelihood that they answer it. We build a Lucene-based baseline model, which achieves 77.0% AUC and 48.8% MAP. A sentence embedding-based model achieves 79.2% AUC and 41.8% MAP, indicating that the dataset presents a challenging problem for future research by the GIR community. The resulting technology can help exploit the thematic content of web documents and social media for the characterisation of locations.
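The two reported metrics can be computed from a ranked list of sentences with binary "answers the question" labels. The sketch below uses the standard definitions of average precision (averaged over questions to give MAP) and ROC AUC (the fraction of positive/negative pairs ranked correctly, ties counting half); the dataset's exact evaluation protocol, such as how AUC is aggregated across questions, is not specified here and is an assumption.

```python
def average_precision(ranked_labels):
    """AP for one question: mean of precision@i at each relevant position i."""
    hits, total = 0, 0.0
    for i, relevant in enumerate(ranked_labels, start=1):
        if relevant:
            hits += 1
            total += hits / i
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """MAP: average of per-question AP over all questions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


def roc_auc(scores, labels):
    """AUC as the probability a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, a question whose ranked sentences carry labels `[1, 0, 1, 0]` gets an AP of (1/1 + 2/3) / 2 = 5/6. The two metrics can disagree, which is consistent with the embedding model scoring higher on AUC but lower on MAP than the Lucene baseline: AUC rewards the overall ordering, while AP is dominated by the top of the ranking.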
Do you speak SEO? Join Search Engine Land as Deputy Editor - Search Engine Land
Third Door Media is currently looking for a Deputy Editor to help drive its industry-leading coverage for Search Engine Land. The Deputy Editor supports the Editor-in-Chief with the day-to-day editorial management of Search Engine Land, which includes tasks such as story planning, staff assignment management and editing. The Deputy Editor is also a high-volume reporter who creates news and feature copy daily for the brand. Search Engine Land is synonymous with SEO and SEM, so we're looking for someone who has either covered those topics extensively for a media brand, or has worked as a professional search marketer and has solid editorial chops. This is a remote position.
Senior Google Scientist Resigns Over "Forfeiture of Our Values" in China
A senior Google research scientist has quit the company in protest over its plan to launch a censored version of its search engine in China. Jack Poulson worked for Google's research and machine intelligence department, where he was focused on improving the accuracy of the company's search systems. In early August, Poulson raised concerns with his managers at Google after The Intercept revealed that the internet giant was secretly developing a Chinese search app for Android devices. The search system, code-named Dragonfly, was designed to remove content that China's authoritarian government views as sensitive, such as information about political dissidents, free speech, democracy, human rights, and peaceful protest. After entering into discussions with his bosses, Poulson decided in mid-August that he could no longer work for Google.
Amazon SEO: How to Rank Highly for Amazon Searches
All too often, when we think of SEO, we only think of Google. And of course you want great rankings in the search engines. However, your website isn't the only place on the web where you may be selling your product. If you have a product page on Amazon, you want it to be found by customers just as you would want your site to show up on the first search engine results page (SERP) for your industry keywords. Failure to do Amazon SEO right, just like with regular SEO, will result in less traffic and fewer sales.
How to Use Facebook Page Insights Like an Expert - Search Engine Journal
By now, most business owners and marketers know how important it is to have a Facebook business page. Facebook is a platform that provides an easy way for you, your customers, and your prospects to interact with each other. But even if you're sharing the right variety of content on your business Facebook page and responding to customer messages and comments in a timely manner, you're still not reaping all the benefits of your page if you don't also take advantage of Facebook Page Insights. This post will explain everything Facebook Page Insights can tell you and how you can use that information. This is a two-part post: Facebook Page Insights is covered in part 1 and Facebook Analytics in part 2. Once your business Facebook page has more than 30 fans, it's easy to access your analytics. Just go to your page and look at the top; you'll find "Insights" between Notifications and Posts.
100 Font Collections by Various Design Purposes - Eugene Sadko - Medium
One of the main advantages of Rentafont is its Font Search, backed by a very detailed base of font descriptions consisting of 12,000 unique keywords and options in English, Russian, and Ukrainian. More than 3,000 fonts with Latin and Cyrillic characters are described in this way. This data will be useful for training both designers and artificial intelligence. If you generate a design using algorithms or neural networks based on a parametric creative brief, you will need fonts that are relevant to different activities, technical parameters, and features of the target audience. These collections are the results of single-keyword search queries.
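A keyword-described font base of this kind maps naturally onto an inverted index, where each keyword points to the set of fonts tagged with it and a single-keyword query returns that set as a collection. The sketch below is an assumption about how such a lookup could work, not Rentafont's implementation; the function names, font names, and keywords are hypothetical.

```python
from collections import defaultdict


def build_index(descriptions):
    """Map each lower-cased keyword to the set of fonts described by it."""
    index = defaultdict(set)
    for font, keywords in descriptions.items():
        for keyword in keywords:
            index[keyword.lower()].add(font)
    return index


def collection(index, keyword):
    """Answer a single-keyword query with a sorted list of matching fonts."""
    # dict.get avoids inserting empty entries into the defaultdict.
    return sorted(index.get(keyword.lower(), set()))


fonts = {
    "Example Serif": ["serif", "Cyrillic", "book"],
    "Example Sans": ["sans-serif", "Cyrillic", "signage"],
    "Example Mono": ["monospace", "Latin", "code"],
}
index = build_index(fonts)
```

Querying `collection(index, "cyrillic")` returns the two Cyrillic-capable fonts; lower-casing both keywords and queries keeps the lookup case-insensitive.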