 Information Retrieval


Google Says it Doesn't Require Fixing Structured Data Warnings - Search Engine Journal

#artificialintelligence

In a Webmaster Hangout, an eCommerce publisher complained about structured data warnings for data fields that do not apply to their product. The publisher refused to create fake information just to get a passing score. John Mueller responded that there's a difference between warnings and errors. The person asking the question sold custom handmade products, which did not have a global identifier.


Real-world Conversational AI for Hotel Bookings

arXiv.org Machine Learning

Hussein Fazal, SnapTravel, Toronto, Canada (hussein@snaptravel.com). Abstract: In this paper, we present a real-world conversational AI system to search for and book hotels through text messaging. Our architecture consists of a frame-based dialogue management system, which calls machine learning models for intent classification, named entity recognition, and information retrieval subtasks. Our chatbot has been deployed on a commercial scale, handling tens of thousands of hotel searches every day. We describe the various opportunities and challenges of developing a chatbot in the travel industry. Index Terms: conversational AI, task-oriented chatbot, named entity recognition, information retrieval. I. Introduction: Task-oriented chatbots have recently been applied to many areas in e-commerce.


Nearest Neighbor Search-Based Bitwise Source Separation Using Discriminant Winner-Take-All Hashing

arXiv.org Artificial Intelligence

We propose an iteration-free source separation algorithm based on Winner-Take-All (WTA) hash codes, which is a faster, yet accurate alternative to a complex machine learning model for single-channel source separation in a resource-constrained environment. We first generate random permutations with WTA hashing to encode the shape of the multidimensional audio spectrum to a reduced bitstring representation. A nearest neighbor search on the hash codes of an incoming noisy spectrum as the query string results in the closest matches among the hashed mixture spectra. Using the indices of the matching frames, we obtain the corresponding ideal binary mask vectors for denoising. Since both the training data and the search operation are bitwise, the procedure can be done efficiently in hardware implementations. Experimental results show that the WTA hash codes are discriminant and provide an affordable dictionary search mechanism that leads to a competent performance compared to a comprehensive model and oracle masking.
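The pipeline described in this abstract (hash each spectrum, find the nearest hash code, reuse the matching frame's mask) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy data, the choice of 32 permutations with a window of k = 4, and the use of a single nearest neighbor are all assumptions made for illustration. The codes are kept as small integers here rather than packed into bitstrings.

```python
import numpy as np

def make_permutations(dim, num_perms, seed=0):
    """Draw num_perms random permutations of the indices 0..dim-1."""
    rng = np.random.default_rng(seed)
    return np.stack([rng.permutation(dim) for _ in range(num_perms)])

def wta_hash(x, perms, k):
    """WTA code: for each permutation, the position of the largest value
    among the first k permuted entries of x."""
    return np.argmax(x[perms[:, :k]], axis=1)

def nearest_frame(query_code, dict_codes):
    """Index of the dictionary code with the smallest Hamming distance."""
    dists = np.count_nonzero(dict_codes != query_code, axis=1)
    return int(np.argmin(dists))

# Toy data (hypothetical): 50 "mixture spectra" with 64 bins each, plus one
# random binary mask per frame standing in for the ideal binary masks.
rng = np.random.default_rng(1)
mixtures = rng.random((50, 64))
masks = rng.integers(0, 2, size=(50, 64))

perms = make_permutations(dim=64, num_perms=32)
dict_codes = np.stack([wta_hash(m, perms, k=4) for m in mixtures])

# Query with a rescaled copy of frame 7: WTA codes depend only on rank order,
# so the rescaled frame hashes to the same code as the original.
query = 2.0 * mixtures[7]
idx = nearest_frame(wta_hash(query, perms, k=4), dict_codes)
denoised = masks[idx] * query  # apply the retrieved mask to the query spectrum
```

Because each code entry records only which of k values is largest, the hash is unchanged by any order-preserving rescaling of the spectrum, which is what makes it a description of spectral shape rather than level.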


WordPress 3 Search Engine Optimization - Programmer Books

#artificialintelligence

WordPress is a powerful platform for creating feature-rich and attractive websites and blogs, but with a little extra tweaking and effort your WordPress site can dominate the search engines and bring thousands of new customers to your blog or business. WordPress 3.0 Search Engine Optimization will show you the secrets that professional SEO companies use to take websites to the top of search results and proliferate their business. You'll be able to take your WordPress blog/site to the next level, as well as brush aside even the stiffest competition with this book in hand. We'll begin with a typical WordPress installation and with a variety of simple techniques, turn it into a powerful website that search engines will reward with high rankings. We'll go further: with advanced plug-ins we'll connect your WordPress site to popular social media sites and expand the reach of your site to bring more visitors.


Automatic Language Identification in Texts: A Survey

Journal of Artificial Intelligence Research

Language identification ("LI") is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used in the LI literature. We describe the features and methods using a unified notation, to make the relationships between methods clearer. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.


Google's John Mueller Answers if Linking Out Good for SEO - Search Engine Journal

#artificialintelligence

Google launched a new video series in which each episode answers a single question. The first episode asked, "Does linking to other websites help or hurt SEO?" In my opinion, it did not adequately answer the question. The SEO community has thought of outbound links as ranking signals since at least 2002. I hope to show you how and why the idea of outbound links as an SEO factor was invented.


As Search Engines Increasingly Turn To AI They Are Harming Search

#artificialintelligence

For more than half a century, our digital search engines have relied upon the humble keyword. Yet over the past few years, search engines of all kinds have increasingly turned to deep learning-powered categorization and recommendation algorithms to augment and slowly replace the traditional keyword search. Behavioral and interest-based personalization has further eroded the impact of keyword searches, meaning that if ten people all search for the same thing, they may all get different results. As search engines deprecate traditional raw "search" in favor of AI-assisted navigation, the concept of informational access is being harmed and our digital world is being redefined by the limitations of today's AI. At first glance, the evolution of search from simple TF-IDF keyword queries into today's AI-powered personalized digital navigation is a positive step towards making the digital world more accessible to the general public.


Intent term selection and refinement in e-commerce queries

arXiv.org Artificial Intelligence

In e-commerce, a user tends to search for the desired product by issuing a query to the search engine and examining the retrieved results. If the search engine correctly understands the user's query, it will return results that correspond to the products whose attributes match the terms in the query that are representative of the query's product intent. However, the search engine may fail to retrieve results that satisfy the query's product intent, thus degrading the user experience, because of two issues in query processing: (i) when multiple terms are present in a query, it may fail to determine the relevant terms that are representative of the query's product intent, and (ii) it may suffer from a vocabulary gap between the terms in the query and the product's description, i.e., terms used in the query are semantically similar to but different from the terms in the product description. Hence, identifying the terms that describe the query's product intent, and predicting additional terms that describe that intent better than the existing query terms, is an essential task in e-commerce search. In this paper, we leverage the historical query reformulation logs of a major e-commerce retailer to develop distant-supervised approaches to solve both these problems. Our approaches exploit the fact that the significance of a term depends on the context (other terms in the neighborhood) in which it is used, in order to learn the importance of the term towards the query's product intent. We show that identifying and emphasizing the terms that define the query's product intent leads to a 3% improvement in ranking. Moreover, for the tasks of identifying the important terms in a query and predicting the additional terms that represent product intent, experiments illustrate that our approaches outperform the non-contextual baselines.


Towards Effective Device-Aware Federated Learning

arXiv.org Machine Learning

With the wealth of information produced by social networks, smartphones, and medical or financial applications, concerns have been raised about the sensitivity of such data in terms of users' personal privacy and data security. To address these issues, Federated Learning (FL) has recently been proposed as a means to leave data and computational resources distributed over a large number of nodes (clients), where a central coordinating server aggregates only locally computed updates without knowing the original data. In this work, we extend the FL framework by pushing forward the state of the art in the field on several dimensions: (i) unlike the original FedAvg approach, which relies solely on a single criterion (i.e., local dataset size), a suite of domain- and client-specific criteria constitutes the basis to compute each local client's contribution, (ii) the multi-criteria contribution of each device is computed in a prioritized fashion by leveraging a priority-aware aggregation operator used in the field of information retrieval, and (iii) a mechanism is proposed for online adjustment of the aggregation operator parameters via a local search strategy with backtracking. Extensive experiments on a publicly available dataset indicate the merits of the proposed approach compared to the standard FedAvg baseline.
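For context, the FedAvg baseline mentioned in this abstract weights each client's update by its local dataset size. The sketch below shows only that baseline, not the multi-criteria, priority-aware operator the paper proposes. The function name `fedavg` and the flat parameter vectors are hypothetical simplifications.

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Aggregate client parameter vectors, weighting each client by its
    share of the total number of local training examples."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()        # one weight per client, summing to 1
    stacked = np.stack(client_params)   # shape: (n_clients, n_params)
    return coeffs @ stacked             # weighted average, shape: (n_params,)

# Two hypothetical clients: the second holds three times as much data,
# so its parameters get three times the weight.
global_params = fedavg(
    [np.array([0.0, 0.0]), np.array([4.0, 8.0])],
    client_sizes=[1, 3],
)
```

The extension described in the abstract replaces the single size-based coefficient with a contribution score derived from several criteria.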


Learning Representations and Agents for Information Retrieval

arXiv.org Artificial Intelligence

A goal shared by artificial intelligence and information retrieval is to create an oracle, that is, a machine that can answer our questions, no matter how difficult they are. A more limited, but still instrumental, version of this oracle is a question-answering system, in which an open-ended question is given to the machine, and an answer is produced based on the knowledge it has access to. Such systems already exist and are increasingly capable of answering complicated questions. This progress can be partially attributed to the recent success of machine learning and to the efficient methods for storing and retrieving information, most notably through web search engines. One can imagine that this general-purpose question-answering system can be built as a billion-parameter neural network trained end-to-end with a large number of pairs of questions and answers. We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard. A more efficient way is to train an artificial agent on how to use an external retrieval system to collect relevant information. This agent can leverage the effort that has been put into designing and running efficient storage and retrieval systems by learning how to best utilize them to accomplish a task. ...