Information Retrieval
How to develop an integrated Paid Search and SEO strategy for e-commerce Smart Insights
When it comes to digital marketing, pay per click (PPC) advertising and search engine optimisation (SEO) are arguably two sides of the same coin. However, all too often companies will focus on one at the expense of the other. At ClickThrough Marketing we provide an integrated digital marketing approach. Working with large-scale e-commerce sites, we have learnt the importance of combining PPC and SEO activities to gain greater client and market insight as well as streamline our own internal activities. Here, we look at four ways in which PPC and SEO can work together to deliver better results across the board.
Generating High-Quality Query Suggestion Candidates for Task-Based Search
Ding, Heng, Zhang, Shuo, Garigliotti, Darío, Balog, Krisztian
We address the task of generating query suggestions for task-based search. The current state of the art relies heavily on suggestions provided by a major search engine. In this paper, we solve the task without reliance on search engines. Specifically, we focus on the first step of a two-stage pipeline approach, which is dedicated to the generation of query suggestion candidates. We present three methods for generating candidate suggestions and apply them on multiple information sources. Using a purpose-built test collection, we find that these methods are able to generate high-quality suggestion candidates.
Comparison Based Learning from Weak Oracles
Kazemi, Ehsan, Chen, Lin, Dasgupta, Sanjoy, Karbasi, Amin
There is increasing interest in learning algorithms that involve interaction between human and machine. Comparison-based queries are among the most natural ways to get feedback from humans. A challenge in designing comparison-based interactive learning algorithms is coping with noisy answers. The most common fix is to submit a query several times, but this is not applicable in many situations due to its prohibitive cost and due to the unrealistic assumption of independent noise in different repetitions of the same query. In this paper, we introduce a new weak oracle model, where a non-malicious user responds to a pairwise comparison query only when she is quite sure about the answer. This model is able to mimic the behavior of a human in noise-prone regions. We also consider the application of this weak oracle model to the problem of content search (a variant of the nearest neighbor search problem) through comparisons. More specifically, we aim at devising efficient algorithms to locate a target object in a database equipped with a dissimilarity metric via invocation of the weak comparison oracle. We propose two algorithms termed WORCS-I and WORCS-II (Weak-Oracle Comparison-based Search), which provably locate the target object in a number of comparisons close to the entropy of the target distribution. While WORCS-I provides better theoretical guarantees, WORCS-II is applicable to more technically challenging scenarios where the algorithm has limited access to the ranking dissimilarity between objects. A series of experiments validate the performance of our proposed algorithms.
Information Retrieval Document Search Engine in R
In this post, we learn how to build a basic search engine, or document retrieval system, using the vector space model. This approach is widely used in information retrieval systems. Given a set of documents and a search query, we need to retrieve the documents that are most similar to the query.
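The retrieval step described above can be sketched in a few lines. The post itself works in R; the sketch below uses Python with the standard library only. The TF-IDF weighting, the smoothed IDF formula, the toy documents and all function names are assumptions made for illustration, not the post's own code.

```python
import math
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def build_index(docs):
    """Return (idf, vectors): one sparse TF-IDF vector (a dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumed choice): terms found in every document
    # still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return idf, vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def search(query, docs, top_k=3):
    """Rank documents by cosine similarity to the query; drop zero scores."""
    idf, vectors = build_index(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda p: (-p[0], p[1]))
    return [(i, round(s, 3)) for s, i in scored[:top_k] if s > 0]

# Toy collection (hypothetical data).
docs = [
    "Stock markets fell sharply today",
    "The search engine ranks documents by relevance",
    "Vector space models represent documents as term vectors",
]
results = search("ranking documents with a search engine", docs)
print(results)
```

Each result is a pair of document index and similarity score. Documents that share no term with the query score zero and are left out.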
WHInter: A Working set algorithm for High-dimensional sparse second order Interaction models
Morvan, Marine Le, Vert, Jean-Philippe
Learning sparse linear models with two-way interactions is desirable in many application domains such as genomics. l1-regularised linear models are popular for estimating sparse models, yet standard implementations do not specifically address the quadratic explosion of candidate two-way interactions in high dimensions, and typically do not scale to genetic data with hundreds of thousands of features. Here we present WHInter, a working set algorithm to solve large l1-regularised problems with two-way interactions for binary design matrices. The novelty of WHInter stems from a new bound that efficiently identifies working sets without scanning all features, and from fast computations inspired by solutions to the maximum inner product search problem. We apply WHInter to simulated and real genetic data and show that it is more scalable and two orders of magnitude faster than the state of the art.
Agent Assist: Automating Enterprise IT Support Help Desks
Mani, Senthil (IBM Research AI) | Gantayat, Neelamadhav (IBM Research AI) | Aralikatte, Rahul (IBM Research AI) | Gupta, Monika (IBM Research AI) | Dechu, Sampath (IBM Research AI) | Sankaran, Anush (IBM Research AI) | Khare, Shreya (IBM Research AI) | Mitchell, Barry (IBM Global Business Services) | Subramanian, Hemamalini (IBM Global Business Services) | Venkatarangan, Hema (IBM Global Business Services)
In this paper, we present Agent Assist, a virtual assistant which helps IT support staff to resolve tickets faster. It is essentially a conversation system which provides procedural and often complex answers to queries. This system can ingest knowledge from various sources like application documentation, ticket management systems and knowledge transfer video recordings. It uses an ensemble of techniques like question classification, knowledge graph based disambiguation, information retrieval, etc., to provide quick and relevant solutions to problems from various technical domains and is currently being used in more than 650 projects within IBM.
Product Quantized Translation for Fast Nearest Neighbor Search
Hwang, Yoonho (Pohang University of Science and Technology (POSTECH)) | Baek, Mooyeol (Pohang University of Science and Technology (POSTECH)) | Kim, Saehoon (Pohang University of Science and Technology (POSTECH)) | Han, Bohyung (Pohang University of Science and Technology (POSTECH)) | Ahn, Hee-Kap (Pohang University of Science and Technology (POSTECH))
This paper proposes a simple nearest neighbor search algorithm that efficiently provides the exact solution in terms of the Euclidean distance. In particular, we present an interesting approach to improve the speed of nearest neighbor search by proper translations of data and query, although the task is inherently invariant to Euclidean transformations. The proposed algorithm aims to eliminate nearest neighbor candidates effectively using their distance lower bounds in nonlinear embedded spaces, and further improves the lower bounds by transforming data and query through product quantized translations. Although our framework is composed of simple operations only, it achieves state-of-the-art performance compared to existing nearest neighbor search techniques, which is illustrated quantitatively using various large-scale benchmark datasets of different sizes and dimensions.
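The general idea of eliminating candidates with a distance lower bound can be illustrated with a much simpler bound than the one in the paper. The sketch below assumes the reverse triangle inequality, | ||x|| - ||q|| | <= ||x - q||, as the lower bound; it does not implement the paper's nonlinear embeddings or product quantized translations, and the function name and data are hypothetical.

```python
import math

def exact_nn_with_pruning(data, query):
    """Exact Euclidean nearest neighbor, skipping candidates via a lower bound.

    Lower bound (assumed for illustration): | ||x|| - ||q|| | <= ||x - q||.
    Returns (index, distance, number of full distance evaluations).
    """
    norms = [math.hypot(*x) for x in data]  # can be precomputed offline
    qn = math.hypot(*query)
    # Visit candidates by increasing lower bound so pruning starts early.
    order = sorted(range(len(data)), key=lambda i: abs(norms[i] - qn))
    best_i, best_d, evaluated = -1, math.inf, 0
    for i in order:
        if abs(norms[i] - qn) >= best_d:
            break  # every remaining candidate has a lower bound at least this large
        d = math.dist(data[i], query)
        evaluated += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, evaluated

# Toy data (hypothetical).
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (10.0, 10.0), (-1.0, -1.0)]
query = (1.2, 0.9)
nn_index, nn_dist, n_evaluated = exact_nn_with_pruning(data, query)
print(nn_index, nn_dist, n_evaluated)
```

Because a candidate is skipped only when its lower bound already reaches the best distance found so far, the result is the same as a full scan. The translations studied in the paper aim to make such lower bounds tighter, so that more candidates are skipped.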
Constructing Domain-Specific Search Engines With No Programming
Kejriwal, Mayank (USC Information Sciences Institute) | Szekely, Pedro (USC Information Sciences Institute)
[...] machine learning, becomes ever more complicated, there is a need to build interactive systems with powerful capabilities that can be accessed and used by nontechnical domain experts. Such capabilities are especially useful on crawled Web data, since many interesting phenomena worthy of social or investigative interest (like fraud), have a significant Web presence. We propose a demonstration of myDIG, a system that ingests a corpus of webpages stored in a distributed [...]
Users can also input their glossaries to seed knowledge graph construction for certain attributes. For example, one could input a glossary of stock ticker symbols to seed the extractions for an attribute 'Stock Tickers'. In the second domain exploration phase, domain experts use the search engine for gaining further insight into domain properties and characteristics, and in the case of investigative domains, both generating and investigating leads.
Fast Approximate Nearest Neighbor Search via k-Diverse Nearest Neighbor Graph
Xiao, Yan (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Guo, Jiafeng (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Lan, Yanyan (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Xu, Jun (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences) | Cheng, Xueqi (University of Chinese Academy of Sciences; Institute of Computing Technology, Chinese Academy of Sciences)
Approximate nearest neighbor search is a fundamental problem that has been studied for a few decades. Recently, graph-based indexing methods have demonstrated great efficiency; their main idea is to construct a neighborhood graph offline and, online, to perform a greedy search starting from some sampled points of the graph. Most existing graph-based methods focus on either the precise k-nearest neighbor (k-NN) graph, which has good exploitation ability, or the diverse graph, which has good exploration ability. In this paper, we propose the k-diverse nearest neighbor (k-DNN) graph, which balances the precision and diversity of the graph, leading to good exploitation and exploration abilities simultaneously. We introduce an efficient indexing algorithm for the construction of the k-DNN graph inspired by a well-known diverse ranking algorithm in information retrieval (IR). Experimental results show that our method can outperform both state-of-the-art precise graph and diverse graph methods.
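The offline/online split described above can be sketched as follows. This is a minimal illustration of greedy search over a plain exact k-NN graph, not the k-DNN construction proposed in the paper; the brute-force graph builder, the single fixed start node and all names are assumptions.

```python
import math

def build_knn_graph(points, k):
    """Offline step: brute-force exact k-NN graph (O(n^2), for illustration only)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        graph.append(others[:k])
    return graph

def greedy_search(points, graph, query, start):
    """Online step: hop to the neighbor closest to the query until none improves."""
    current = start
    best = math.dist(points[current], query)
    while True:
        candidate = min(graph[current], key=lambda j: math.dist(points[j], query))
        d = math.dist(points[candidate], query)
        if d >= best:
            return current, best  # local minimum: no neighbor is closer to the query
        current, best = candidate, d

# Toy data (hypothetical): five points on a line.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
graph = build_knn_graph(points, k=2)
found, found_dist = greedy_search(points, graph, query=(3.9, 0.0), start=0)
print(found, found_dist)
```

The search stops at the first node with no closer neighbor, so on a graph with few long-range edges it can return a local minimum rather than the true nearest neighbor. That is the exploration weakness of a precise k-NN graph that the abstract refers to.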
FgER: Fine-Grained Entity Recognition
Abhishek, Abhishek (Indian Institute of Technology Guwahati)
Fine-grained Entity Recognition (FgER) is the task of detecting and classifying entity mentions into more than 100 types. The type set can span various domains including biomedical (e.g., disease, gene), sport (e.g., sports event, sports player), religion and mythology (e.g., religion, god) and entertainment (e.g., movies, music). Most of the existing literature for Entity Recognition (ER) focuses on coarse-grained entity recognition (CgER), i.e., recognition of entities belonging to a few types such as person, location and organization. In the past two decades, several manually annotated datasets spanning different genres of text were created to facilitate the development and evaluation of CgER systems (Nadeau and Sekine 2007). The state-of-the-art CgER systems use supervised statistical learning models trained on manually annotated datasets (Ma and Hovy 2016). In contrast, FgER systems are yet to match the performance level of CgER systems. There are two major challenges associated with the failure of FgER systems. First, manually annotating large-scale multi-genre training data for the FgER task is expensive, time-consuming and error-prone. Note that a human annotator has to choose a subset of types from a large set of types, and the types for the same entity might differ across sentences depending on the contextual information. Second, supervised statistical learning models trained on automatically generated noisy training data fit to the noise, which hurts the model's performance. The objective of my thesis is to create an FgER system by exploring an off-the-beaten-path approach that can eliminate the need for manually annotating a large-scale multi-genre training dataset. The path includes: (1) automatically generating a large-scale single-genre training dataset, (2) noise-aware learning models that learn better from noisy datasets, and (3) the use of knowledge transfer approaches to adapt the FgER system to different genres of text.