 Information Retrieval


Bing SEO: Website Optimization Guide & Free SEO Tools

#artificialintelligence

Bing, previously known as Microsoft Live Search, is a search engine with over 40% market share in the US. Bing's SEO guide is aimed at helping small business owners optimize their websites for traffic and leads. If you're already using Google's SEO solutions (Analytics, Search Console, etc.), then Bing's guide will give you an edge over your competitors by showing you what additional steps to take and which tools to use to increase your website traffic. The free SEO analysis tool is a Chrome extension that automatically analyzes any page it loads and tells you how well the page is optimized for Bing. Based on its results, you can then use Bing's SEO guide to optimize your website further.


Google opens up about how Maps review moderation works – Search Engine Land

#artificialintelligence

Review moderation powered by machine learning. User reviews are sent to Google's moderation system as soon as they're submitted.


Doubly Robust Off-Policy Evaluation for Ranking Policies under the Cascade Behavior Model

arXiv.org Machine Learning

In real-world recommender systems and search engines, optimizing ranking decisions to present a ranked list of relevant items is critical. Off-policy evaluation (OPE) for ranking policies is thus gaining a growing interest because it enables performance estimation of new ranking policies using only logged data. Although OPE in contextual bandits has been studied extensively, its naive application to the ranking setting faces a critical variance issue due to the huge item space. To tackle this problem, previous studies introduce some assumptions on user behavior to make the combinatorial item space tractable. However, an unrealistic assumption may, in turn, cause serious bias. Therefore, appropriately controlling the bias-variance tradeoff by imposing a reasonable assumption is the key for success in OPE of ranking policies. To achieve a well-balanced bias-variance tradeoff, we propose the Cascade Doubly Robust estimator building on the cascade assumption, which assumes that a user interacts with items sequentially from the top position in a ranking. We show that the proposed estimator is unbiased in more cases compared to existing estimators that make stronger assumptions. Furthermore, compared to a previous estimator based on the same cascade assumption, the proposed estimator reduces the variance by leveraging a control variate. Comprehensive experiments on both synthetic and real-world data demonstrate that our estimator leads to more accurate OPE than existing estimators in a variety of settings.
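To illustrate what "doubly robust" and "control variate" mean here, the following is a minimal sketch of the standard doubly robust estimator for a plain contextual bandit. It is not the Cascade Doubly Robust estimator proposed in the paper, which additionally handles ranked lists under the cascade assumption. The function name, the log tuple layout, and the `target_policy` and `q_hat` callables are all hypothetical names chosen for this example.

```python
def doubly_robust_value(logs, target_policy, q_hat):
    """Standard doubly robust OPE estimate for a contextual bandit (illustrative only).

    logs: list of (context, action, reward, logging_prob) tuples from the logging policy.
    target_policy(context) -> dict mapping action -> probability under the new policy.
    q_hat(context, action) -> estimated expected reward, used as a control variate.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        pi = target_policy(context)
        # Model-based baseline: expected q_hat under the target policy.
        baseline = sum(p * q_hat(context, a) for a, p in pi.items())
        # Importance weight corrects the model's error on the logged action.
        weight = pi.get(action, 0.0) / logging_prob
        total += baseline + weight * (reward - q_hat(context, action))
    return total / len(logs)


# Toy logged data: a uniform logging policy over two actions; action 0 pays 1.0.
logs = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]
always_zero = lambda context: {0: 1.0, 1: 0.0}

# With q_hat = 0 the estimator reduces to inverse propensity scoring.
ips_value = doubly_robust_value(logs, always_zero, lambda c, a: 0.0)
# With an accurate reward model, the importance-weighted correction vanishes.
dr_value = doubly_robust_value(logs, always_zero, lambda c, a: 1.0 if a == 0 else 0.0)
```

In this toy case both calls estimate the same policy value, but the second does so without relying on large importance-weighted terms, which is the variance reduction that a control variate is meant to provide.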


Two minutes NLP -- Learn TF-IDF with easy examples

#artificialintelligence

TF-IDF (Term Frequency-Inverse Document Frequency) is a way of measuring how relevant a word is to a document in a collection of documents. TF-IDF has many uses, such as in information retrieval, text analysis, keyword extraction, and as a way of obtaining numeric features from text for machine learning algorithms. TF-IDF was first designed for document search and information retrieval, where a query is run and the system has to find the most relevant documents. Suppose the query is the text "The bug". The system would give each document a score that grows with the frequencies of the query words found in the document, weighting rare words like "bug" more heavily than common words like "the".
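The scoring described above can be sketched in a few lines of Python. This is a minimal illustration that assumes whitespace tokenization, relative term frequency, and the unsmoothed `log(N / df)` form of IDF; real systems often use smoothed or otherwise adjusted variants. The function name and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, documents):
    """Score each document by summing the TF-IDF weights of the query terms."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if counts[term] == 0:
                continue  # term absent from this document
            tf = counts[term] / len(tokens)      # relative term frequency
            idf = math.log(n_docs / df[term])    # 0 for terms found in every document
            score += tf * idf
        scores.append(score)
    return scores


docs = [
    "the bug crashed the server",
    "the cat sat on the mat",
    "the release fixed the issue",
]
scores = tf_idf_scores("The bug", docs)
```

Because "the" occurs in every document, its IDF is zero and it contributes nothing to any score; only the first document, which contains "bug", receives a positive score.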


5 Alternatives to Search Engine Optimization - DataScienceCentral.com

#artificialintelligence

It is not a coincidence that search engine optimization is the 'holy cow' of internet traffic. It is responsible for more than half of it. Every second, Google alone processes nearly 100,000 search queries. Therefore, content creators do their best to exploit SEO tricks and gimmicks to their advantage and push their websites to the top of the search. People with deep knowledge of search engine optimization can easily find jobs all around the world.


Reinforcement Learning Based Query Vertex Ordering Model for Subgraph Matching

arXiv.org Artificial Intelligence

Subgraph matching is a fundamental problem in various fields that use graph structured data. Subgraph matching algorithms enumerate all isomorphic embeddings of a query graph q in a data graph G. An important branch of matching algorithms exploits the backtracking search approach, which recursively extends intermediate results following a matching order of query vertices. It has been shown that the matching order plays a critical role in the time efficiency of these backtracking based subgraph matching algorithms. In recent years, many advanced techniques for query vertex ordering (i.e., matching order generation) have been proposed to reduce the unpromising intermediate results according to preset heuristic rules. In this paper, for the first time we apply Reinforcement Learning (RL) and Graph Neural Network (GNN) techniques to generate a high-quality matching order for subgraph matching algorithms. Instead of using fixed heuristics to generate the matching order, our model can capture and make full use of the graph information, and thus determine the query vertex order with an adaptive learning-based rule that can significantly reduce the number of redundant enumerations. With the help of the reinforcement learning framework, our model is able to consider the long-term benefits rather than only the local information at the current ordering step. Extensive experiments on six real-life data graphs demonstrate that our proposed matching order generation technique can reduce query processing time by up to two orders of magnitude compared to the state-of-the-art algorithms.
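To make the role of the matching order concrete, here is a minimal backtracking sketch for unlabeled, undirected graphs. It is not the paper's RL-based model; it only shows how a fixed order drives the search and how a poor order inflates the number of intermediate results. The function name, the adjacency-dict representation, and the toy graphs are assumptions made for this example.

```python
def subgraph_matches(query_adj, data_adj, order):
    """Enumerate embeddings of a query graph in a data graph by backtracking.

    query_adj, data_adj: dict mapping vertex -> set of neighbours (undirected).
    order: matching order, a list containing every query vertex exactly once.
    Returns (embeddings, calls), where calls counts the partial mappings explored.
    """
    results = []
    calls = 0

    def extend(mapping, depth):
        nonlocal calls
        calls += 1
        if depth == len(order):
            results.append(dict(mapping))
            return
        u = order[depth]
        for v in data_adj:
            if v in mapping.values():
                continue  # embeddings must be injective
            # Each already-matched query neighbour of u must map to a neighbour of v.
            if all(mapping[w] in data_adj[v] for w in query_adj[u] if w in mapping):
                mapping[u] = v
                extend(mapping, depth + 1)
                del mapping[u]

    extend({}, 0)
    return results, calls


# Query: the path a - b - c. Data: a triangle 1-2-3 with a pendant vertex 4 on 3.
query = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
data = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

connected, connected_calls = subgraph_matches(query, data, ["a", "b", "c"])
disconnected, disconnected_calls = subgraph_matches(query, data, ["a", "c", "b"])
```

Both orders return the same set of embeddings, but the order that places `c` before its only neighbour `b` cannot prune candidates for `c`, so it explores more partial mappings. Heuristic and learned ordering methods aim to avoid exactly this kind of wasted enumeration.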


Less is Less: When Are Snippets Insufficient for Human vs Machine Relevance Estimation?

arXiv.org Artificial Intelligence

Traditional information retrieval (IR) ranking models process the full text of documents. Newer models based on Transformers, however, would incur a high computational cost when processing long texts, so they typically use only snippets from the document instead. The model's input based on a document's URL, title, and snippet (UTS) is akin to the summaries that appear on a search engine results page (SERP) to help searchers decide which result to click. This raises questions about when such summaries are sufficient for relevance estimation by the ranking model or the human assessor, and whether humans and machines benefit from the document's full text in similar ways. To answer these questions, we study human and neural model based relevance assessments on 12k query-documents sampled from Bing's search logs. We compare how the relevance assessments change between the case where only the document summaries are exposed to assessors and the case where the full text is also exposed, studying a range of query and document properties, e.g., query type, snippet length. Our findings show that the full text is beneficial for humans and a BERT model for similar query and document types, e.g., tail, long queries. A closer look, however, reveals that humans and machines respond to the additional input in very different ways. Adding the full text can also hurt the ranker's performance, e.g., for navigational queries.


How Artificial Intelligence Is Powering Search Engines - DataScienceCentral.com

#artificialintelligence

Whether you are a customer searching for your favorite products online, a writer looking for the latest statistics, or a business owner learning SEO skills, you are using a search engine to get answers. And search engines are pretty interesting! You open up your favorite one, add some related keywords and click to search. Within a fraction of a second, you get thousands of results for your entered keyword. Search engines can perform the way they do because of the algorithms they have and a lot of brilliant people powering them.


Benchmark datasets driving artificial intelligence development fail to capture the needs of medical professionals

arXiv.org Artificial Intelligence

Publicly accessible benchmarks that allow for assessing and comparing model performances are important drivers of progress in artificial intelligence (AI). While recent advances in AI capabilities hold the potential to transform medical practice by assisting and augmenting the cognitive processes of healthcare professionals, the coverage of clinically relevant tasks by AI benchmarks is largely unclear. Furthermore, there is a lack of systematized meta-information that allows clinical AI researchers to quickly determine accessibility, scope, content and other characteristics of datasets and benchmark datasets relevant to the clinical domain. To address these issues, we curated and released a comprehensive catalogue of datasets and benchmarks pertaining to the broad domain of clinical and biomedical natural language processing (NLP), based on a systematic review of literature and online resources. A total of 450 NLP datasets were manually systematized and annotated with rich metadata, such as targeted tasks, clinical applicability, data types, performance metrics, accessibility and licensing information, and availability of data splits. We then compared tasks covered by AI benchmark datasets with relevant tasks that medical practitioners reported as highly desirable targets for automation in a previous empirical study. Our analysis indicates that AI benchmarks of direct clinical relevance are scarce and fail to cover most work activities that clinicians want to see addressed. In particular, tasks associated with routine documentation and patient data administration workflows are not represented despite significant associated workloads. Thus, currently available AI benchmarks are improperly aligned with desired targets for AI automation in clinical settings, and novel benchmarks should be created to fill these gaps.


Content metadata: why keyword extraction requires automated labelling -- EDIA

#artificialintelligence

Keywords are not a science but an art. There is no such thing as 'the right keyword,' as we're talking about a core concept incorporated into a piece of content in the broadest form. Texts don't necessarily need to contain an exact keyword. For example, if the term 'European Union' is used several times, 'European Commission' may be a suitable keyword even though the writer never uses the term. Despite this fluid definition, keywords should be understandable to those who try to find the right ones.