AITopics

Industry:

Information Technology > Services (0.33)
Retail (0.31)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Adolphs, Leonard, Boerschinger, Benjamin, Buck, Christian, Huebscher, Michelle Chen, Ciaramita, Massimiliano, Espeholt, Lasse, Hofmann, Thomas, Kilcher, Yannic

Boosting Search Engines with Interactive Agents

arXiv.org Artificial Intelligence, Sep-1-2021

Can machines learn to use a search engine as an interactive tool for finding information? That would have far-reaching consequences for making the world's knowledge more accessible. This paper presents first steps in designing agents that learn meta-strategies for contextual query refinements. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and search results. We develop a novel way of generating synthetic search sessions, which leverages the power of transformer-based generative language models through (self-)supervised learning. We also present a reinforcement learning agent with dynamically constrained actions that can learn interactive search strategies completely from scratch. In both cases, we obtain significant improvements over one-shot search with a strong information retrieval baseline. Finally, we provide an in-depth analysis of the learned search policies.

agent, doc title

2109.00527

Country:

Asia > North Korea (0.14)
Asia > South Korea (0.14)
Europe > Russia (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Media > Film (1.00)
Government > Military (0.67)
Media > Television (0.67)
Leisure & Entertainment > Sports > Baseball (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Yu, HongChien, Xiong, Chenyan, Callan, Jamie

Improving Query Representations for Dense Retrieval with Pseudo Relevance Feedback

arXiv.org Artificial Intelligence, Aug-30-2021

Dense retrieval systems conduct first-stage retrieval using embedded representations and simple similarity metrics to match a query to documents. Their effectiveness depends on how well the encoded embeddings capture the semantics of queries and documents, a challenging task due to the shortness and ambiguity of search queries. This paper proposes ANCE-PRF, a new query encoder that uses pseudo relevance feedback (PRF) to improve query representations for dense retrieval. ANCE-PRF uses a BERT encoder that consumes ...

Retrieval with dense, fully-learned representations has the potential to address some fundamental challenges in sparse retrieval. For example, vocabulary mismatch can be solved if the embeddings accurately capture the information need behind a query and map it to relevant documents. However, decades of IR research demonstrate that inferring a user's search intent from a concise and often ambiguous search query is challenging [7]. Even with powerful pre-trained language models, it is unrealistic to expect an encoder to perfectly embed the underlying information need from a few query terms.

proceedings, query, retrieval

2108.13454

Country:

Oceania > Australia (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > British Indian Ocean Territory > Diego Garcia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.87)

#artificialintelligence, Aug-27-2021, 15:02:10 GMT

How to use machine learning (if you can't code) to help your keyword research

Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land.

information retrieval, machine learning, natural language

Country: Europe > United Kingdom (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

#artificialintelligence, Aug-26-2021, 07:40:31 GMT

Keyword Extraction API - BytesView

Keyword extraction, also known as keyword detection, is a machine learning technique that can help you automate the identification and extraction of relevant information from unstructured text data. BytesView's keyword extraction tool can analyze unstructured text such as customer feedback, emails, surveys, and social media posts. Pre-define tags to identify topical content, business intelligence, customer opinions, and recurring tickets.

bytesview, keyword extraction api

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

#artificialintelligence, Aug-26-2021, 02:30:46 GMT

Google Continues To Pay Apple Billions To Remain Safari's Default Search Engine

Ped30 reports that it has obtained an investor note in which Bernstein analysts claim that Google is paying Apple as much as $15 billion in 2021 to remain Safari's default search engine. That is up from the $10 billion Google paid Apple in 2020, and the figure is expected to keep growing. According to the analysts, "We now estimate that Google's payments to AAPL to be the default search engine on iOS were $10B in FY 20, higher than our prior published model estimate of $8B. Recent disclosures in Apple's public filings as well as a bottom-up analysis of Google's TAC (traffic acquisition costs) payments each point us to this figure…We now forecast that Google's payments to Apple might be nearly $15B in FY 21, contribute an amazing 850 bps to Services growth YoY, and amount to 9% of company gross profits." They go on to estimate that this figure will jump to $18-$20 billion in 2022, and that the payments are rising because Google wants to ensure that Microsoft (and other competitors) don't outbid it.

default search engine, google, safari

Technology:

Information Technology > Information Management > Search (0.64)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)

Bénédict, Gabriel, Koops, Vincent, Odijk, Daan, de Rijke, Maarten

sigmoidF1: A Smooth F1 Score Surrogate Loss for Multilabel Classification

arXiv.org Machine Learning, Aug-24-2021

Multiclass multilabel classification refers to the task of attributing multiple labels to examples via predictions. Current models formulate a reduction of that multilabel setting into either multiple binary classifications or multiclass classification, allowing for the use of existing loss functions (sigmoid, cross-entropy, logistic, etc.). Empirically, these methods have been reported to achieve good performance on different metrics (F1 score, Recall, Precision, etc.). Theoretically though, the multilabel classification reductions do not accommodate the prediction of varying numbers of labels per example, and the underlying losses are distant estimates of the performance metrics. We propose a loss function, sigmoidF1. It is an approximation of the F1 score that (I) is smooth and tractable for stochastic gradient descent, (II) naturally approximates a multilabel metric, and (III) estimates label propensities and label counts. More generally, we show that any confusion matrix metric can be formulated with a smooth surrogate. We evaluate the proposed loss function on different text and image datasets, and with a variety of metrics, to account for the complexity of multilabel classification evaluation. In our experiments, we embed the sigmoidF1 loss in a classification head that is attached to the state-of-the-art efficient pretrained neural networks MobileNetV2 and DistilBERT. Our experiments show that sigmoidF1 outperforms other loss functions on four datasets and several metrics. These results show the effectiveness of using inference-time metrics as loss functions at training time in general, and their potential on non-trivial classification problems like multilabel classification.
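
As a rough illustration of the smooth-surrogate idea described in the abstract above, the sketch below replaces the hard counts of true positives, false positives and false negatives with sigmoid-weighted sums, so that the resulting F1 expression varies smoothly with the model's raw scores. The function names, the slope `beta` and offset `eta` parameters, and the summation over the whole batch are assumptions made for this sketch; it is not the authors' reference implementation.

```python
import math


def smooth_sigmoid(u, beta=1.0, eta=0.0):
    """Sigmoid with a tunable slope (beta) and offset (eta); both are assumed knobs."""
    return 1.0 / (1.0 + math.exp(-beta * (u + eta)))


def sigmoid_f1_loss(logits, targets, beta=1.0, eta=0.0, eps=1e-12):
    """Return 1 - soft F1 over a batch of multilabel predictions.

    logits  : list of rows, one raw score per label
    targets : list of rows, one 0/1 indicator per label
    """
    tp = fp = fn = 0.0
    for row_logits, row_targets in zip(logits, targets):
        for u, y in zip(row_logits, row_targets):
            s = smooth_sigmoid(u, beta, eta)
            tp += s * y            # soft true positives
            fp += s * (1 - y)      # soft false positives
            fn += (1 - s) * y      # soft false negatives
    soft_f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - soft_f1


# Toy batch: two examples, two labels each (made-up values).
good = sigmoid_f1_loss([[10.0, -10.0], [-10.0, 10.0]], [[1, 0], [0, 1]])
bad = sigmoid_f1_loss([[10.0, -10.0], [-10.0, 10.0]], [[0, 1], [1, 0]])
print(good < bad)
```

Confident correct scores drive the loss toward 0 and confident wrong scores drive it toward 1, which is the behaviour one would want from an F1 surrogate. In a real training setup the same arithmetic would be written in an autodiff framework so gradients can flow through it.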

classification, dataset, loss function


2108.10566

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence, Aug-22-2021

QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction

Zhang, Danqing, Li, Zheng, Cao, Tianyu, Luo, Chen, Wu, Tony, Lu, Hanqing, Song, Yiwei, Yin, Bing, Zhao, Tuo, Yang, Qiang

We study the problem of query attribute value extraction, which aims to identify named entities in user queries as diverse surface-form attribute values and then transform them into canonical forms. The problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works focus only on the NER phase and neglect the equally important AVN phase. To bridge this gap, this paper proposes QUEACO, a unified query attribute value extraction system for e-commerce search that covers both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve the extraction performance with less supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted by the feedback of the student's performance on strongly-labeled data to maximally denoise the noisy supervision from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface-form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale e-commerce dataset demonstrate the effectiveness of QUEACO.
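
To make the AVN phase described above more concrete, here is a minimal sketch of one simple way behavior data could map surface forms to canonical values: count how often each surface form co-occurs with each canonical product attribute value, then pick the most frequent one. This majority-vote lookup is a stand-in chosen for illustration, not the method used in QUEACO; the function names and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def build_normalizer(behavior_pairs):
    """Build a surface-form -> canonical-value lookup from weak behavior pairs.

    behavior_pairs: iterable of (surface_form_in_query, canonical_value_of_engaged_product)
    """
    counts = defaultdict(Counter)
    for surface, canonical in behavior_pairs:
        counts[surface.lower()][canonical] += 1
    # Keep the canonical value most often associated with each surface form.
    return {surface: c.most_common(1)[0][0] for surface, c in counts.items()}


def normalize(surface, normalizer):
    """Return the canonical form if one is known, else the surface form unchanged."""
    return normalizer.get(surface.lower(), surface)


# Made-up behavior data for illustration only.
pairs = [
    ("nyk", "Nike"),
    ("nyk", "Nike"),
    ("nyk", "New York Knicks"),
    ("addidas", "Adidas"),
]
lookup = build_normalizer(pairs)
print(normalize("NYK", lookup), normalize("Addidas", lookup), normalize("puma", lookup))
```

Because the pairs come from noisy behavior logs, a count-based vote like this tolerates some wrong associations as long as the correct canonical value dominates.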

proceedings, query, weakly-labeled data

doi: 10.1145/3459637.3481946

2108.08468

Country:

Oceania > Australia (0.05)
North America > Canada (0.04)
Europe > Germany (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Services > e-Commerce Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

#artificialintelligence, Aug-21-2021, 01:30:38 GMT

Azure Synapse Analytics Serverless SQL Pool Guidelines

With the introduction of the serverless SQL pool as part of Azure Synapse Analytics, Microsoft has provided a cost-efficient and convenient way to drive value from data residing in lakes using simple T-SQL statements. It lets you build logical analytical models by querying and joining data across heterogeneous sources, making the development of complex data integration pipelines obsolete in many cases. You don't even need to provision it explicitly beforehand: owing to its serverless nature, it is part of every Azure Synapse Analytics workspace by default. All you have to do is query data on demand, and you are charged according to the amount of data your queries need to process. Yet the flexibility it provides in how data can be stored and queried requires you to stick to some conventions in order to apply all its features properly. Otherwise, the once-promising serverless query engine can end up incurring high costs and delivering poor performance.

analytic serverless sql pool guideline, azure synapse analytic, microsoft

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.36)

arXiv.org Artificial Intelligence, Aug-21-2021

Towards Personalized and Human-in-the-Loop Document Summarization

Ghodratnama, Samira

The ubiquitous availability of computing devices and the widespread use of the internet continuously generate large amounts of data. As a result, the amount of available information on any given topic is far beyond humans' capacity to process properly, causing what is known as information overload. To cope efficiently with large amounts of information and generate content with significant value to users, we need to identify, merge and summarise information. Data summaries can help gather related information and collect it into a shorter format that enables answering complicated questions, gaining new insight and discovering conceptual boundaries. This thesis focuses on three main challenges to alleviate information overload using novel summarisation techniques. It further intends to facilitate the analysis of documents to support personalised information extraction. This thesis separates the research issues into four areas, covering (i) feature engineering in document summarisation, (ii) traditional static and inflexible summaries, (iii) traditional generic summarisation approaches, and (iv) the need for reference summaries. We propose novel approaches to tackle these challenges by (i) enabling automatic intelligent feature engineering, (ii) enabling flexible and interactive summarisation, and (iii) utilising intelligent and personalised summarisation approaches. The experimental results prove the efficiency of the proposed approaches compared to other state-of-the-art models. We further propose solutions to the information overload problem in different domains through summarisation, covering network traffic data, health data and business process data.

automatic intelligent feature engineering, computational natural language learning, iot-enabled process data analytic pipeline

2108.09443

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
Europe > Czechia > Prague (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Education (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Communications > Social Media (1.00)