AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

An Unsupervised Normalization Algorithm for Noisy Text: A Case Study for Information Retrieval and Stance Detection

Roy, Anurag, Ghosh, Shalmoli, Ghosh, Kripabandhu, Ghosh, Saptarshi

arXiv.org Artificial Intelligence, Jan-9-2021

A large fraction of textual data available today contains various types of 'noise', such as OCR noise in digitized documents, noise due to the informal writing style of users on microblogging sites, and so on. To enable tasks such as search/retrieval and classification over all the available data, we need robust algorithms for text normalization, i.e., for cleaning different kinds of noise in the text. There have been several efforts towards cleaning or normalizing noisy text; however, many of the existing text normalization methods are supervised and require language-dependent resources or large amounts of training data that are difficult to obtain. We propose an unsupervised algorithm for text normalization that does not need any training data or human intervention. The proposed algorithm is applicable to text in different languages, and can handle both machine-generated and human-generated noise. Experiments over several standard datasets show that text normalization through the proposed algorithm enables better retrieval and stance detection than several baseline text normalization methods. An implementation of our algorithm can be found at https://github.com/ranarag/UnsupClean.

algorithm, dataset, similarity, (13 more...)

2101.03303

Country:

North America > United States (0.28)
Asia > India > West Bengal > Kharagpur (0.04)
North America > Cuba (0.04)

Genre:

Workflow (0.68)
Research Report > Experimental Study (0.46)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Context-Aware Target Apps Selection and Recommendation for Enhancing Personal Mobile Assistants

Aliannejadi, Mohammad, Zamani, Hamed, Crestani, Fabio, Croft, W. Bruce

arXiv.org Artificial Intelligence, Jan-9-2021

Users install many apps on their smartphones, raising issues related to information overload for users and resource management for devices. Moreover, the recent increase in the use of personal assistants has made mobile devices even more pervasive in users' lives. This paper addresses two research problems that are vital for developing effective personal mobile assistants: target apps selection and recommendation. The former is the key component of a unified mobile search system: a system that addresses the users' information needs for all the apps installed on their devices with a unified mode of access. The latter, instead, predicts the next apps that the users would want to launch. Here we focus on context-aware models to leverage the rich contextual information available to mobile devices. We design an in situ study to collect thousands of mobile queries enriched with mobile sensor data (now publicly available for research purposes). With the aid of this dataset, we study the user behavior in the context of these tasks and propose a family of context-aware neural models that take into account the sequential, temporal, and personal behavior of users. We study several state-of-the-art models and show that the proposed models significantly outperform the baselines.

app, information, query, (13 more...)

2101.03394

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Switzerland (0.04)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.92)

Industry:

Media (0.93)
Information Technology > Services (0.92)
Information Technology > Security & Privacy (0.66)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Communications > Mobile (1.00)

Incorporating Vision Bias into Click Models for Image-oriented Search Engine

Xu, Ningxin, Yang, Cheng, Zhu, Yixin, Hu, Xiaowei, Wang, Changhu

arXiv.org Artificial Intelligence, Jan-7-2021

Most typical click models, such as PBM and UBM, assume that the probability of a document being examined by users depends only on its position. This assumption works well in various kinds of search engines. However, in a search engine where a large number of candidate documents display images as responses to the query, the examination probability should not depend on position alone. The visual appearance of an image-oriented document also plays an important role in its opportunity to be examined. In this paper, we assume that vision bias exists in an image-oriented search engine as another crucial factor affecting the examination probability, aside from position. Specifically, we apply this assumption to classical click models and propose an extended model to better capture the examination probabilities of documents. We use a regression-based EM algorithm to predict the vision bias given the visual features extracted from candidate documents. Empirically, we evaluate our model on a dataset developed from a real-world online image-oriented search engine, and demonstrate that our proposed model can achieve significant improvements over its baseline model in data fitness and sparsity handling.
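
As a rough illustration of the position-based assumption in this abstract, the sketch below computes a click probability as examination probability (by rank) times attractiveness, then applies a per-document vision-bias factor. The function names, the multiplicative form of the bias, the cap at 1.0, and all numbers are assumptions made for illustration; they are not taken from the paper.

```python
def pbm_click_prob(exam_by_rank, attractiveness, rank):
    """Position-based model: P(click) = P(examined | rank) * P(attractive)."""
    return exam_by_rank[rank] * attractiveness


def vision_biased_click_prob(exam_by_rank, attractiveness, rank, vision_bias):
    """Hypothetical extension: scale examination by a vision-bias factor.

    The result is capped at 1.0 so it remains a valid probability.
    """
    examined = min(1.0, exam_by_rank[rank] * vision_bias)
    return examined * attractiveness


# Assumed examination probabilities for ranks 0, 1, and 2.
exam_by_rank = [0.9, 0.6, 0.4]
print(pbm_click_prob(exam_by_rank, 0.5, 1))
print(vision_biased_click_prob(exam_by_rank, 0.5, 1, 1.5))
```

In this toy form, a visually salient document (bias above 1) at a low rank can receive a higher examination probability than position alone would give it.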

click model, vision bias, visual feature, (14 more...)

2101.02459

Country:

Asia > Singapore (0.05)
Asia > China (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Modeling Global Semantics for Question Answering over Knowledge Bases

Wu, Peiyun, Wu, Yunjie, Wu, Linjuan, Zhang, Xiaowang, Feng, Zhiyong

arXiv.org Artificial Intelligence, Jan-5-2021

Semantic parsing, as an important approach to question answering over knowledge bases (KBQA), transforms a question into the complete query graph for further generating the correct logical query. Existing semantic parsing approaches mainly focus on relation matching, paying less attention to the underlying internal structure of questions (e.g., the dependencies and relations between all entities in a question) when selecting the query graph. In this paper, we present a relational graph convolutional network (RGCN)-based model, gRGCN, for semantic parsing in KBQA.

However, the state-of-the-art semantic parsing approaches utilize the relational semantics of query graphs while paying little attention to the structure semantics of a question. The structure semantics is an important part of the whole semantics of questions (e.g., Figure 1), especially in complex questions, where the complexity of a question often relies on its complicated structure. As a result, existing works that consider only relational semantics cannot always perform better on complex questions. So it is necessary to pay more attention to the structure semantics of questions together with relational semantics when semantic parsing in KBQA. However, to model multirelational ...

artificial intelligence, information retrieval query processing, natural language, (18 more...)

2101.0151

Country: Asia > China > Tianjin Province > Tianjin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.82)

A Survey on Advancing the DBMS Query Optimizer: Cardinality Estimation, Cost Model, and Plan Enumeration

Lan, Hai, Bao, Zhifeng, Peng, Yuwei

arXiv.org Artificial Intelligence, Jan-5-2021

The query optimizer is at the heart of database systems. The cost-based optimizer studied in this paper is adopted in almost all current database systems. A cost-based optimizer uses a plan enumeration algorithm to find a (sub)plan, then uses a cost model to obtain the cost of that plan, and selects the plan with the lowest cost. In the cost model, cardinality, the number of tuples passing through an operator, plays a crucial role. Due to inaccuracy in cardinality estimation, errors in the cost model, and the huge plan space, the optimizer cannot find the optimal execution plan for a complex query in a reasonable time. In this paper, we first study in depth the causes behind the limitations above. Next, we review the techniques used to improve the quality of the three key components of the cost-based optimizer: cardinality estimation, cost model, and plan enumeration. We also provide our insights on future directions for each of the above aspects.
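
The interplay of the three components named in this abstract can be sketched with a toy example. The code below exhaustively enumerates left-deep join orders, estimates cardinalities under an assumed independence of join predicates, and uses the sum of intermediate result sizes as a stand-in cost model. The table names, sizes, selectivities, and the cost function are all hypothetical; real optimizers use far more elaborate estimators and search strategies.

```python
from itertools import permutations


def estimate_cardinality(tables, sizes, selectivity):
    """Estimated rows after joining `tables`, assuming independent predicates."""
    rows = 1.0
    for t in tables:
        rows *= sizes[t]
    # Apply the selectivity of every join predicate between two joined tables.
    for i, a in enumerate(tables):
        for b in tables[i + 1:]:
            rows *= selectivity.get(frozenset((a, b)), 1.0)
    return rows


def plan_cost(order, sizes, selectivity):
    """Toy cost model: sum of the estimated intermediate result sizes."""
    return sum(
        estimate_cardinality(order[:k], sizes, selectivity)
        for k in range(2, len(order) + 1)
    )


def best_plan(sizes, selectivity):
    """Plan enumeration: try every left-deep join order, keep the cheapest."""
    return min(permutations(sizes), key=lambda o: plan_cost(o, sizes, selectivity))


# Hypothetical statistics: table sizes and join-predicate selectivities.
sizes = {"A": 1000, "B": 100, "C": 10}
selectivity = {frozenset(("A", "B")): 0.01, frozenset(("B", "C")): 0.1}

plan = best_plan(sizes, selectivity)
print(plan, plan_cost(plan, sizes, selectivity))
```

Exhaustive enumeration grows factorially with the number of tables, which is one concrete form of the "huge plan space" problem the abstract mentions; a wrong selectivity estimate would likewise change which order looks cheapest.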

cost model, estimation, query, (12 more...)

doi: 10.1007/s41019-020-00149-7

2101.01507

Country:

Europe > Austria > Vienna (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > China > Hubei Province > Wuhan (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering

Zhu, Fengbin, Lei, Wenqiang, Wang, Chao, Zheng, Jianming, Poria, Soujanya, Chua, Tat-Seng

arXiv.org Artificial Intelligence, Jan-3-2021

Open-domain Question Answering (OpenQA) is an important task in Natural Language Processing (NLP), which aims to answer a question in the form of natural language based on large-scale unstructured documents. Recently, there has been a surge in the amount of research literature on OpenQA, particularly on techniques that integrate with neural Machine Reading Comprehension (MRC). While these research works have advanced performance to new heights on benchmark datasets, they have rarely been covered in existing surveys on QA systems. In this work, we review the latest research trends in OpenQA, with particular attention to systems that incorporate neural MRC techniques. Specifically, we begin by revisiting the origin and development of OpenQA systems. We then introduce the modern OpenQA architecture named "Retriever-Reader" and analyze the various systems that follow this architecture as well as the specific techniques adopted in each of the components. We then discuss key challenges in developing OpenQA systems and offer an analysis of commonly used benchmarks. We hope our work will inform researchers of recent advances and open challenges in OpenQA research, so as to stimulate further progress in this field.

computational linguistic, openqa system, proceedings, (12 more...)

2101.00774

Country:

Asia > Singapore (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > China (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Education (0.89)
Media > News (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural document expansion for ad-hoc information retrieval

Tang, Cheng, Arnold, Andrew

arXiv.org Artificial Intelligence, Dec-27-2020

Recently, Nogueira et al. [2019] proposed a new approach to document expansion based on a neural Seq2Seq model, showing significant improvement on a short-text retrieval task. However, this approach needs a large amount of in-domain training data. In this paper, we show that this neural document expansion approach can be effectively adapted to standard IR tasks, where labels are scarce and many long documents are present.

dataset, document expansion model, retrieval, (8 more...)

2012.14005

Country:

North America > United States > New York > New York County > New York City (0.06)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Brain-inspired Search Engine Assistant based on Knowledge Graph

Zhao, Xuejiao, Chen, Huanhuan, Xing, Zhenchang, Miao, Chunyan

arXiv.org Artificial Intelligence, Dec-25-2020

Search engines can quickly return a list of hyperlinks in response to query keywords. However, when a query is complex, developers need to repeatedly refine the search keywords and open a large number of web pages to find and summarize answers. Many question answering (Q&A) systems attempt to assist search engines by providing simple, accurate, and understandable answers. However, without the original semantic contexts, these answers lack explainability, making them difficult for users to trust and adopt. In this paper, we propose a brain-inspired search engine assistant named DeveloperBot, based on a knowledge graph, which aligns with the human cognitive process and can answer complex queries with explainability. Specifically, DeveloperBot first constructs a multi-layer query graph by splitting a complex multi-constraint query into several ordered constraints. It then models the constraint reasoning process as a subgraph search process inspired by the spreading activation model of cognitive science. Finally, novel features of the subgraph are extracted for decision-making. The corresponding reasoning subgraph and answer confidence are derived as explanations. The results of the decision-making demonstrate that DeveloperBot can estimate the answers and answer confidences with high accuracy. We implement a prototype and conduct a user study to evaluate whether and how the direct answers and the explanations provided by DeveloperBot can assist developers with their information needs.

developerbot, explanation, graph, (15 more...)

2012.13529

Country:

Asia > Singapore (0.04)
Oceania > Australia (0.04)
North America > United States > New York (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.36)

Neural Methods for Effective, Efficient, and Exposure-Aware Information Retrieval

Mitra, Bhaskar

arXiv.org Artificial Intelligence, Dec-21-2020

Neural networks with deep architectures have demonstrated significant performance improvements in computer vision, speech recognition, and natural language processing. The challenges in information retrieval (IR), however, are different from these other application areas. A common form of IR involves ranking of documents -- or short passages -- in response to keyword-based queries. Effective IR systems must deal with the query-document vocabulary mismatch problem by modeling relationships between different query and document terms and how they indicate relevance. Models should also consider lexical matches when the query contains rare terms -- such as a person's name or a product model number -- not seen during training, and in order to avoid retrieving semantically related but irrelevant results. In many real-life IR tasks, the retrieval involves extremely large collections -- such as the document index of a commercial Web search engine -- containing billions of documents. Efficient IR methods should take advantage of specialized IR data structures, such as the inverted index, to efficiently retrieve from large collections. Given an information need, the IR system also mediates how much exposure an information artifact receives by deciding whether it should be displayed, and where it should be positioned, among other results. Exposure-aware IR systems may optimize for additional objectives, besides relevance, such as parity of exposure for retrieved items and content publishers. In this thesis, we present novel neural architectures and methods motivated by the specific needs and challenges of IR tasks.
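
The inverted index mentioned in this abstract can be sketched in a few lines. The example below maps each term to the set of documents containing it and answers a query with a boolean AND over the query terms. The whitespace tokenization, the function names, and the sample documents are simplifying assumptions for illustration and are not drawn from the thesis.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def retrieve(index, query):
    """Return ids of documents containing every query term (boolean AND)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    return set.intersection(*postings) if postings else set()


# Hypothetical three-document collection.
docs = {
    1: "neural ranking models",
    2: "inverted index for ranking",
    3: "neural index",
}
index = build_inverted_index(docs)
print(retrieve(index, "neural ranking"))
```

Because only the posting sets of the query terms are touched, lookup cost depends on how many documents contain those terms rather than on the size of the whole collection, which is why the structure suits very large indexes.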

deep learning, exposure-aware information retrieval, us government, (24 more...)

2012.11685

Country:

Africa (0.67)
North America > United States > Colorado (0.14)
North America > United States > New Mexico > Bernalillo County (0.14)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Media > Music (1.00)
Leisure & Entertainment > Sports > Football (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Small Business Classification By Name: Addressing Gender and Geographic Origin Biases

Shapiro, Daniel

arXiv.org Artificial Intelligence, Dec-18-2020

Small business classification is a difficult and important task within many applications, including customer segmentation. Training on small business names introduces gender and geographic origin biases. A model for predicting one of 66 business types based only upon the business name was developed in this work (top-1 f1-score = 60.2%). Two approaches to removing the bias from this model are explored: replacing given names with a placeholder token, and augmenting the training data with gender-swapped examples. The results for these approaches are reported, and the bias in the model was reduced by hiding given names from the model. However, bias reduction was accomplished at the expense of classification performance (top-1 f1-score = 56.6%). Augmentation of the training data with gender-swapped samples proved less effective at bias reduction than the name-hiding approach on the evaluated dataset.
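
The two debiasing approaches described in this abstract can be illustrated with a minimal preprocessing sketch. The placeholder token `<NAME>`, the tiny name list, the swap mapping, and both function names are hypothetical; the paper's actual name lists and tokenization are not given here.

```python
def hide_given_names(business_name, given_names, token="<NAME>"):
    """Replace any word found in `given_names` with a placeholder token."""
    return " ".join(
        token if word.lower() in given_names else word
        for word in business_name.split()
    )


def gender_swap(business_name, swap_map):
    """Create an augmented example by swapping given names via `swap_map`."""
    return " ".join(
        swap_map.get(word.lower(), word) for word in business_name.split()
    )


# Hypothetical name resources for illustration only.
given_names = {"maria", "john"}
swap_map = {"john": "Jane", "maria": "Mario"}

print(hide_given_names("Maria Bakery", given_names))
print(gender_swap("John Plumbing", swap_map))
```

Name hiding removes the signal entirely, while gender swapping keeps it but balances it across the training data, which is one way to read the trade-off the abstract reports between the two approaches.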

business name, classification, dataset, (15 more...)

2012.10348

Country:

North America > United States (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
