AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding

Nikzad-Khasmakhi, Narjes, Feizi-Derakhshi, Mohammad-Reza, Asgari-Chenaghlu, Meysam, Balafar, Mohammad-Ali, Feizi-Derakhshi, Ali-Reza, Rahkar-Farshi, Taymaz, Ramezani, Majid, Jahanbakhsh-Nagadeh, Zoleikha, Zafarani-Moattar, Elnaz, Ranjbar-Khadivi, Mehrdad

arXiv.org Artificial Intelligence, Jun-9-2021

Background: Keyword extraction is a popular research topic in the field of natural language processing. Keywords are terms that describe the most relevant information in a document. The main problem that researchers face is how to extract the core keywords from a document efficiently and accurately. Although previous keyword extraction approaches have used text and graph features, there is a lack of models that can properly learn and combine these features. Methods: In this paper, we develop a multimodal key-phrase extraction approach, namely Phraseformer, using transformer and graph embedding techniques. In Phraseformer, each keyword candidate is represented by a vector that is the concatenation of the text and structure learning representations. Phraseformer takes advantage of recent work such as BERT and ExEm to preserve both representations. Phraseformer also treats the key-phrase extraction task as a sequence labeling problem solved as a classification task. Results: We analyze the performance of Phraseformer on three datasets, Inspec, SemEval 2010, and SemEval 2017, using the F1-score. We also investigate the performance of different classifiers with the Phraseformer method on the Inspec dataset. Experimental results demonstrate the effectiveness of the Phraseformer method on all three datasets. Additionally, the Random Forest classifier achieves the highest F1-score among all classifiers. Conclusions: Because the combination of BERT and ExEm is more meaningful and better represents the semantics of words, Phraseformer significantly outperforms single-modality methods.

extraction, phraseformer, representation, (14 more...)

arXiv.org Artificial Intelligence

2106.04939

Country:

Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.05)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.04)

Genre: Research Report > New Finding (0.88)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

On Margin-Based Cluster Recovery with Oracle Queries

Bressan, Marco, Cesa-Bianchi, Nicolò, Lattanzi, Silvio, Paudice, Andrea

arXiv.org Machine Learning, Jun-9-2021

We study an active cluster recovery problem where, given a set of $n$ points and an oracle answering queries like "are these two points in the same cluster?", the task is to recover exactly all clusters using as few queries as possible. We begin by introducing a simple but general notion of margin between clusters that captures, as special cases, the margins used in previous work, the classic SVM margin, and standard notions of stability for center-based clusterings. Then, under our margin assumptions we design algorithms that, in a variety of settings, recover all clusters exactly using only $O(\log n)$ queries. For the Euclidean case, $\mathbb{R}^m$, we give an algorithm that recovers arbitrary convex clusters, in polynomial time, and with a number of queries that is lower than the best existing algorithm by $\Theta(m^m)$ factors. For general pseudometric spaces, where clusters might not be convex or might not have any notion of shape, we give an algorithm that achieves the $O(\log n)$ query bound, and is provably near-optimal as a function of the packing number of the space. Finally, for clusterings realized by binary concept classes, we give a combinatorial characterization of recoverability with $O(\log n)$ queries, and we show that, for many concept classes in Euclidean spaces, this characterization is equivalent to our margin condition. Our results show a deep connection between cluster margins and active cluster recoverability.
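
To make the query model concrete, here is a minimal Python sketch of the same-cluster oracle setting. It is not the algorithm from the abstract: it is a naive baseline that compares each point with one representative of every cluster found so far, so it can use up to n times k queries for n points and k clusters instead of the $O(\log n)$ the authors report under margin assumptions. The names `recover_clusters`, `same_cluster`, and the toy labels are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def recover_clusters(
    points: Sequence[Hashable],
    same_cluster: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[List[Hashable]], int]:
    """Recover all clusters exactly with a naive representative-based scan.

    Each point is compared with the first member of every cluster found so
    far; the function also returns how many oracle queries were spent.
    """
    clusters: List[List[Hashable]] = []
    queries = 0
    for p in points:
        for cluster in clusters:
            queries += 1
            if same_cluster(cluster[0], p):
                cluster.append(p)
                break
        else:
            # No existing cluster matched, so p starts a new one.
            clusters.append([p])
    return clusters, queries


# Hypothetical ground-truth labels, used only to simulate the oracle.
labels = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 1}


def oracle(x: str, y: str) -> bool:
    return labels[x] == labels[y]


clusters, queries = recover_clusters(list(labels), oracle)
print(clusters, queries)
```

The query counter is the quantity the abstract is concerned with: the baseline's cost grows with the number of points, which is the behavior margin-based algorithms aim to avoid.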

algorithm, one-versus-all margin, query, (13 more...)

arXiv.org Machine Learning

2106.04913

Country:

Europe > Italy > Lombardy > Milan (0.04)
North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Google will let rivals appear as default search engine options on Android for free

Engadget, Jun-8-2021, 17:39:36 GMT

Google will jettison an auction system that forces other providers to bid for the right to be featured as a default search engine option on Android. Following a $5 billion fine and antitrust enforcement action in 2018, people in Europe have been able to choose which core apps and services they use on Android by default, instead of having to use Google products at first. Users in the region see an Android choice screen while setting up a device or after performing a factory reset. They can select their default search engine from a number of options. However, the three providers that are presented alongside Google Search have been determined by a sealed bidding process.

default search engine option, google, provider, (6 more...)

Engadget

Country:

Europe > United Kingdom (0.07)
Europe > Middle East (0.07)
Asia > Middle East (0.07)
Africa > Middle East (0.07)

Industry: Information Technology > Services (0.59)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.97)

CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System

Ishii, Etsuko, Xu, Yan, Winata, Genta Indra, Lin, Zhaojiang, Madotto, Andrea, Liu, Zihan, Xu, Peng, Fung, Pascale

arXiv.org Artificial Intelligence, Jun-7-2021

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative responses based on users' needs. To tackle this challenge, we utilize data augmentation methods and several training techniques with pre-trained language models to learn a general pattern of the task and thus achieve promising performance. In the DialDoc21 competition, our system achieved a 74.95 F1 score and a 60.74 Exact Match score in subtask 1, and a 37.72 SacreBLEU score in subtask 2. Empirical analysis is provided to explain the effectiveness of our approaches.

dataset, genta indra winata, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2106.0353

Country:

Asia > China > Hong Kong (0.05)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

Gong, Hongyu, Chaudhary, Vishrav, Tang, Yuqing, Guzmán, Francisco

arXiv.org Artificial Intelligence, Jun-7-2021

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level. Recently, large pre-trained language models such as BERT, XLM and XLM-RoBERTa have achieved great success when fine-tuned on sentence-level downstream tasks. It is tempting to apply these cross-lingual models to document representation learning. However, there are two challenges: (1) these models impose high costs on long document processing, and thus many of them have strict length limits; (2) model fine-tuning requires extra data and computational resources, which is not practical in resource-limited settings. In this work, we address these challenges by proposing unsupervised Language-Agnostic Weighted Document Representations (LAWDR). We study the geometry of pre-trained sentence embeddings and leverage it to derive document representations without fine-tuning. Evaluated on cross-lingual document alignment, LAWDR demonstrates comparable performance to state-of-the-art models on benchmark datasets.
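
The general idea of building a document vector from sentence embeddings without fine-tuning can be sketched as a weighted average. The abstract does not say how LAWDR chooses its weights, so the weights below are placeholders, and the two-dimensional vectors stand in for real pre-trained sentence embeddings. The helpers `weighted_document_vector` and `cosine` are hypothetical names, not the authors' code.

```python
import math
from typing import List, Sequence


def weighted_document_vector(
    sentence_vectors: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Combine sentence embeddings into one vector by a weighted average."""
    if len(sentence_vectors) != len(weights):
        raise ValueError("need exactly one weight per sentence")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(sentence_vectors[0])
    return [
        sum(w * v[i] for v, w in zip(sentence_vectors, weights)) / total
        for i in range(dim)
    ]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, usable for scoring candidate document pairs."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy embeddings and placeholder weights (assumptions for illustration).
doc_a = weighted_document_vector([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
doc_b = weighted_document_vector([[1.0, 0.0], [1.0, 1.0]], [1.0, 1.0])
similarity = cosine(doc_a, doc_b)
```

In a document alignment setting, a score like `similarity` would be computed between documents in different languages, with the highest-scoring pairs treated as candidate alignments.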

document representation, pre-trained model, representation, (14 more...)

arXiv.org Artificial Intelligence

2106.03379

Country: Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Diversity driven Query Rewriting in Search Advertising

Mohankumar, Akash Kumar, Begwani, Nikit, Singh, Amit

arXiv.org Artificial Intelligence, Jun-7-2021

Retrieving keywords (bidwords) with the same intent as the query, referred to as close variant keywords, is of prime importance for effective targeted search advertising. For head and torso search queries, sponsored search engines use a huge repository of same-intent queries and keywords, mined ahead of time. Online, this repository is used to rewrite the query and then look up the rewrite in a repository of bid keywords, contributing to significant revenue. Recently, generative retrieval models have been shown to be effective at the task of generating such query rewrites. We observe two main limitations of such generative models. First, rewrites generated by these models exhibit low lexical diversity, and hence the rewrites fail to retrieve relevant keywords that have diverse linguistic variations. Second, there is a misalignment between the training objective (the likelihood of the training data) and what we desire (improved quality and coverage of rewrites). In this work, we introduce CLOVER, a framework to generate both high-quality and diverse rewrites by optimizing for human assessment of rewrite quality using our diversity-driven reinforcement learning algorithm. We use an evaluation model, trained to predict human judgments, as the reward function to finetune the generation policy. We empirically show the effectiveness of our proposed approach through offline experiments on search queries across geographies spanning three major languages. We also perform online A/B experiments on Bing, a large commercial search engine, which show (i) better user engagement, with an average increase in clicks of 12.83% accompanied by an average defect reduction of 13.97%, and (ii) improved revenue by 21.29%.

keyword, query, rewrite, (14 more...)

arXiv.org Artificial Intelligence

2106.03816

Country:

Asia > Singapore (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry:

Marketing (0.85)
Information Technology > Services (0.61)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Build A Search Engine With Python: Computer Science & Python

#artificialintelligence, Jun-6-2021, 09:36:08 GMT

Unit 2 will take you deeper into using procedures, loops, and logical constructs in order to add more functionality to the crawler built in unit 1. Unit 3 is about managing data by mastering the different data types needed to create the search engine index. Our biggest goal, though, is to learn about computer science, so unit 4 focuses on teaching you how computers store data and how to be cost-effective when doing that. By the end of unit 5, you'll have a better understanding of how programs run and how to implement a hash table for our search engine. Unit 6 will extend the grammar we introduced in unit 1 and will show you how to get the best result for a search query. If the course gets much interaction and feedback, we'll work on units to demonstrate how to code real-world Python applications. I hope you're as excited as I am to start this learning journey, so just enroll today and thank me later.
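
As a rough illustration of the kind of index the course description mentions, the sketch below builds a small inverted index on top of Python's built-in dict, which is itself a hash table. It is not taken from the course: the function names, the whitespace tokenization, and the example.com pages are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each lowercased word to the set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return dict(index)


def lookup(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical crawled pages, used only to exercise the index.
pages = {
    "https://example.com/a": "Python search engine",
    "https://example.com/b": "Python web crawler",
}
index = build_index(pages)
print(lookup(index, "python crawler"))
```

A dict lookup takes constant time on average, which is why a hash table suits the word-to-pages mapping. Ranking the matching pages is a separate step that this sketch leaves out.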

computer science, computer science & python, python, (5 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (0.52)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)

AliCG: Fine-grained and Evolvable Conceptual Graph Construction for Semantic Search at Alibaba

Zhang, Ningyu, Jia, Qianghuai, Deng, Shumin, Chen, Xiang, Ye, Hongbin, Chen, Hui, Tou, Huaixiao, Huang, Gang, Wang, Zhao, Hua, Nengwei, Chen, Huajun

arXiv.org Artificial Intelligence, Jun-3-2021

Conceptual graphs, which are a particular type of knowledge graph, play an essential role in semantic search. Prior conceptual graph construction approaches typically extract high-frequency, coarse-grained, and time-invariant concepts from formal texts. In real applications, however, it is necessary to extract less frequent, fine-grained, and time-varying conceptual knowledge and to build the taxonomy in an evolving manner. In this paper, we introduce an approach to implementing and deploying the conceptual graph at Alibaba. Specifically, we propose a framework called AliCG which is capable of a) extracting fine-grained concepts by a novel bootstrapping with alignment consensus approach, b) mining long-tail concepts with a novel low-resource phrase mining approach, and c) updating the graph dynamically via a concept distribution estimation method based on implicit and explicit user behaviors. We have deployed the framework at Alibaba UC Browser. Extensive offline evaluation as well as online A/B testing demonstrate the efficacy of our approach.

alicg, conceptual graph, query, (13 more...)

arXiv.org Artificial Intelligence

2106.01686

Country:

Asia > Singapore (0.05)
Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Hubei Province > Wuhan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Conversational Question Answering: A Survey

Zaib, Munazza, Zhang, Wei Emma, Sheng, Quan Z., Mahmood, Adnan, Zhang, Yang

arXiv.org Artificial Intelligence, Jun-2-2021

Question answering (QA) systems provide a way of querying the information available in various formats including, but not limited to, unstructured and structured data in natural languages. They constitute a considerable part of conversational artificial intelligence (AI), which has led to the introduction of a special research topic on Conversational Question Answering (CQA), wherein a system is required to understand the given context and then engage in multi-turn QA to satisfy the user's information needs. Whilst the focus of most existing research work is on single-turn QA, the field of multi-turn QA has recently gained attention and prominence owing to the availability of large-scale, multi-turn QA datasets and the development of pre-trained language models. With a large number of models and research papers being added to the literature every year, there is a pressing need to arrange and present the related work in a unified manner to streamline future research. This survey, therefore, is an effort to present a comprehensive review of the state-of-the-art research trends of CQA, primarily based on reviewed papers from 2016-2021. Our findings show that there has been a trend shift from single-turn to multi-turn QA, which empowers the field of Conversational AI from different perspectives. This survey is intended to provide an epitome for the research community with the hope of laying a strong foundation for the field of CQA.

dataset, natural language processing, proceedings, (15 more...)

arXiv.org Artificial Intelligence

2106.00874

Country:

North America > United States > New York > Richmond County > New York City (0.04)
Asia > Pakistan (0.04)
Asia > Nepal (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.85)

Industry:

Education (0.68)
Media (0.67)
Government (0.67)
Information Technology > Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(4 more...)

Database Reasoning Over Text

Thorne, James, Yazdani, Majid, Saeidi, Marzieh, Silvestri, Fabrizio, Riedel, Sebastian, Halevy, Alon

arXiv.org Artificial Intelligence, Jun-2-2021

Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer models perform very well for small databases, they exhibit limitations in processing noisy data, numerical operations, and queries that aggregate facts. We propose a modular architecture to answer these database-style queries over multiple spans from text and aggregating these at scale. We evaluate the architecture using WikiNLDB, a novel dataset for exploring such queries. Our architecture scales to databases containing thousands of facts whereas contemporary models are limited by how many facts can be encoded. In direct comparison on small databases, our approach increases overall answer accuracy from 85% to 90%. On larger databases, our approach retains its accuracy whereas transformer baselines could not encode the context.
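
To show what a database-style query over a set of facts involves, here is a small Python sketch of the filtering and aggregation behind the example "List/Count all female athletes who were born in 20th century". It only illustrates the target query semantics over facts that are already structured; the architecture in the abstract works from natural language text. The `Person` record, its fields, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Person:
    name: str
    gender: str
    occupation: str
    birth_year: int


def female_athletes_born_in_20th_century(facts: List[Person]) -> List[str]:
    """Filter step: keep the names of facts that match every condition."""
    return [
        p.name
        for p in facts
        if p.gender == "female"
        and p.occupation == "athlete"
        and 1901 <= p.birth_year <= 2000  # the 20th century, inclusive
    ]


# Hypothetical facts, standing in for information stated across many sentences.
facts = [
    Person("Ana", "female", "athlete", 1985),
    Person("Ben", "male", "athlete", 1990),
    Person("Cleo", "female", "athlete", 2003),
    Person("Dee", "female", "writer", 1975),
]

listed = female_athletes_born_in_20th_century(facts)  # "List" variant
count = len(listed)                                   # "Count" variant
```

The answer depends on every relevant fact being considered, which is why such queries are harder than questions answered from a single passage.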

database, operator, query, (15 more...)

arXiv.org Artificial Intelligence

2106.01074

Country:

Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
South America > Uruguay > Montevideo > Montevideo (0.04)
North America > United States > North Carolina (0.04)
(7 more...)

Genre:

Personal (0.46)
Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Tennis (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.70)
(2 more...)
