AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Integrating connection search in graph queries

Anadiotis, Angelos Christos, Manolescu, Ioana, Mohanty, Madhulika

arXiv.org Artificial IntelligenceAug-9-2022

Graph data management and querying has many practical applications. When graphs are very heterogeneous and/or users are unfamiliar with their structure, they may need to find how two or more groups of nodes are connected in a graph, even when users are not able to describe the connections. This is only partially supported by existing query languages, which allow searching for paths, but not for trees connecting three or more node groups. The latter is related to the NP-hard Group Steiner Tree problem, and has been previously considered for keyword search in databases. In this work, we formally show how to integrate connecting tree patterns (CTPs, in short) within a graph query language such as SPARQL or Cypher, leading to an Extended Query Language (or EQL, in short). We then study a set of algorithms for evaluating CTPs; we generalize prior keyword search work, most importantly by (i) considering bidirectional edge traversal and (ii) allowing users to select any score function for ranking CTP results. To cope with very large search spaces, we propose an efficient pruning technique and formally establish a large set of cases where our algorithm, MOLESP, is complete even with pruning. Our experiments validate the performance of our CTP and EQL evaluation algorithms on a large set of synthetic and real-world workloads.

algorithm, graph, node, (15 more...)

arXiv.org Artificial Intelligence

2208.04802

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)
(21 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)

Add feedback

Learning Diverse Document Representations with Deep Query Interactions for Dense Retrieval

Li, Zehan, Yang, Nan, Wang, Liang, Wei, Furu

arXiv.org Artificial IntelligenceAug-8-2022

In this paper, we propose a new dense retrieval model which learns diverse document representations with deep query interactions. Our model encodes each document with a set of generated pseudo-queries to get query-informed, multi-view document representations. It not only enjoys high inference efficiency like the vanilla dual-encoder models, but also enables deep query-document interactions in document encoding and provides multi-faceted representations to better match different queries. Experiments on several benchmarks demonstrate the effectiveness of the proposed method, out-performing strong dual encoder baselines.The code is available at \url{https://github.com/jordane95/dual-cross-encoder

encoder, query, representation, (15 more...)

arXiv.org Artificial Intelligence

2208.04232

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

Google Keyword Planner gets a new feature – Search Engine Land

#artificialintelligenceAug-6-2022, 18:01:13 GMT

This feature adds the ability to use an automated machine learning system where we suggest which ad groups are the best ones for the keywords, …

search engine land

#artificialintelligence

Industry: Media > News (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Add feedback

Sublinear Time Algorithm for Online Weighted Bipartite Matching

Hu, Hang, Song, Zhao, Tao, Runzhou, Xu, Zhaozhuo, Zhuo, Danyang

arXiv.org Artificial IntelligenceAug-5-2022

Online bipartite matching is a fundamental problem in online algorithms. The goal is to match two sets of vertices to maximize the sum of the edge weights, where for one set of vertices, each vertex and its corresponding edge weights appear in a sequence. Currently, in the practical recommendation system or search engine, the weights are decided by the inner product between the deep representation of a user and the deep representation of an item. The standard online matching needs to pay $nd$ time to linear scan all the $n$ items, computing weight (assuming each representation vector has length $d$), and then decide the matching based on the weights. However, in reality, the $n$ could be very large, e.g. in online e-commerce platforms. Thus, improving the time of computing weights is a problem of practical significance. In this work, we provide the theoretical foundation for computing the weights approximately. We show that, with our proposed randomized data structures, the weights can be computed in sublinear time while still preserving the competitive ratio of the matching algorithm.

algorithm, data structure, online, (16 more...)

arXiv.org Artificial Intelligence

2208.03367

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Michigan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Services > e-Commerce Services (0.48)

Technology:

Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
(2 more...)

Add feedback

Multi-agent Databases via Independent Learning

Zhang, Chi, Papaemmanouil, Olga, Hanna, Josiah P., Akella, Aditya

arXiv.org Artificial IntelligenceAug-5-2022

Machine learning is rapidly being used in database research to improve the effectiveness of numerous tasks included but not limited to query optimization, workload scheduling, physical design, etc. Currently, the research focus has been on replacing a single database component responsible for one task by its learning-based counterpart. However, query performance is not simply determined by the performance of a single component, but by the cooperation of multiple ones. As such, learning based database components need to collaborate during both training and execution in order to develop policies that meet end performance goals. Thus, the paper attempts to address the question "Is it possible to design a database consisting of various learned components that cooperatively work to improve end-to-end query latency?". To answer this question, we introduce MADB (Multi-Agent DB), a proof-of-concept system that incorporates a learned query scheduler and a learned query optimizer. MADB leverages a cooperative multi-agent reinforcement learning approach that allows the two components to exchange the context of their decisions with each other and collaboratively work towards reducing the query latency. Preliminary results demonstrate that MADB can outperform the non-cooperative integration of learned components.

agent, optimizer, query, (15 more...)

arXiv.org Artificial Intelligence

2205.14323

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Texas > Travis County > Austin (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.52)

Add feedback

Distilling Knowledge from Reader to Retriever for Question Answering

Izacard, Gautier, Grave, Edouard

arXiv.org Artificial IntelligenceAug-4-2022

The task of information retrieval is an important component of many natural language processing systems, such as open domain question answering. While traditional methods were based on hand-crafted features, continuous representations based on neural networks recently obtained competitive results. A challenge of using such methods is to obtain supervised data to train the retriever model, corresponding to pairs of query and support documents. In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, and which does not require annotated pairs of query and documents. Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever. We evaluate our method on question answering, obtaining state-of-the-art results.

information retrieval, machine learning, question answering, (19 more...)

arXiv.org Artificial Intelligence

2012.04584

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.92)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.90)

Add feedback

Towards No.1 in CLUE Semantic Matching Challenge: Pre-trained Language Model Erlangshen with Propensity-Corrected Loss

Wang, Junjie, Zhang, Yuxiang, Yang, Ping, Gan, Ruyi

arXiv.org Artificial IntelligenceAug-4-2022

This report describes a pre-trained language model Erlangshen with propensity-corrected loss, the No.1 in CLUE Semantic Matching Challenge. In the pre-training stage, we construct a dynamic masking strategy based on knowledge in Masked Language Modeling (MLM) with whole word masking. Furthermore, by observing the specific structure of the dataset, the pre-trained Erlangshen applies propensity-corrected loss (PCL) in the fine-tuning phase. Overall, we achieve 72.54 points in F1 Score and 78.90 points in Accuracy on the test set. Our code is publicly available at: https://github.com/IDEA-CCNL/Fengshenbang-LM/tree/hf-ds/fengshen/examples/clue_sim.

dataset, erlangshen, propensity-corrected loss, (9 more...)

arXiv.org Artificial Intelligence

2208.02959

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)

Add feedback

Q4EDA: A Novel Strategy for Textual Information Retrieval Based on User Interactions with Visual Representations of Time Series

Christino, Leonardo, Ferreira, Martha D., Paulovich, Fernando V.

arXiv.org Artificial IntelligenceAug-2-2022

Knowing how to construct text-based Search Queries (SQs) for use in Search Engines (SEs) such as Google or Wikipedia has become a fundamental skill. Though much data are available through such SEs, most structured datasets live outside their scope. Visualization tools aid in this limitation, but no such tools come close to the sheer amount of information available through general-purpose SEs. To fill this gap, this paper presents Q4EDA, a novel framework that converts users' visual selection queries executed on top of time series visual representations, providing valid and stable SQs to be used in general-purpose SEs and suggestions of related information. The usefulness of Q4EDA is presented and validated by users through an application linking a Gapminder's line-chart replica with a SE populated with Wikipedia documents, showing how Q4EDA supports and enhances exploratory analysis of United Nations world indicators. Despite some limitations, Q4EDA is unique in its proposal and represents a real advance towards providing solutions for querying textual information based on user interactions with visual representations.

artificial intelligence, information retrieval, natural language, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.3390/info13080368

2101.08655

Country:

Asia > Russia (0.29)
Europe > Russia (0.15)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(35 more...)

Genre:

Questionnaire & Opinion Survey (0.93)
Research Report > Experimental Study (0.69)
Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.94)
Government (0.88)
Health & Medicine > Therapeutic Area > Immunology (0.68)
(2 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Data Science (1.00)
(2 more...)

Add feedback

No Pattern, No Recognition: a Survey about Reproducibility and Distortion Issues of Text Clustering and Topic Modeling

Silva, Marília Costa Rosendo, Siqueira, Felipe Alves, Tarrega, João Pedro Mantovani, Beinotti, João Vitor Pataca, Nunes, Augusto Sousa, Gardini, Miguel de Mattos, da Silva, Vinícius Adolfo Pereira, da Silva, Nádia Félix Felipe, de Carvalho, André Carlos Ponce de Leon Ferreira

arXiv.org Artificial IntelligenceAug-2-2022

Extracting knowledge from unlabeled texts using machine learning algorithms can be complex. Document categorization and information retrieval are two applications that may benefit from unsupervised learning (e.g., text clustering and topic modeling), including exploratory data analysis. However, the unsupervised learning paradigm poses reproducibility issues. The initialization can lead to variability depending on the machine learning algorithm. Furthermore, the distortions can be misleading when regarding cluster geometry. Amongst the causes, the presence of outliers and anomalies can be a determining factor. Despite the relevance of initialization and outlier issues for text clustering and topic modeling, the authors did not find an in-depth analysis of them. This survey provides a systematic literature review (2011-2022) of these subareas and proposes a common terminology since similar procedures have different terms. The authors describe research opportunities, trends, and open issues. The appendices summarize the theoretical background of the text vectorization, the factorization, and the clustering algorithms that are directly or indirectly related to the reviewed works.

algorithm, computational linguistic, reproducibility and distortion issue, (11 more...)

arXiv.org Artificial Intelligence

2208.01712

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
(36 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.47)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Add feedback

The New Way Police Could Use Your Google Searches Against You

SlateAug-1-2022, 18:18:25 GMT

For millennia, we've been told that asking questions was the path to enlightenment. But in the surveillance age, it might land you in jail. That's the danger of a new search tactic that police are increasingly turning to in their constant campaign to transform our phones and devices into evidence against us: keyword warrants. One Denver court may soon rule on whether they can continue as a policing tactic--and in the post-Roe era, the wrong decision could put abortion seekers in unprecedented danger. Police have used web browser history and search engine data in their investigations for about as long as the data has existed, but keyword warrants are different--a digital dragnet to find every user who searches for a specific person, place or thing.

google, keyword warrant, warrant, (12 more...)

Slate

Country:

North America > United States > Colorado (0.41)
North America > United States > Wisconsin (0.05)
North America > United States > Texas (0.05)
(3 more...)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (0.50)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.74)

Add feedback