AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

A Survey for Efficient Open Domain Question Answering

Zhang, Qin, Chen, Shangsi, Xu, Dongkuan, Cao, Qingqing, Chen, Xiaojun, Cohn, Trevor, Fang, Meng

arXiv.org Artificial IntelligenceNov-14-2022

Open domain question answering (ODQA) is a longstanding task aimed at answering factual questions from a large knowledge corpus without any explicit evidence in natural language processing (NLP). Recent works have predominantly focused on improving the answering accuracy and achieved promising progress. However, higher accuracy often comes with more memory consumption and inference latency, which might not necessarily be efficient enough for direct deployment in the real world. Thus, a trade-off between accuracy, memory consumption and processing speed is pursued. In this paper, we provide a survey of recent advances in the efficiency of ODQA models. We walk through the ODQA models and conclude the core techniques on efficiency. Quantitative analysis on memory cost, processing speed, accuracy and overall comparison are given. We hope that this work would keep interested scholars informed of the advances and open challenges in ODQA efficiency research, and thus contribute to the further development of ODQA efficiency.

computational linguistic, machine learning, question answering, (21 more...)

arXiv.org Artificial Intelligence

2211.07886

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre: Overview (1.00)

Industry:

Energy (0.67)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Understanding the Snowflake Query Optimizer

#artificialintelligenceNov-13-2022, 04:50:21 GMT

You are preeminent in your field, a singular talent. Using cleverness and craft you imagine factory designs that are elegant and streamlined. No inch of space wasted, not an inefficiency in sight. Imagine you want to design a megafactory - a factory that designs factories - encapsulating everything you know into one automated machine. You embark on an impossible journey.

optimization, query, snowflake, (14 more...)

#artificialintelligence

Technology:

Information Technology > Databases (0.99)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.33)

Add feedback

Pareto-Optimal Learning-Augmented Algorithms for Online k-Search Problems

Lee, Russell, Sun, Bo, Lui, John C. S., Hajiesmaili, Mohammad

arXiv.org Artificial IntelligenceNov-11-2022

This paper leverages machine learned predictions to design online algorithms for the k-max and k-min search problems. Our algorithms can achieve performances competitive with the offline algorithm in hindsight when the predictions are accurate (i.e., consistency) and also provide worst-case guarantees when the predictions are arbitrarily wrong (i.e., robustness). Further, we show that our algorithms have attained the Pareto-optimal trade-off between consistency and robustness, where no other algorithms for k-max or k-min search can improve on the consistency for a given robustness. To demonstrate the performance of our algorithms, we evaluate them in experiments of buying and selling Bitcoin.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2211.06567

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
Asia > China > Hong Kong (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Banking & Finance > Trading (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.50)

Add feedback

Multi-stage Information Retrieval for Vietnamese Legal Texts

Pham, Nhat-Minh, Nguyen, Ha-Thanh, Do, Trong-Hop

arXiv.org Artificial IntelligenceNov-11-2022

Despite being well researched in many languages, information retrieval has still not received much attention from the Vietnamese research community. This is especially true for the case of legal documents, which are hard to process. This study proposes a new approach for information retrieval for Vietnamese legal documents using sentence-transformer. Besides, various experiments are conducted to make comparisons between different transformer models, ranking scores, syllable-level, and word-level training. The experiment results show that the proposed model outperforms models used in current research on information retrieval for Vietnamese documents.

information retrieval, natural language, relevant article, (17 more...)

arXiv.org Artificial Intelligence

2209.14494

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Add feedback

Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

Chen, Xilun, Lakhotia, Kushal, Oğuz, Barlas, Gupta, Anchit, Lewis, Patrick, Peshterliev, Stan, Mehdad, Yashar, Gupta, Sonal, Yih, Wen-tau

arXiv.org Artificial IntelligenceNov-11-2022

Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse methods such as BM25 in their ability to reliably match salient phrases and rare entities in the query and to generalize to out-of-domain data. It has been argued that this is an inherent limitation of dense models. We rebut this claim by introducing the Salient Phrase Aware Retriever (SPAR), a dense retriever with the lexical matching capacity of a sparse model. We show that a dense Lexical Model {\Lambda} can be trained to imitate a sparse one, and SPAR is built by augmenting a standard dense retriever with {\Lambda}. Empirically, SPAR shows superior performance on a range of tasks including five question answering datasets, MS MARCO passage retrieval, as well as the EntityQuestions and BEIR benchmarks for out-of-domain evaluation, exceeding the performance of state-of-the-art dense and sparse retrievers. The code and models of SPAR are available at: https://github.com/facebookresearch/dpr-scale/tree/main/spar

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2110.06918

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
(7 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)

Add feedback

Zebra: Deeply Integrating System-Level Provenance Search and Tracking for Efficient Attack Investigation

Yang, Xinyu, Liu, Haoyuan, Wang, Ziyu, Gao, Peng

arXiv.org Artificial IntelligenceNov-10-2022

However, a key limitation is that their DSLs can only search for events that are located within a close subgraph neighborhood. System auditing has emerged as a key approach for monitoring Thus, these approaches cannot efficiently reveal faraway system call events and investigating sophisticated attacks. Based on events on a long-range attack sequence, which is observed in many the collected audit logs, research has proposed to search for attack of the attacks these days due to their sophisticated, multi-stage patterns or track the causal dependencies of system events to reveal nature [55]. Tracking-based approaches assume causal dependencies the attack sequence. However, existing approaches either cannot between system entities that are involved in the same system reveal long-range attack sequences or suffer from the dependency event (e.g., a process reading a file) [45, 48, 52, 54]. Based on this explosion problem due to a lack of focus on attack-relevant parts, assumption, these approaches track the dependencies from a Point and thus are insufficient for investigating complex attacks. of Interest (POI) event (e.g., an alert event like the creation of a To bridge the gap, we propose Zebra, a system that synergistically suspicious file) and construct a system dependency graph, in which integrates attack pattern search and causal dependency tracking nodes represent system entities and edges represent system events.

artificial intelligence, information retrieval query processing, natural language, (18 more...)

arXiv.org Artificial Intelligence

2211.05403

Country:

North America > United States > Virginia (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (1.00)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.47)

Add feedback

Accountable and Explainable Methods for Complex Reasoning over Text

Atanasova, Pepa

arXiv.org Artificial IntelligenceNov-9-2022

A major concern of Machine Learning (ML) models is their opacity. They are deployed in an increasing number of applications where they often operate as black boxes that do not provide explanations for their predictions. Among others, the potential harms associated with the lack of understanding of the models' rationales include privacy violations, adversarial manipulations, and unfair discrimination. As a result, the accountability and transparency of ML models have been posed as critical desiderata by works in policy and law, philosophy, and computer science. In computer science, the decision-making process of ML models has been studied by developing accountability and transparency methods. Accountability methods, such as adversarial attacks and diagnostic datasets, expose vulnerabilities of ML models that could lead to malicious manipulations or systematic faults in their predictions. Transparency methods explain the rationales behind models' predictions gaining the trust of relevant stakeholders and potentially uncovering mistakes and unfairness in models' decisions. To this end, transparency methods have to meet accountability requirements as well, e.g., being robust and faithful to the underlying rationales of a model. This thesis presents my research that expands our collective knowledge in the areas of accountability and transparency of ML models developed for complex reasoning tasks over text.

information retrieval, large language model, machine learning, (25 more...)

arXiv.org Artificial Intelligence

2211.04946

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.13)
Asia > China > Hong Kong (0.04)
(46 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(10 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
(7 more...)

Add feedback

Improving Performance of Automatic Keyword Extraction (AKE) Methods Using PoS-Tagging and Enhanced Semantic-Awareness

Altuncu, Enes, Nurse, Jason R. C., Xu, Yang, Guo, Jie, Li, Shujun

arXiv.org Artificial IntelligenceNov-9-2022

Automatic keyword extraction (AKE) has gained more importance with the increasing amount of digital textual data that modern computing systems process. It has various applications in information retrieval (IR) and natural language processing (NLP), including text summarisation, topic analysis and document indexing. This paper proposes a simple but effective post-processing-based universal approach to improve the performance of any AKE methods, via an enhanced level of semantic-awareness supported by PoS-tagging. To demonstrate the performance of the proposed approach, we considered word types retrieved from a PoS-tagging step and two representative sources of semantic information -- specialised terms defined in one or more context-dependent thesauri, and named entities in Wikipedia. The above three steps can be simply added to the end of any AKE methods as part of a post-processor, which simply re-evaluate all candidate keywords following some context-specific and semantic-aware criteria. For five state-of-the-art (SOTA) AKE methods, our experimental results with 17 selected datasets showed that the proposed approach improved their performances both consistently (up to 100\% in terms of improved cases) and significantly (between 10.2\% and 53.8\%, with an average of 25.8\%, in terms of F1-score and across all five methods), especially when all the three enhancement steps are used. Our results have profound implications considering the ease to apply our proposed approach to any AKE methods and to further extend it.

information retrieval, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2211.05031

Country:

Asia > China > Shanghai > Shanghai (0.04)
Europe > United Kingdom (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

COV19IR : COVID-19 Domain Literature Information Retrieval

Bose, Arusarka, Zhou, Zili, Xu, Guandong

arXiv.org Artificial IntelligenceNov-8-2022

Increasing number of COVID-19 research literatures cause new challenges in effective literature screening and COVID-19 domain knowledge aware Information Retrieval. To tackle the challenges, we demonstrate two tasks along withsolutions, COVID-19 literature retrieval, and question answering. COVID-19 literature retrieval task screens matching COVID-19 literature documents for textual user query, and COVID-19 question answering task predicts proper text fragments from text corpus as the answer of specific COVID-19 related questions. Based on transformer neural network, we provided solutions to implement the tasks on CORD-19 dataset, we display some examples to show the effectiveness of our proposed solutions.

information retrieval, machine learning, question answering, (17 more...)

arXiv.org Artificial Intelligence

2211.04013

Country:

Asia > Middle East > Saudi Arabia (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)
(9 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.86)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.72)

Add feedback

Query-Specific Knowledge Graphs for Complex Finance Topics

Mackie, Iain, Dalton, Jeffrey

arXiv.org Artificial IntelligenceNov-8-2022

Across the financial domain, researchers answer complex questions by extensively "searching" for relevant information to generate long-form reports. This workshop paper discusses automating the construction of query-specific document and entity knowledge graphs (KGs) for complex research topics. We focus on the CODEC dataset, where domain experts (1) create challenging questions, (2) construct long natural language narratives, and (3) iteratively search and assess the relevance of documents and entities. For the construction of query-specific KGs, we show that state-of-the-art ranking systems have headroom for improvement, with specific failings due to a lack of context or explicit knowledge representation. We demonstrate that entity and document relevance are positively correlated, and that entity-based query feedback improves document ranking effectiveness. Furthermore, we construct query-specific KGs using retrieval and evaluate using CODEC's "ground-truth graphs", showing the precision and recall trade-offs. Lastly, we point to future work, including adaptive KG retrieval algorithms and GNN-based weighting methods, while highlighting key challenges such as high-quality data, information extraction recall, and the size and sparsity of complex topic graphs.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2211.04142

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Industry: Banking & Finance > Trading (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.35)

Add feedback