AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Value Retrieval with Arbitrary Queries for Form-like Documents

Gao, Mingfei, Xue, Le, Ramaiah, Chetan, Xing, Chen, Xu, Ran, Xiong, Caiming

arXiv.org Artificial Intelligence, Dec-14-2021

We propose value retrieval with arbitrary queries for form-like documents to reduce the human effort of processing forms. Unlike previous methods that only address a fixed set of field items, our method predicts the target value for an arbitrary query based on an understanding of the layout and semantics of a form. To further boost model performance, we propose a simple document language modeling (simpleDLM) strategy to improve document understanding in large-scale model pre-training. Experimental results show that our method outperforms our baselines significantly, and simpleDLM further improves performance on value retrieval by around 17% F1 score compared with the state-of-the-art pre-training method. Code will be made publicly available.

baseline, ocr word, query, (13 more...)

2112.0782

Country: North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

Quantum Mathematics in Artificial Intelligence

Widdows, Dominic | Kitto, Kirsty (University of Technology Sydney) | Cohen, Trevor (University of Washington)

Journal of Artificial Intelligence Research, Dec-14-2021

In the decade since 2010, successes in artificial intelligence have been at the forefront of computer science and technology, and vector space models have solidified a position at the forefront of artificial intelligence. At the same time, quantum computers have become much more powerful, and announcements of major advances are frequently in the news. The mathematical techniques underlying both these areas have more in common than is sometimes realized. Vector spaces took a position at the axiomatic heart of quantum mechanics in the 1930s, and this adoption was a key motivation for the derivation of logic and probability from the linear geometry of vector spaces. Quantum interactions between particles are modelled using the tensor product, which is also used to express objects and operations in artificial neural networks. This paper describes some of these common mathematical areas, including examples of how they are used in artificial intelligence (AI), particularly in automated reasoning and natural language processing (NLP). Techniques discussed include vector spaces, scalar products, subspaces and implication, orthogonal projection and negation, dual vectors, density matrices, positive operators, and tensor products. Application areas include information retrieval, categorization and implication, modelling word-senses and disambiguation, inference in knowledge bases, and semantic composition. Some of these approaches can potentially be implemented on quantum hardware. Many of the practical steps in this implementation are in early stages, and some are already realized. Explaining some of the common mathematical tools can help researchers in both AI and quantum computing further exploit these overlaps, recognizing and exploring new directions along the way.
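Of the techniques listed in this abstract, orthogonal projection and negation is perhaps the easiest to illustrate. The sketch below is not taken from the paper: it assumes the usual formulation in which "a NOT b" is the part of vector a that is orthogonal to vector b, and the three-dimensional vectors are hypothetical.

```python
from math import isclose


def dot(u, v):
    """Scalar (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def negate(a, b):
    """Return 'a NOT b': a minus its orthogonal projection onto b."""
    scale = dot(a, b) / dot(b, b)
    return [x - scale * y for x, y in zip(a, b)]


# Hypothetical vectors: a query and a meaning we want to exclude from it.
query = [1.0, 1.0, 0.0]
unwanted = [0.0, 1.0, 0.0]

result = negate(query, unwanted)

# The result has no component left along the unwanted direction.
assert isclose(dot(result, unwanted), 0.0, abs_tol=1e-12)
```

In a retrieval setting, documents would then be ranked by their scalar product with `result` rather than with `query`, so that documents aligned only with the unwanted direction score zero.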

mathematics, representation, vector, (15 more...)

doi: 10.1613/jair.1.12702

AI Access Foundation

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York > New York County > New York City (0.04)
(11 more...)

Genre:

Overview (0.92)
Research Report (0.92)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Few-shot Multi-hop Question Answering over Knowledge Base

Meihao, Fan

arXiv.org Artificial Intelligence, Dec-13-2021

Previous work on Chinese Knowledge Base Question Answering has been restricted by the lack of complex Chinese semantic parsing datasets and by the exponential growth of the search space with the length of relation paths. This paper proposes an efficient pipeline method equipped with a pre-trained language model and a strategy for constructing artificial training samples, which needs only a small amount of data but performs well on the open-domain complex Chinese Question Answering task. In addition, by adopting a beam search algorithm in which a language model scores candidate query tuples, we slow the growth of relation paths when generating multi-hop query paths. Finally, we evaluate our model on the CCKS2019 Complex Question Answering via Knowledge Base task and achieve an F1 score of 62.55% on the test dataset. Moreover, when trained with only 10% of the data, our model still achieves an F1 score of 58.54%. The results show the capability of our model on the KBQA task and its advantage in few-shot learning.
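The beam search idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy knowledge base `KB`, the relation names, and the fixed `SCORES` table (standing in for the language model that scores candidates) are all hypothetical.

```python
def beam_search(start, neighbors, score, beam_width=2, max_hops=2):
    """Expand relation paths hop by hop, keeping only the best few."""
    beam = [([start], 0.0)]  # each item: (path, cumulative score)
    for _ in range(max_hops):
        candidates = []
        for path, total in beam:
            for relation, entity in neighbors(path[-1]):
                candidates.append(
                    (path + [relation, entity], total + score(relation))
                )
        if not candidates:
            break  # no path can be extended further
        # Prune: keep the beam_width highest-scoring paths only.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beam = candidates[:beam_width]
    return beam


# Hypothetical knowledge base: entity -> list of (relation, entity) edges.
KB = {
    "A": [("r1", "B"), ("r2", "C"), ("r3", "D")],
    "B": [("r4", "E")],
    "C": [("r5", "F")],
    "D": [("r6", "G")],
}
# Hypothetical relation scores; a real system would query a model here.
SCORES = {"r1": 0.9, "r2": 0.5, "r3": 0.1, "r4": 0.2, "r5": 0.8, "r6": 1.0}

beam = beam_search("A", lambda e: KB.get(e, []), lambda r: SCORES[r])
best_path, best_score = beam[0]
```

With a beam width of 2, the path through `D` is dropped after the first hop, so the number of candidates grows with the beam width rather than with the full branching factor. The trade-off is visible in the toy data: `r6` has the highest score but is never explored.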

query path, relation, topic entity, (14 more...)

2112.11909

Country:

North America > Canada (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

UPV at TREC Health Misinformation Track 2021 Ranking with SBERT and Quality Estimators

Schlicht, Ipek Baris, de Paula, Angel Felipe Magnossão, Rosso, Paolo

arXiv.org Artificial Intelligence, Dec-11-2021

Health misinformation on search engines is a significant problem that could negatively affect individuals or public health. To mitigate the problem, TREC organizes a health misinformation track. This paper presents our submissions to this track. We use BM25 and a domain-specific semantic search engine to retrieve initial documents. We then examine a health news schema for quality assessment and apply it to re-rank documents. We merge the scores from the different components by using reciprocal rank fusion. Finally, we discuss the results and conclude with future work.
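Reciprocal rank fusion, mentioned in this abstract, merges several ranked lists by giving each document a score of 1 / (k + rank) per list and summing. The sketch below assumes the commonly used constant k = 60; the abstract does not state the value the authors used, and the three input runs and document IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs; best-scoring document first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical runs from three components, best document first.
bm25_run = ["d1", "d2", "d3"]
semantic_run = ["d3", "d1", "d4"]
quality_run = ["d1", "d3", "d2"]

fused = reciprocal_rank_fusion([bm25_run, semantic_run, quality_run])
```

Because only ranks enter the formula, the component scores do not need to be on a comparable scale, which is convenient when combining a lexical ranker with neural or quality-based ones.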

misinformation, quality estimator, search engine, (10 more...)

2112.0608

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.82)

Industry:

Media > News (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)

Calculating Question Similarity is Enough: A New Method for KBQA Tasks

Zhao, Hanyu, Yuan, Sha, Leng, Jiahong, Pan, Xiang, Wang, Guoqiang

arXiv.org Artificial Intelligence, Dec-11-2021

Knowledge Base Question Answering (KBQA) aims to answer natural language questions with the help of an external knowledge base. The core idea is to find the link between the internal knowledge behind questions and known triples of the knowledge base. The KBQA task pipeline contains several steps, including entity recognition, entity linking, answer selection, etc. In this kind of pipeline method, errors in any step inevitably propagate to the final prediction. To address this challenge, this paper proposes a Corpus Generation - Retrieve Method (CGRM) with Pre-training Language Model (PLM) for the KBQA task. The major novelty lies in the design of the new method, in which the knowledge-enhanced T5 (kT5) model generates natural language QA pairs based on Knowledge Graph triples and answers questions directly by retrieving only from this synthetic dataset. The new method can extract more information about the entities from the PLM to improve accuracy and simplify the process. We test our method on the NLPCC-ICCPOL 2016 KBQA dataset, and the results show that our method improves the performance of KBQA and that our straightforward method is competitive with the state of the art.

computational linguistic, knowledge base, proceedings, (11 more...)

2111.07658

Country:

Asia > China > Sichuan Province > Chengdu (0.04)
Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
(9 more...)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
(2 more...)

A Scoping Review of Publicly Available Language Tasks in Clinical Natural Language Processing

Gao, Yanjun, Dligach, Dmitriy, Christensen, Leslie, Tesch, Samuel, Laffin, Ryan, Xu, Dongfang, Miller, Timothy, Uzuner, Ozlem, Churpek, Matthew M, Afshar, Majid

arXiv.org Artificial Intelligence, Dec-7-2021

Objective: To provide a scoping review of papers on clinical natural language processing (NLP) tasks that use publicly available electronic health record data from a cohort of patients. Materials and Methods: We searched six databases, including biomedical research and computer science literature databases. A round of title/abstract screening and a round of full-text screening were conducted by two reviewers. Our method followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Results: A total of 35 papers with 47 clinical NLP tasks, published between 2007 and 2021, met the inclusion criteria. We categorized the tasks by type of NLP problem, including named entity recognition, summarization, and other NLP tasks. Some tasks were introduced in the context of clinical decision support applications, such as substance abuse, phenotyping, and cohort selection for clinical trials. We summarized the tasks by publication and dataset information. Discussion: The breadth of clinical NLP tasks keeps growing as the field of NLP evolves with advancements in language systems. However, gaps exist in the divergent interests of the general-domain NLP community and the clinical informatics community, and in the generalizability of the data sources. We also identified issues in data selection and preparation, including the lack of time-sensitive data and the invalidity of problem size and evaluation. Conclusions: The existing clinical NLP tasks cover a wide range of topics, and the field will continue to grow and attract more attention from both the general-domain NLP and clinical informatics communities. We encourage future work to incorporate multi-disciplinary collaboration, reporting transparency, and standardization in data preparation.

general domain nlp community, natural language processing, nlp task, (11 more...)

2112.0578

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom (0.04)
(7 more...)

Genre: Research Report > Experimental Study (0.66)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Diagnostic Medicine (0.94)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)

Blogging Business Owner

#artificialintelligence, Dec-4-2021, 23:57:56 GMT

DISCLOSURE: THIS MESSAGE MAY CONTAIN AN AFFILIATE LINK, MEANING I GET A SMALL COMMISSION IF YOU DECIDE TO MAKE A PURCHASE USING MY LINK AT NO ADDITIONAL COST TO YOU. Frase offers top-ranking keywords and headlines for your topic. With Frase, you have access to frequently asked questions regarding your content and the external links connected to your subject. With Frase, content creation has never been easier! When I contemplated starting a blog, consistent content creation was at the top of my concerns list. Where will I find reliable sources? How can I ensure I am providing the most value for my work?

content brief, frase, information, (14 more...)

Genre: Frequently Asked Questions (FAQ) (0.91)

Industry: Media > News (0.52)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)

Building a Search Engine using Elasticsearch in 15 minutes

#artificialintelligence, Dec-3-2021, 21:33:02 GMT

Originally published on Towards AI.

Technology:

Information Technology > Information Management > Search (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Improving Predictions of Tail-end Labels using Concatenated BioMed-Transformers for Long Medical Documents

Yogarajan, Vithya, Pfahringer, Bernhard, Smith, Tony, Montiel, Jacob

arXiv.org Artificial Intelligence, Dec-3-2021

Multi-label learning predicts a subset of labels from a given label set for an unseen instance while considering label correlations. A known challenge with multi-label classification is the long-tailed distribution of labels. Many studies focus on improving the overall predictions of the model and thus do not prioritise tail-end labels. Improving the tail-end label predictions in multi-label classification of medical text makes it possible to understand patients better and improve care. The knowledge gained from one or more infrequent labels can affect the course of medical decisions and treatment plans. This research presents variations of concatenated domain-specific language models, including multi-BioMed-Transformers, to achieve two primary goals. First, to improve F1 scores of infrequent labels across multi-label problems, especially with long-tail labels; second, to handle long medical text and multi-sourced electronic health records (EHRs), a challenging task for standard transformers designed to work on short input sequences. A vital contribution of this research is new state-of-the-art (SOTA) results obtained using TransformerXL for predicting medical codes. A variety of experiments are performed on the Medical Information Mart for Intensive Care (MIMIC-III) database. Results show that concatenated BioMed-Transformers outperform standard transformers in terms of overall micro and macro F1 scores and individual F1 scores of tail-end labels, while incurring lower training times than existing transformer-based solutions for long input sequences.
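The difference between micro and macro F1, which this abstract reports side by side, is what makes tail-end labels visible. The sketch below uses the standard definitions (micro F1 pools counts over all labels; macro F1 averages the per-label F1 scores); the label names and the four-document example are hypothetical and unrelated to the paper's data.

```python
def f1(tp, fp, fn):
    """F1 from true positive, false positive, and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred, labels):
    """Return (micro F1, macro F1) for multi-label predictions.

    y_true and y_pred are lists of label sets, one set per instance.
    """
    counts = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        counts[label] = (tp, fp, fn)
    # Macro: average the per-label scores, so each label weighs the same.
    macro = sum(f1(*c) for c in counts.values()) / len(labels)
    # Micro: pool the counts first, so frequent labels dominate.
    micro = f1(*(sum(c[i] for c in counts.values()) for i in range(3)))
    return micro, macro


# Hypothetical data: one frequent label and one tail-end label.
y_true = [{"common"}, {"common"}, {"common"}, {"common", "rare"}]
y_pred = [{"common"}, {"common"}, {"common"}, {"common"}]

micro, macro = micro_macro_f1(y_true, y_pred, ["common", "rare"])
```

Here the model never predicts the rare label, yet micro F1 stays high because the frequent label dominates the pooled counts, while macro F1 drops to one half.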

language model, tail-end label, transformerxl, (16 more...)

2112.01718

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Health & Medicine > Health Care Technology > Medical Record (0.68)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.79)

SLOs Made Easier with Nobl9 and Amazon CloudWatch Metrics Insights (Preview)

#artificialintelligence, Dec-1-2021, 00:42:51 GMT

Amazon CloudWatch has recently launched Metrics Insights, a fast, flexible, SQL-based query engine that lets customers identify trends and patterns across millions of operational metrics in real time. Metrics Insights allows customers to easily query and analyze metrics to gain better visibility into the health and performance of their infrastructure and large-scale applications. Nobl9 and Amazon Web Services (AWS) have collaborated to extend the existing Nobl9 CloudWatch integration with CloudWatch Metrics Insights (Preview). This will help users retrieve metrics even faster and gain added flexibility in querying raw service level indicator (SLI) data to use for their service level objectives (SLOs). Nobl9 launched the first version of its CloudWatch integration in September 2021, giving customers a versatile tool to monitor their products.

cloudwatch metric insight, customer, metric insight, (12 more...)

Country: Asia > China (0.06)

Industry:

Retail > Online (0.40)
Information Technology > Services (0.37)

Technology:

Information Technology > Databases (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.38)