AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

DyREx: Dynamic Query Representation for Extractive Question Answering

Zaratiana, Urchade, Khbir, Niama El, Núñez, Dennis, Holat, Pierre, Tomeh, Nadi, Charnois, Thierry

arXiv.org Artificial Intelligence | Oct-26-2022

Extractive question answering (ExQA) is an essential task for Natural Language Processing. The dominant approach to ExQA is one that represents the input sequence tokens (question and passage) with a pre-trained transformer, then uses two learned query vectors to compute distributions over the start and end answer span positions. These query vectors lack the context of the inputs, which can be a bottleneck for the model performance. To address this problem, we propose DyREx, a generalization of the vanilla approach where we dynamically compute query vectors given the input, using an attention mechanism through transformer layers. Empirical observations demonstrate that our approach consistently improves the performance over the standard one. The code and accompanying files for running the experiments are available at https://github.com/urchade/DyReX.
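
The vanilla span-extraction head described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the token vectors are made-up toy values standing in for the output of a pre-trained transformer, the function names are hypothetical, and the dynamic query computation that DyREx proposes is not shown.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def span_distributions(token_vectors, start_query, end_query):
    # Vanilla approach: two static query vectors score every token,
    # giving one distribution over start positions and one over end positions.
    start_probs = softmax([dot(h, start_query) for h in token_vectors])
    end_probs = softmax([dot(h, end_query) for h in token_vectors])
    return start_probs, end_probs

def best_span(start_probs, end_probs, max_len=10):
    # Pick the (start, end) pair with start <= end that maximizes
    # the product of the two probabilities.
    best, best_score = (0, 0), -1.0
    for i, p_start in enumerate(start_probs):
        for j in range(i, min(i + max_len, len(end_probs))):
            score = p_start * end_probs[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best

# Toy inputs (assumed): five tokens with 2-dimensional representations.
token_vectors = [[0.1, 0.0], [2.0, 0.1], [0.2, 0.3], [0.1, 2.0], [0.0, 0.1]]
start_query = [1.0, 0.0]  # stands in for a learned start query
end_query = [0.0, 1.0]    # stands in for a learned end query

start_probs, end_probs = span_distributions(token_vectors, start_query, end_query)
span = best_span(start_probs, end_probs)
print(span)
```

Because the two query vectors here are fixed regardless of the question and passage, the sketch also shows the limitation the abstract points to: nothing about the input can change how tokens are scored.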

machine learning, natural language, question answering, (16 more...)

arXiv.org Artificial Intelligence

2210.15048

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.42)

NeuralSearchX: Serving a Multi-billion-parameter Reranker for Multilingual Metasearch at a Low Cost

Almeida, Thales Sales, Laitz, Thiago, Seródio, João, Bonifacio, Luiz Henrique, Lotufo, Roberto, Nogueira, Rodrigo

arXiv.org Artificial Intelligence | Oct-26-2022

The widespread availability of search APIs (both free and commercial) brings the promise of increased coverage and quality of search results for metasearch engines, while decreasing the maintenance costs of the crawling and indexing infrastructures. However, merging strategies frequently comprise complex pipelines that require careful tuning, which is often overlooked in the literature. In this work, we describe NeuralSearchX, a metasearch engine based on a multi-purpose large reranking model to merge results and highlight sentences. Due to the homogeneity of our architecture, we could focus our optimization efforts on a single component. We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS while having close to state-of-the-art results on a wide range of public benchmarks. Human evaluation on two domain-specific tasks shows that our retrieval system outperformed Google API by a large margin in terms of nDCG@10 scores. By describing our architecture and implementation in detail, we hope that the community will build on our design choices.
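
For readers unfamiliar with the nDCG@10 metric mentioned in this abstract, a minimal sketch is given below. It assumes the linear-gain formulation with a log2 rank discount; the abstract does not say which variant the authors used, and the function names and relevance grades are hypothetical. As a simplification, the ideal ordering is computed from the judged results in the list itself.

```python
import math

def dcg_at_k(relevances, k):
    # Discounted cumulative gain: each graded relevance is divided by
    # log2(position + 1), with positions counted from 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize by the DCG of the best possible ordering of the same grades.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal

# Hypothetical relevance grades for one query, in ranked order.
perfect = ndcg_at_k([3, 2, 1])
swapped = ndcg_at_k([0, 1])
print(perfect, swapped)
```

A ranking that already lists results in decreasing order of relevance scores 1.0, and placing a relevant result lower in the list reduces the score through the logarithmic discount.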

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.14837

Country:

South America > Brazil > São Paulo (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (0.94)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.99)
(3 more...)

A search engine for shapes

MIT Technology Review | Oct-25-2022, 22:00:00 GMT

Born and raised in Shanghai, Tan came to MIT to study high-energy astrophysics and wrote his dissertation on computational modeling of neutron stars. "Coming from China at that time, I had very little experience with computers," he says. "I was fortunate to find many helpful students during my time there." Tan also met his wife, Hong (Zhang) Tan, SM '88, PhD '96, at MIT. The pair were married in the MIT Chapel and today have two sons.

defense contractor, neutron star, search engine, (3 more...)

MIT Technology Review

Country:

Asia > China > Shanghai > Shanghai (0.27)
North America > United States > Indiana (0.07)

Industry:

Government > Military (1.00)
Government > Regional Government > North America Government > United States Government (0.37)

Technology:

Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Bilingual Lexicon Induction for Low-Resource Languages using Graph Matching via Optimal Transport

Marchisio, Kelly, Saad-Eldin, Ali, Duh, Kevin, Priebe, Carey, Koehn, Philipp

arXiv.org Artificial Intelligence | Oct-25-2022

Bilingual lexicons form a critical component of various natural language processing applications, including unsupervised and semisupervised machine translation and crosslingual information retrieval. We improve bilingual lexicon induction performance across 40 language pairs with a graph-matching method based on optimal transport. The method is especially strong with low amounts of supervision.

computational linguistic, information retrieval, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.14378

Country:

Europe > Italy > Tuscany > Florence (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report (0.50)

Industry:

Government > Military (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

CrisisLTLSum: A Benchmark for Local Crisis Event Timeline Extraction and Summarization

Faghihi, Hossein Rajaby, Alhafni, Bashar, Zhang, Ke, Ran, Shihao, Tetreault, Joel, Jaimes, Alejandro

arXiv.org Artificial Intelligence | Oct-25-2022

Social media has increasingly played a key role in emergency response: first responders can use public posts to better react to ongoing crisis events and deploy the necessary resources where they are most needed. Timeline extraction and abstractive summarization are critical technical tasks to leverage large numbers of social media posts about events. Unfortunately, there are few datasets for benchmarking technical approaches for those tasks. This paper presents CrisisLTLSum, the largest dataset of local crisis event timelines available to date. CrisisLTLSum contains 1,000 crisis event timelines across four domains: wildfires, local fires, traffic, and storms. We built CrisisLTLSum using a semi-automated cluster-then-refine approach to collect data from the public Twitter stream. Our initial experiments indicate a significant gap between the performance of strong baselines compared to the human performance on both tasks. Our dataset, code, and models are publicly available.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2210.14190

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Venezuela > Capital District > Caracas (0.04)
North America > United States > New York (0.04)
(15 more...)

Genre: Research Report (0.82)

Industry: Law Enforcement & Public Safety > Fire & Emergency Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SciFact-Open: Towards open-domain scientific claim verification

Wadden, David, Lo, Kyle, Kuehl, Bailey, Cohan, Arman, Beltagy, Iz, Wang, Lucy Lu, Hajishirzi, Hannaneh

arXiv.org Artificial Intelligence | Oct-25-2022

While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature. Moving to this open-domain evaluation setting, however, poses unique challenges; in particular, it is infeasible to exhaustively annotate all evidence documents. In this work, we present SciFact-Open, a new test collection designed to evaluate the performance of scientific claim verification systems on a corpus of 500K research abstracts. Drawing upon pooling techniques from information retrieval, we collect evidence for scientific claims by pooling and annotating the top predictions of four state-of-the-art scientific claim verification models. We find that systems developed on smaller corpora struggle to generalize to SciFact-Open, exhibiting performance drops of at least 15 F1. In addition, analysis of the evidence in SciFact-Open reveals interesting phenomena likely to appear when claim verification systems are deployed in practice, e.g., cases where the evidence supports only a special case of the claim. Our dataset is available at https://github.com/dwadden/scifact-open.
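
The pooling idea borrowed from information retrieval can be illustrated with a generic depth-k pooling sketch. This is not the authors' pipeline: the document IDs, the three example rankings, and the `build_pool` helper are hypothetical, and the abstract describes pooling the predictions of four claim verification models rather than plain rankings.

```python
def build_pool(rankings, depth):
    # Depth-k pooling: take the top `depth` items from each system's ranking
    # and merge them into one de-duplicated list to be sent for annotation.
    pool = []
    seen = set()
    for ranking in rankings:
        for doc_id in ranking[:depth]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool

# Hypothetical ranked outputs from three systems for a single claim.
system_rankings = [
    ["d3", "d1", "d7", "d2"],
    ["d1", "d4", "d3", "d9"],
    ["d8", "d3", "d1", "d5"],
]

pool = build_pool(system_rankings, depth=2)
print(pool)
```

Only the pooled documents are annotated, which is how pooling avoids exhaustively labeling the whole corpus; anything outside the pool stays unjudged.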

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.13777

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Enhancing Label Consistency on Document-level Named Entity Recognition

Jeong, Minbyul, Kang, Jaewoo

arXiv.org Artificial Intelligence | Oct-24-2022

Named entity recognition (NER) is a fundamental part of extracting information from documents in biomedical applications. A notable advantage of NER is its consistency in extracting biomedical entities in a document context. Although existing document NER models show consistent predictions, they still do not meet our expectations. We investigated whether the adjectives and prepositions within an entity cause a low label consistency, which results in inconsistent predictions. In this paper, we present our method, ConNER, which enhances the label dependency of modifiers (e.g., adjectives and prepositions) to achieve higher label agreement. ConNER refines the draft labels of the modifiers to improve the output representations of biomedical entities. The effectiveness of our method is demonstrated on four popular biomedical NER datasets; in particular, its efficacy is proved on two datasets with 7.5-8.6% absolute improvements in the F1 score. We interpret that our ConNER method is effective on datasets that have intrinsically low label consistency. In the qualitative analysis, we demonstrate how our approach makes the NER model generate consistent predictions. Our code and resources are available at https://github.com/dmis-lab/ConNER/.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.12949

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Research on Cross-media Science and Technology Information Data Retrieval

Jiang, Yang, Xue, Zhe, Li, Ang

arXiv.org Artificial Intelligence | Oct-24-2022

Since the start of the big data era, the Internet has been flooded with all kinds of information, and browsing information on the Internet has become an integral part of people's daily lives. Unlike news data and social data on the Internet, cross-media science and technology information data has different characteristics. This data has become an important basis for researchers and scholars to track current hot spots and explore the future direction of technology development. As the volume of science and technology information data grows, the traditional science and technology information retrieval system, which supports only unimodal data retrieval and uses an outdated keyword-matching model, can no longer meet the daily retrieval needs of science and technology scholars. Against this background, it is of practical significance to study a cross-media science and technology information data retrieval system based on deep semantic features, which is in line with the development trend of domestic and international technologies.

artificial intelligence, natural language, technology information data retrieval, (1 more...)

arXiv.org Artificial Intelligence

2204.04887

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

Improving Chinese Named Entity Recognition by Search Engine Augmentation

Mao, Qinghua, Li, Jiatong, Meng, Kui

arXiv.org Artificial Intelligence | Oct-23-2022

Compared with English, Chinese suffers from more grammatical ambiguities, such as fuzzy word boundaries and polysemous words. In this case, contextual information is not sufficient to support Chinese named entity recognition (NER), especially for rare and emerging named entities. Semantic augmentation using external knowledge is a potential way to alleviate this problem, but how to obtain and leverage external knowledge for the NER task remains a challenge. In this paper, we propose a neural approach that performs semantic augmentation for Chinese NER using external knowledge from a search engine. In particular, a multi-channel semantic fusion model is adopted to generate the augmented input representations, aggregating related external texts retrieved from the search engine. Experiments show the superiority of our model across 4 NER datasets, covering formal and social media language contexts, which further demonstrates the effectiveness of our approach.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.12662

Country:

Asia > China > Hong Kong (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

[100%OFF] Build A Search Engine With Python: Computer Science & Python

#artificialintelligence | Oct-22-2022, 20:34:27 GMT

Many online courses teach you to code but not the theory or way of thinking behind it: why would we choose a while loop rather than a for loop, and why should we pass two parameters to a function rather than only one? We provide a platform for thousands of people to expand their understanding of programming and computer science. Founded in 2013, we have a mission to spread the love of programming. To achieve this, we're working hard on providing content that will help people build a solid foundation in those subjects. This course will help you master the foundations and know how to solve problems with Python code.

computer science, computer science & python, python, (9 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education > Educational Setting > Online (0.77)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.44)
