AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have made searching traditional indexes of print publications easier, but the process still usually requires time-consuming serial searches of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big, and growing too rapidly, for searching to proceed successfully at human speed.

Natural Language in Search Engine Optimization (SEO) -- How, What, When, And Why

#artificialintelligence, Apr-30-2021, 09:30:12 GMT

For information to be processed, it is necessary to understand the data behind it by getting to the essence of that data [1]. When it comes to natural language processing, the first question is what natural language is. A native or evolved language can be described as a natural language, so any spoken language can be considered a natural language. In short, the term describes ordinary, non-artificial speech and writing.

natural language processing, search engine, search engine optimization

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)

IITP in COLIEE@ICAIL 2019: Legal Information Retrieval using BM25 and BERT

Gain, Baban; Bandyopadhyay, Dibyanayan; Saikh, Tanik; Ekbal, Asif

arXiv.org Artificial Intelligence, Apr-29-2021

Natural Language Processing (NLP) and Information Retrieval (IR) are essential tasks in the judicial domain. With the growing availability of domain-specific data in electronic form and the aid of various Artificial Intelligence (AI) technologies, automated language processing has become easier, making it feasible for researchers and developers to provide automated tools that reduce the human burden on the legal community. The Competition on Legal Information Extraction/Entailment (COLIEE-2019), run in association with the International Conference on Artificial Intelligence and Law (ICAIL) 2019, introduced several challenging tasks. The shared task defined four sub-tasks (Task 1, Task 2, Task 3 and Task 4), which are intended to yield automated systems for the judicial domain. This paper presents our working notes on the experiments we carried out as part of our participation in all the sub-tasks of this shared task. We use a range of Information Retrieval (IR) and deep-learning-based approaches to tackle these problems and obtain encouraging results in all four sub-tasks.
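
The title names BM25 as one of the team's retrieval components. As a rough illustration only (not the authors' code), the sketch below scores documents with Okapi BM25, assuming naive lowercase whitespace tokenization, the Lucene-style smoothed IDF, and the common defaults k1 = 1.2 and b = 0.75; the three toy "legal" documents are made up.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25.

    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Lucene-style smoothed IDF, which never goes negative.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and document-length normalization (b).
            norm = tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "the court dismissed the appeal",
    "the contract was void under civil law",
    "appeal granted by the high court",
]
scores = bm25_scores("court appeal", docs)
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

Document 0 outranks document 2 because both contain each query term once but document 0 is shorter than average; lowering b weakens that length penalty, while k1 controls how quickly repeated query terms stop adding to the score.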

information retrieval, legal information retrieval, query

doi: 10.13140/RG.2.2.28887.32161

2104.08653

Country:

North America > Canada > Quebec > Montreal (0.05)
Asia > India > West Bengal (0.05)
Asia > India > Bihar > Patna (0.05)

Genre: Research Report (0.40)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

BEIR: A Heterogenous Benchmark for Zero-shot Evaluation of Information Retrieval Models

Thakur, Nandan; Reimers, Nils; Rücklé, Andreas; Srivastava, Abhishek; Gurevych, Iryna

arXiv.org Artificial Intelligence, Apr-28-2021

Neural IR models have often been studied in homogeneous and narrow settings, which has considerably limited insights into their generalization capabilities. To address this, and to allow researchers to establish the effectiveness of their models more broadly, we introduce BEIR (Benchmarking IR), a heterogeneous benchmark for information retrieval. We leverage a careful selection of 17 datasets for evaluation, spanning diverse retrieval tasks and including both open-domain datasets and narrow expert domains. We study the effectiveness of nine state-of-the-art retrieval models in a zero-shot evaluation setup on BEIR and find that performing well consistently across all datasets is challenging. Our results show that BM25 is a robust baseline and that reranking-based models achieve the best overall zero-shot performance, albeit at high computational cost. In contrast, dense-retrieval models are computationally more efficient but often underperform other approaches, highlighting considerable room for improvement in their generalization capabilities. In this work, we extensively analyze different retrieval models and provide several suggestions that we believe may be useful for future work. BEIR datasets and code are available at https://github.com/UKPLab/beir.
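
Comparing retrieval models across datasets needs a ranking-quality metric; BEIR reports nDCG@10 as its headline number. The following is a minimal sketch of nDCG@k for a single query, assuming graded relevance judgments given as a dict from document id to gain and the linear-gain discount rel / log2(rank + 1); the document ids are hypothetical, and this is not the BEIR library's own evaluation code.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with linear gains."""
    # 0-based position i is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, qrels, k=10):
    """nDCG@k for one query.

    ranked_ids: document ids in the order the system returned them.
    qrels: dict mapping document id -> graded relevance (missing means 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    # Ideal ranking: all judged documents sorted by relevance, best first.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


qrels = {"d1": 2, "d2": 1}
perfect = ndcg_at_k(["d1", "d2", "d3"], qrels)
reversed_order = ndcg_at_k(["d3", "d2", "d1"], qrels)
print(perfect)  # 1.0
print(reversed_order)
```

A benchmark score is then the mean of this per-query value over all queries in a dataset.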

cosine-sim, dataset, retrieval

2104.08663

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Oceania > Australia (0.04)

Genre: Research Report > New Finding (0.54)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

[P] Entity Embed: fuzzy and scalable Entity Resolution using Approximate Nearest Neighbors

#artificialintelligence, Apr-27-2021, 00:15:38 GMT

Entity Embed is based on, and is a special case of, the AutoBlock model described by Amazon. It transforms entities such as companies and products into vectors to support scalable Record Linkage / Entity Resolution using Approximate Nearest Neighbors (ANN). With Entity Embed, you can train a deep learning model that maps records to vectors in an N-dimensional embedding space. A contrastive loss organizes those vectors so that similar records stay close together and dissimilar records stay far apart. Embedding records enables scalable ANN search, which makes it possible to find thousands of candidate duplicate pairs of records per second per CPU.
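
The blocking step can be illustrated without the trained model. The sketch below assumes records have already been embedded (the 2-D vectors and record names are made up) and pairs each record with its k most cosine-similar neighbors to produce candidate duplicate pairs. For clarity it uses exact brute-force search; Entity Embed relies on an approximate nearest-neighbor index to scale, which this sketch does not reproduce, and none of these function names come from its API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def candidate_pairs(embeddings, k=1, min_sim=0.9):
    """Return sorted (id_a, id_b) candidate duplicate pairs.

    Each record is paired with its k most similar other records, keeping only
    pairs whose cosine similarity is at least min_sim. This exact O(n^2)
    search stands in for an approximate nearest-neighbor index.
    """
    ids = list(embeddings)
    pairs = set()
    for a in ids:
        sims = [(cosine(embeddings[a], embeddings[b]), b) for b in ids if b != a]
        for sim, b in sorted(sims, reverse=True)[:k]:
            if sim >= min_sim:
                # Store each pair once, regardless of which side found it.
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


embeddings = {
    "acme_inc": [0.9, 0.1],
    "acme_incorporated": [0.88, 0.12],
    "globex": [0.1, 0.95],
    "globex_corp": [0.12, 0.9],
}
result = candidate_pairs(embeddings)
print(result)  # [('acme_inc', 'acme_incorporated'), ('globex', 'globex_corp')]
```

Only the returned candidate pairs would then go to a more expensive pairwise matcher, which is what keeps entity resolution tractable on large record sets.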

approximate nearest neighbor, entity embed, entity resolution

Industry: Media > News (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Communications > Social Media (0.76)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

How to Get the Most out of Excel with Machine Learning

#artificialintelligence, Apr-26-2021, 03:20:14 GMT

Excel is perhaps the best-known data analysis tool. It is used to store and organize data such as sales numbers, profit rates, expenditures and revenues, and some businesses even use it to store text data. However, Excel cannot analyze the content of that text on its own; that takes machine learning. Machine learning algorithms can automatically analyze hundreds or thousands of rows of text data in a fast, consistent and scalable way. In other words, they can quantify the words and phrases in a spreadsheet by assigning topics, keywords, entities and even sentiment to each row of text.
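
As an illustration of assigning a label to each row of text, here is a minimal multinomial Naive Bayes classifier in plain Python, applied to rows read from a CSV export of a spreadsheet. It is a sketch under stated assumptions: the four training sentences, the `comment` column name and the `positive`/`negative` labels are all hypothetical, and it does not describe how any particular product works.

```python
import csv
import io
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Train multinomial Naive Bayes from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict_nb(model, text):
    """Return the most probable label, using add-one (Laplace) smoothing."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(count / total)  # log prior
        for w in text.lower().split():
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("great product fast delivery", "positive"),
    ("love it great value", "positive"),
    ("terrible support slow delivery", "negative"),
    ("broken on arrival terrible", "negative"),
]
model = train_nb(training)

# Hypothetical CSV export of a spreadsheet with a "comment" column.
sheet = "comment\ngreat value\nterrible and broken\n"
rows = list(csv.DictReader(io.StringIO(sheet)))
for row in rows:
    row["sentiment"] = predict_nb(model, row["comment"])  # new column per row
print([(r["comment"], r["sentiment"]) for r in rows])
```

Writing the rows back out with `csv.DictWriter` would give a sheet with an added sentiment column; a real deployment would need far more training data than this toy set.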

excel spreadsheet, monkeylearn, spreadsheet

Country:

Asia > Thailand > Bangkok > Bangkok (0.05)
Europe > Spain > Community of Madrid > Madrid (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.48)
Questionnaire & Opinion Survey (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.30)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.30)

Sattiy at SemEval-2021 Task 9: An Ensemble Solution for Statement Verification and Evidence Finding with Tables

Ruan, Xiaoyi; Jin, Meizhi; Ma, Jian; Yang, Haiqin; Jiang, Lianxin; Mo, Yang; Zhou, Mengyuan

arXiv.org Artificial Intelligence, Apr-21-2021

Question answering from semi-structured tables can be seen as a semantic parsing task and is significant and practical for pushing the boundary of natural language understanding. Existing research mainly focuses on understanding content from unstructured evidence, e.g., news, natural language sentences and documents; verification from structured evidence, such as tables, charts and databases, remains less explored. This paper describes the Sattiy team's system for SemEval-2021 Task 9: Statement Verification and Evidence Finding with Tables (SEM-TAB-FACT). The competition aims to verify statements and find evidence in tables from scientific articles, and to promote proper interpretation of the surrounding article. We use ensembles of the pre-trained table language models TaPas and TaBERT for Task A and, for Task B, adjust the results with rules we extracted. On the leaderboard, we attain F1 scores of 0.8496 and 0.7732 in Task A for the 2-way and 3-way evaluations, respectively, and an F1 score of 0.4856 in Task B.
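
The abstract does not spell out how the TaPas and TaBERT outputs were combined. One common option is soft voting: average each model's class probabilities (optionally weighted) and take the most probable label. The sketch below shows that technique under that assumption; the probabilities are made up and the three label names are illustrative, not taken from the team's code.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-model class probabilities by (weighted) averaging.

    prob_lists: one dict per model, mapping label -> probability.
    Returns (best_label, averaged_probabilities).
    """
    weights = weights or [1.0] * len(prob_lists)
    labels = prob_lists[0].keys()
    avg = {
        label: sum(w * probs[label] for w, probs in zip(weights, prob_lists)) / sum(weights)
        for label in labels
    }
    return max(avg, key=avg.get), avg


# Made-up outputs of two table models for one statement.
model_a = {"entailed": 0.55, "refuted": 0.40, "unknown": 0.05}
model_b = {"entailed": 0.30, "refuted": 0.65, "unknown": 0.05}
label, avg = soft_vote([model_a, model_b])
print(label)  # refuted
```

Soft voting lets a confident model outweigh a hesitant one, which hard majority voting over two models cannot do; weights could be tuned on a development set.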

evaluation, statement verification, training data

2104.10366

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.57)

Benchmarking the Benchmark -- Analysis of Synthetic NIDS Datasets

Layeghy, Siamak; Gallagher, Marcus; Portmann, Marius

arXiv.org Artificial Intelligence, Apr-18-2021

Network Intrusion Detection Systems (NIDSs) are an increasingly important tool for the prevention and mitigation of cyber attacks. A number of labelled synthetic datasets have been generated and made publicly available by researchers, and they have become the benchmarks against which new ML-based NIDS classifiers are evaluated. Recently published results show excellent classification performance on these datasets, increasingly approaching 100 percent on key evaluation metrics such as accuracy and F1 score. Unfortunately, these excellent academic results have not yet translated into practical NIDSs with such near-perfect performance. This motivated the research presented in this paper, in which we analyse the statistical properties of the benign traffic in three of the more recent and relevant NIDS datasets (CIC, UNSW, ...). For comparison, we consider two datasets obtained from real-world production networks, one from a university network and one from a medium-sized Internet Service Provider (ISP). Our results show that the two real-world datasets are quite similar to each other with respect to most of the considered statistical features, and that the three synthetic datasets are likewise relatively similar within their group. Most importantly, however, our results show a distinct difference in most of the considered statistical features between the three synthetic datasets and the two real-world datasets. Since ML relies on the basic assumption that training and test data are sampled from the same distribution, this raises the question of how well the performance of ML classifiers trained on these synthetic datasets can translate and generalise to real-world networks. We believe this is an interesting and relevant question that motivates further research in this space.
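
The abstract does not name the statistical test used to compare the datasets. As one illustrative way to quantify how far a feature's distribution in a synthetic dataset is from a real-world one, the sketch below computes the two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical CDFs, in plain Python. The flow-duration samples are made up.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs. Both are step functions that only
    change at observed values, so checking those points finds the maximum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Made-up flow durations (seconds) for benign traffic.
synthetic = [0.1, 0.1, 0.2, 0.2, 0.3]
real_world = [0.5, 1.2, 3.4, 7.8, 15.0]
print(ks_statistic(synthetic, real_world))  # 1.0: the samples do not overlap
print(ks_statistic(synthetic, synthetic))   # 0.0: identical samples
```

D ranges from 0 (identical empirical distributions) to 1 (no overlap), so computing it per feature gives a simple, scale-free way to flag the kind of train/test distribution mismatch the abstract describes.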

data mining, information retrieval, machine learning

doi: 10.1016/j.jisa.2023.103689

2104.09029

Country:

North America > United States (0.68)
Oceania > Australia (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Security & Privacy (1.00)
Energy > Oil & Gas > Upstream (0.34)
Government > Military > Cyberwarfare (0.34)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Multi-source Neural Topic Modeling in Multi-view Embedding Spaces

Gupta, Pankaj; Chaudhary, Yatin; Schütze, Hinrich

arXiv.org Artificial Intelligence, Apr-17-2021

Though word embeddings and topics are complementary representations, several past works have only used pretrained word embeddings in (neural) topic modeling to address data sparsity in short texts or small collections of documents. This work presents a novel neural topic modeling framework using multi-view embedding spaces: (1) pretrained topic embeddings and (2) pretrained word embeddings (context-insensitive from GloVe and context-sensitive from BERT models), drawn jointly from one or many sources to improve topic quality and better deal with polysemy. To do so, we first build respective pools of pretrained topic embeddings (TopicPool) and word embeddings (WordPool). We then identify one or more relevant source domains and transfer knowledge to guide meaningful learning in the sparse target domain. Within neural topic modeling, we quantify the quality of topics and document representations via generalization (perplexity), interpretability (topic coherence) and information retrieval (IR), using short-text, long-text, small and large document collections from the news and medical domains. With the multi-source multi-view embedding spaces, we achieve state-of-the-art neural topic modeling results on 6 source (high-resource) and 5 target (low-resource) corpora.
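
Perplexity, the generalization measure named above, is the exponentiated negative average log-likelihood per word on held-out documents. The sketch below computes a corpus-level version from per-document log-likelihoods and word counts. Papers in this area sometimes average per document first instead, so treat this as the standard textbook form rather than the authors' exact evaluation code.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Corpus-level perplexity: exp(-sum_d log p(doc_d) / sum_d |doc_d|).

    doc_log_likelihoods: natural-log likelihood of each held-out document.
    doc_lengths: number of words in each document.
    Lower is better; a uniform model over V words has perplexity V.
    """
    total_ll = sum(doc_log_likelihoods)
    total_words = sum(doc_lengths)
    return math.exp(-total_ll / total_words)


# Sanity check: a model that assigns every word probability 1/4.
lengths = [3, 5]
log_liks = [n * math.log(0.25) for n in lengths]
print(perplexity(log_liks, lengths))  # approximately 4.0
```

The sanity check shows the usual reading of perplexity: the model is as uncertain as if it were choosing uniformly among about four words at each position.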

corpora, docnade, docnadee

2104.08551

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Back to the Basics: A Quantitative Analysis of Statistical and Graph-Based Term Weighting Schemes for Keyword Extraction

Ushio, Asahi; Liberatore, Federico; Camacho-Collados, Jose

arXiv.org Artificial Intelligence, Apr-16-2021

Term weighting schemes are widely used in Natural Language Processing and Information Retrieval; in particular, term weighting is the basis for keyword extraction. However, relatively few evaluation studies shed light on the strengths and shortcomings of each weighting scheme. In fact, in most cases researchers and practitioners default to the well-known tf-idf, despite the existence of other suitable alternatives, including graph-based models. In this paper, we perform an exhaustive, large-scale empirical comparison of both statistical and graph-based term weighting methods in the context of keyword extraction. Our analysis reveals some interesting findings, such as the advantages of the less-known lexical specificity over tf-idf, and the qualitative differences between statistical and graph-based methods. Finally, based on our findings, we offer some suggestions for practitioners. We release our code at https://github.com/asahi417/kex .
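
Since tf-idf is the default baseline the abstract refers to, here is a minimal sketch of tf-idf keyword extraction: each word in a document is scored by its frequency in that document times log(N / df), where df counts the documents containing it, and the top-scoring words are returned. The tokenizer, the small stopword list and the unsmoothed idf variant are assumptions for illustration; this is not the API of the authors' kex library.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "for", "is", "on", "to"}  # illustrative


def tokenize(text):
    """Lowercase alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_keywords(docs, doc_index, top_n=3):
    """Return the top_n words of docs[doc_index] ranked by tf-idf."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))
    tf = Counter(tokenized[doc_index])
    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}
    # Sort by descending score, breaking ties alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [w for w, _ in ranked[:top_n]]


docs = [
    "graph ranking of keyword candidates in a word graph",
    "term weighting for information retrieval",
    "keyword extraction with term weighting",
]
print(tfidf_keywords(docs, 0))  # ['graph', 'candidates', 'ranking']
```

"keyword" ranks low in the first document because it also appears in the third, which is exactly the idf penalty on terms shared across the collection; graph-based methods instead score words by their co-occurrence structure within a single document.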

dataset, extraction, keyword extraction

2104.08028

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Translational NLP: A New Paradigm and General Principles for Natural Language Processing Research

Newman-Griffis, Denis; Lehman, Jill Fain; Rosé, Carolyn; Hochheiser, Harry

arXiv.org Artificial Intelligence, Apr-15-2021

Natural language processing (NLP) research combines the study of universal principles, through basic science, with applied science targeting specific use cases and settings. However, the process of exchange between basic NLP and applications is often assumed to emerge naturally, resulting in many innovations going unapplied and many important questions left unstudied. We describe a new paradigm of Translational NLP, which aims to structure and facilitate the processes by which basic and applied NLP research inform one another. Translational NLP thus presents a third research paradigm, focused on understanding the challenges posed by application needs and how these challenges can drive innovation in basic science and technology design. We show that many significant advances in NLP research have emerged from the intersection of basic principles with application needs, and present a conceptual framework outlining the stakeholders and key questions in translational research. Our framework provides a roadmap for developing Translational NLP as a dedicated research area, and identifies general translational principles to facilitate exchange between basic and applied research.

application, nlp research, proceedings

2104.07874

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Health Care Technology (0.68)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.67)