AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Amazon Comprehend adds five new languages to Custom Entity Recognition

#artificialintelligence Aug-14-2020, 19:26:07 GMT

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to analyze text documents and identify insights such as sentiment, entities, and topics from text. You can use Custom Entity Recognition to identify terms that are specific to your domain. For example, you can instantly extract product names, financial entities or any term relevant to you from unstructured text documents. Starting today, Amazon Comprehend is adding support for the following five new languages to Custom Entity Recognition: French, German, Italian, Portuguese, and Spanish.
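As a rough illustration of what training a custom recognizer in one of the new languages might involve, the sketch below builds the keyword arguments for boto3's `create_entity_recognizer` call without contacting AWS. The recognizer name, role ARN, bucket paths, and entity type are hypothetical placeholders, and the set of supported language codes is an assumption based on the announcement (English plus the five added languages).

```python
# Language codes assumed to be supported by Custom Entity Recognition after
# this announcement: English plus the five newly added languages.
SUPPORTED_LANGUAGES = {"en", "fr", "de", "it", "pt", "es"}


def build_recognizer_request(name, role_arn, entity_types,
                             documents_s3_uri, entity_list_s3_uri,
                             language_code):
    """Build keyword arguments for comprehend.create_entity_recognizer."""
    if language_code not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language_code!r}")
    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "LanguageCode": language_code,
        "InputDataConfig": {
            # One entry per custom entity type to be learned.
            "EntityTypes": [{"Type": t} for t in entity_types],
            # Training documents and the list of known entities, both in S3.
            "Documents": {"S3Uri": documents_s3_uri},
            "EntityList": {"S3Uri": entity_list_s3_uri},
        },
    }


# All values below are hypothetical placeholders.
request = build_recognizer_request(
    name="product-names-fr",
    role_arn="arn:aws:iam::123456789012:role/ComprehendDataAccess",
    entity_types=["PRODUCT"],
    documents_s3_uri="s3://example-bucket/train/documents.txt",
    entity_list_s3_uri="s3://example-bucket/train/entity_list.csv",
    language_code="fr",
)

# With AWS credentials configured, the request could be submitted with:
#   import boto3
#   boto3.client("comprehend").create_entity_recognizer(**request)
```

Keeping the request construction separate from the network call makes the language check easy to exercise locally before any training job is started.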

artificial intelligence, information retrieval, natural language, (4 more...)

Industry: Retail > Online (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Researchers claim bias in AI named entity recognition models

#artificialintelligence Aug-12-2020, 16:20:21 GMT

Twitter researchers claim to have found evidence of demographic bias in named entity recognition, the first step toward generating automated knowledge bases, or the repositories leveraged by services like search engines. They say their analysis reveals AI performs better at identifying names from specific groups and the biases manifest in syntax, semantics, and how word uses vary across linguistic contexts. Knowledge bases are essentially databases containing information about entities -- people, places, and things. In 2012, Google launched a knowledge base -- the Knowledge Graph -- to enhance Google search results with hundreds of billions of facts gathered from sources including Wikipedia, Wikidata, and CIA World Factbook. Microsoft provides a knowledge base with over 150,000 articles created by support professionals who've resolved issues for its customers. But while the usefulness of knowledge bases is not in dispute, the researchers assert the embeddings used to represent entities in them exhibit bias against certain groups of people.

entity recognition model, information retrieval, natural language, (12 more...)

Country: North America > United States > Massachusetts (0.05)

Genre: Research Report > New Finding (0.51)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.98)

(Almost) All of Entity Resolution

Binette, Olivier, Steorts, Rebecca C.

arXiv.org Machine Learning Aug-10-2020

Whether the goal is to estimate the number of people that live in a congressional district, to estimate the number of individuals that have died in an armed conflict, or to disambiguate individual authors using bibliographic data, all these applications have a common theme: integrating information from multiple sources. Before such questions can be answered, databases must be cleaned and integrated in a systematic and accurate way, a process commonly known as record linkage, de-duplication, or entity resolution. In this article, we review motivational applications and seminal papers that have led to the growth of this area. Specifically, we review the foundational work that began in the 1940s and 1950s and led to modern probabilistic record linkage. We review clustering approaches to entity resolution, semi- and fully supervised methods, and canonicalization, which are being used throughout industry and academia in applications such as human rights, official statistics, medicine, and citation networks, among others. Finally, we discuss current research topics of practical importance.
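To make the idea of probabilistic record linkage concrete, here is a minimal sketch in the style of the Fellegi-Sunter model, the standard formulation of that approach. It is not code from the paper. The field names, the m- and u-probabilities, and the decision thresholds are hypothetical values chosen for illustration; in practice they would be estimated from data.

```python
import math

# Hypothetical per-field parameters:
#   m = P(field agrees | the two records refer to the same entity)
#   u = P(field agrees | the two records refer to different entities)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.05),
    "birth_year": (0.85, 0.02),
}


def match_weight(record_a, record_b, params=FIELD_PARAMS):
    """Sum the log2 likelihood ratios over all compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if record_a.get(field) == record_b.get(field):
            total += math.log2(m / u)              # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total


def classify(weight, upper=5.0, lower=0.0):
    """Three-way decision using two (hypothetical) thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"


# Made-up records for illustration only.
a = {"last_name": "smith", "first_name": "anna", "birth_year": 1980}
b = {"last_name": "smith", "first_name": "ann", "birth_year": 1980}
c = {"last_name": "jones", "first_name": "mark", "birth_year": 1975}

for other in (b, c):
    w = match_weight(a, other)
    print(round(w, 2), classify(w))
```

Pairs that fall between the two thresholds are the ones usually sent for clerical review, which is why the decision is three-way rather than binary.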

entity resolution, information retrieval, machine learning, (14 more...)

2008.04443

Country:

Asia > Middle East > Syria (0.28)
North America > United States > North Carolina > Durham County > Durham (0.14)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
(23 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.67)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Government > Voting & Elections (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
(4 more...)

Extracting Keywords from Open-Ended Business Survey Questions

McGillivray, Barbara, Jenset, Gard, Heil, Dominik

arXiv.org Artificial Intelligence Aug-7-2020

Open-ended survey data constitute an important basis in research as well as for making business decisions. Collecting and manually analysing free-text survey data is generally more costly than collecting and analysing survey data consisting of answers to multiple-choice questions. Yet free-text data allow for new content to be expressed beyond predefined categories and are a very valuable source of new insights into people's opinions. At the same time, surveys always make ontological assumptions about the nature of the entities that are researched, and this has vital ethical consequences. Human interpretations and opinions can only be properly ascertained in their richness using textual data sources; if these sources are analyzed appropriately, the essential linguistic nature of humans and social entities is safeguarded. Natural Language Processing (NLP) offers possibilities for meeting this ethical business challenge by automating the analysis of natural language and thus allowing for insightful investigations of human judgements. We present a computational pipeline for analysing large amounts of responses to open-ended questions in surveys and extracting keywords that appropriately represent people's opinions. This pipeline addresses the need to perform such tasks outside the scope of both commercial software and bespoke analysis, exceeds the performance of state-of-the-art systems, and performs this task in a transparent way that allows for scrutinising and exposing potential biases in the analysis. Following the principle of Open Data Science, our code is open-source and generalizable to other datasets.

I. CONTEXT AND MOTIVATION

Leaders, managers, and decision-makers critically rely on information and feedback. Decision-makers first need information about the current set of circumstances which provide the context of the decision, and then need feedback on how the decision could play out. Getting such information in a format that allows them to appropriately understand the entity they are seeking to comprehend is critically important for reaching a high-quality decision. Often only qualitative insight into the opinions, interpretations, and assumptions of large numbers of people will allow us to understand a set of circumstances properly, and such insight is therefore required to make high-quality decisions and, consequently, achieve high-quality outcomes.

artificial intelligence, information retrieval, natural language, (19 more...)

doi: 10.46298/jdmdh.5077

1808.10685

Country:

North America > United States > Massachusetts > Middlesex County > Malden (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (1.00)

Industry:

Education (0.67)
Telecommunications (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.32)

Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering

Liu, Ye, Chowdhury, Shaika, Zhang, Chenwei, Caragea, Cornelia, Yu, Philip S.

arXiv.org Artificial Intelligence Aug-5-2020

Healthcare question answering assistance aims to provide customers with healthcare information and appears widely on both the Web and the mobile Internet. The questions usually require such assistance to draw on proficient healthcare background knowledge as well as the ability to reason over that knowledge. Recently, a challenge involving complex healthcare reasoning, the HeadQA dataset, has been proposed; it contains multiple-choice questions authorized for the public healthcare specialization exam. Unlike most other QA tasks that focus on linguistic understanding, HeadQA requires deeper reasoning involving not only knowledge extraction, but also complex reasoning with healthcare knowledge. These questions are the most challenging for current QA systems, and the current performance of the state-of-the-art method is slightly better than a random guess. In order to solve this challenging task, we present a Multi-step reasoning with Knowledge extraction framework (MurKe). The proposed framework first extracts healthcare knowledge as supporting documents from a large corpus. In order to find the reasoning chain and choose the correct answer, MurKe iterates between selecting the supporting documents, reformulating the query representation using the supporting documents, and computing an entailment score for each choice using an entailment model. The reformulation module leverages selected documents for missing evidence, which maintains interpretability. Moreover, we strive to make full use of off-the-shelf pre-trained models. With fewer trainable weights, the pre-trained model can easily adapt to healthcare tasks with limited training samples. Experimental results and an ablation study show that our system outperforms several strong baselines on the HeadQA dataset.

arxiv preprint arxiv, information retrieval, machine learning, (13 more...)

2008.02434

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Solving One of the Biggest Challenges for AI-Based Search Engines: Relevance

#artificialintelligence Aug-4-2020, 19:35:47 GMT

Let's learn how to implement ClickModels in order to extract Relevance from clickstream data. These steps tend to be what is already necessary for implementing an effective enough search engine system for a given application. Eventually, the requirement to upgrade the system to deliver customized results may arise. Doing so should be simple. One could choose from a set of machine learning ranking algorithms, train some selected models, prepare them for production and observe the results.
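As a hedged illustration of how relevance might be extracted from clickstream data, the sketch below computes a position-bias-corrected click-through rate per query-document pair. This is a simplified form of a position-based click model in which the examination probability of each rank is assumed to be known. It is not the article's implementation; the examination probabilities and the sample sessions are made up.

```python
from collections import defaultdict

# Assumed probability that a user examines each rank (1-based).
# These values are hypothetical; real systems estimate them from logs.
EXAMINATION = {1: 1.0, 2: 0.6, 3: 0.4}


def estimate_relevance(sessions, examination=EXAMINATION):
    """Estimate relevance as clicks divided by expected examinations.

    sessions: iterable of (query, [(doc_id, clicked), ...]) where the
    inner list is in the order the results were shown.
    """
    clicks = defaultdict(float)
    expected_views = defaultdict(float)
    for query, results in sessions:
        for rank, (doc_id, clicked) in enumerate(results, start=1):
            key = (query, doc_id)
            # A result shown at a low rank counts as only a partial view.
            expected_views[key] += examination.get(rank, 0.0)
            if clicked:
                clicks[key] += 1.0
    return {
        key: min(1.0, clicks[key] / views)
        for key, views in expected_views.items()
        if views > 0
    }


# Made-up clickstream: three sessions for the same query.
sessions = [
    ("laptop", [("a", True), ("b", False), ("c", False)]),
    ("laptop", [("a", False), ("b", True), ("c", False)]),
    ("laptop", [("b", False), ("a", True), ("c", True)]),
]

relevance = estimate_relevance(sessions)
for key, score in sorted(relevance.items(), key=lambda kv: -kv[1]):
    print(key, round(score, 3))
```

Dividing by expected examinations rather than raw impressions keeps documents that were mostly shown at low ranks from being penalized simply for being seen less often.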

information retrieval, machine learning, natural language, (13 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.71)

The Art of SEO: Mastering Search Engine Optimization, 3rd Edition - Programmer Books

#artificialintelligence Aug-3-2020, 19:20:08 GMT

Three acknowledged experts in search engine optimization share guidelines and innovative techniques that will help you plan and execute a comprehensive SEO strategy. Novices will receive a thorough SEO education, while experienced SEO practitioners get an extensive reference to support ongoing engagements.

Comprehend SEO's many intricacies and complexities
Explore the underlying theory and inner workings of search engines
Understand the role of social media, user data, and links
Discover tools to track results and measure success
Examine the effects of Google's Panda and Penguin algorithms
Consider opportunities in mobile, local, and vertical SEO
Build a competent SEO team with defined roles
Glimpse the future of search and the SEO industry

information retrieval, mastering search engine optimization, natural language, (4 more...)

Genre: Collection > Book (0.40)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Search Engine Journal - Marketing News, Interviews and How-to Guides

#artificialintelligence Aug-2-2020, 00:14:24 GMT

Search Engine Journal is dedicated to producing the latest search news, the best guides and how-tos for the SEO and marketer community.

artificial intelligence, information retrieval, natural language, (4 more...)

Technology:

Information Technology > Information Management > Search (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)

Content Clustering: 50 Tips for Content Planning with Topic Clustering.

#artificialintelligence Aug-1-2020, 07:00:06 GMT

Have you decided to take your business to the next level in the evolution of SEO? There is a buzz around content clusters on social media platforms and across the internet. SEO content managers and specialists constantly struggle to balance search engine optimization with content quality. Here are the tips you need to know about content clustering, where content planning is organized around topic clusters. Content clustering concentrates on a single central topic: related cluster content is created around it and interlinked through hyperlinks.

audience, information retrieval, natural language, (19 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.52)

NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets

Dibia, Victor

arXiv.org Artificial Intelligence Jul-29-2020

Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA, a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM) as well as relevant snippets (RelSnip), a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large scale search deployment. Code and documentation for NeuralQA are available as open source on GitHub.
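The idea of condensing a large document into a few passages before handing it to a reader model can be sketched as follows. This is not NeuralQA's RelSnip implementation or API; it is a simplified stand-in that slides a fixed-size word window over the document and keeps the windows with the most query-term matches. The function names, window size, stride, and scoring rule are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def relevant_snippets(document, query, window=30, stride=15, top_k=2):
    """Return up to top_k word windows ranked by query-term matches."""
    words = document.split()
    query_terms = set(tokenize(query))
    scored = []
    # Overlapping windows; the final window always reaches the document end.
    for start in range(0, max(len(words) - window, 0) + stride, stride):
        passage = " ".join(words[start:start + window])
        score = sum(1 for tok in tokenize(passage) if tok in query_terms)
        scored.append((score, start, passage))
    # Highest score first; earlier passages win ties.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [passage for score, _, passage in scored[:top_k] if score > 0]


# Made-up document: padding around one relevant sentence.
document = " ".join(
    ["padding"] * 40
    + "contextual query expansion uses a masked language model".split()
    + ["padding"] * 40
)

for snippet in relevant_snippets(document, "masked language model",
                                 window=10, stride=5, top_k=1):
    print(snippet)
```

A reader model would then be run only on the returned passages, which is where the speed-up described in the abstract would come from.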

interface, neuralqa, representation, (15 more...)

2007.15211

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
