AITopics | Information Retrieval


Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have aided the searching of traditional indexes of print publications, but the process still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to succeed at human speed.


Blender Bot -- Part 3: The Many Architectures

#artificialintelligence | Jul-3-2020, 01:45:45 GMT

We have been looking into Facebook's open-sourced conversational offering, Blender Bot. In Part-1 we went over in detail the datasets used in its pre-training and fine-tuning, along with Blender's failure cases and limitations. In Part-2 we studied the more generic problem setting of "Multi-Sentence Scoring" and the Transformer architectures used for such a task, and learnt about Poly-Encoders in particular -- which will be used to provide the encoder representations in Blender. In this 3rd and final part, we return from our respite with Poly-Encoders, back to Blender. We shall go over the different model architectures, their respective training objectives, the evaluation methods and the performance of Blender in comparison to Meena.

information retrieval, machine learning, natural language, (19 more...)

#artificialintelligence

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.30)


DuckDuckGo down in India: Private browser mysteriously stops working

The Independent - Tech | Jul-1-2020, 11:55:59 GMT

Privacy-focused search engine DuckDuckGo has reported that its service is not working in India. "To our users in India: We've received many reports our search engine is unreachable by much of India right now and have confirmed it is not due to us," the company tweeted. "We're actively talking to Internet providers to get to the bottom of it ASAP. Thank you for your patience." It is unclear why DuckDuckGo would be unavailable in the country.

artificial intelligence, information retrieval, natural language, (15 more...)

The Independent - Tech

Country:

Asia > India (1.00)
Europe > United Kingdom (0.06)

Industry:

Government > Regional Government > Asia Government > India Government (0.52)
Information Technology > Networks (0.37)

Technology:

Information Technology > Information Management > Search (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)
Information Technology > Communications > Mobile (0.52)


Answering Questions on COVID-19 in Real-Time

Lee, Jinhyuk, Yi, Sean S., Jeong, Minbyul, Sung, Mujeen, Yoon, Wonjin, Choi, Yonghwa, Ko, Miyoung, Kang, Jaewoo

arXiv.org Artificial Intelligence | Jun-29-2020

The recent outbreak of the novel coronavirus is wreaking havoc on the world and researchers are struggling to effectively combat it. One reason why the fight is difficult is due to the lack of information and knowledge. In this work, we outline our effort to contribute to shrinking this knowledge vacuum by creating covidAsk, a question answering (QA) system that combines biomedical text mining and QA techniques to provide answers to questions in real-time. Our system leverages both supervised and unsupervised approaches to provide informative answers using DenSPI (Seo et al., 2019) and BEST (Lee et al., 2016). Evaluation of covidAsk is carried out by using a manually created dataset called COVID-19 Questions which is based on facts about COVID-19. We hope our system will be able to aid researchers in their search for knowledge and information not only for COVID-19 but for future pandemics as well.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2006.1583

Country: Europe > Austria > Vienna (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Information Management (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.93)


Twitter Will Check if Articles Are Read Before Sharing - Search Engine Journal

#artificialintelligence | Jun-27-2020, 22:38:33 GMT

Twitter announced a new feature that encourages Android Twitter users to read an article before sharing it. This raised suspicions that Twitter was tracking user clicks. The move is part of Twitter's stated goal to encourage "informed discussion." Often people share a link without reading the article. That results in clickbait titles getting widely promoted regardless of the content.

artificial intelligence, information retrieval, natural language, (13 more...)

#artificialintelligence

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)


A Flexible Framework for Entity Resolution

#artificialintelligence | Jun-27-2020, 20:10:52 GMT

A critical component of data management and enrichment pipelines is connecting large datasets from various sources to form a holistic view; to make connections between entities across data sources. Oftentimes, these entities -- such as individuals, organizations, or addresses -- may not have a unique identifier that can be used as a key to detect duplicates or to merge datasets on. ThinkData has developed a scalable entity resolution engine to solve these problems. After experimenting with both deep learning and traditional NLP techniques, the team has found the best balance of accuracy and performance. Specifically, we have achieved near-parity in accuracy compared to Magellan (the leading entity resolution project in research), albeit with much better performance metrics and greater scalability.
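The duplicate-detection problem described above can be illustrated with a small sketch. This is not ThinkData's engine or Magellan; it assumes a simple token-based Jaccard similarity over free-text records, and the function names and the 0.6 threshold are hypothetical choices for illustration only.

```python
import re

def normalize(text):
    """Lowercase, replace punctuation with spaces, and return the set of tokens."""
    return set(re.sub(r"[^a-z0-9 ]", " ", text.lower()).split())

def jaccard(a, b):
    """Jaccard similarity between the token sets of two records."""
    tokens_a, tokens_b = normalize(a), normalize(b)
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def find_duplicates(records, threshold=0.6):
    """Return index pairs whose similarity meets the (assumed) threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i], records[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Hypothetical records with no shared unique identifier.
records = [
    "Acme Corp., Toronto",
    "ACME Corp Toronto",
    "Globex Inc, Springfield",
]
duplicates = find_duplicates(records)
print(duplicates)
```

The pairwise loop is quadratic in the number of records, so a scalable system would typically add a blocking step to restrict which pairs are compared.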

information retrieval, machine learning, natural language, (5 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)


Pre-training via Paraphrasing

Lewis, Mike, Ghazvininejad, Marjan, Ghosh, Gargi, Aghajanyan, Armen, Wang, Sida, Zettlemoyer, Luke

arXiv.org Machine Learning | Jun-26-2020

We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. We show it is possible to jointly learn to do retrieval and reconstruction, given only a random initialization. The objective noisily captures aspects of paraphrase, translation, multi-document summarization, and information retrieval, allowing for strong zero-shot performance on several tasks. For example, with no additional task-specific training we achieve BLEU scores of up to 35.8 for document translation. We further show that fine-tuning gives strong performance on a range of discriminative and generative tasks in many languages, making MARGE the most generally applicable pre-training method to date.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Machine Learning

2006.1502

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Virginia > Newport News (0.04)
North America > United States > West Virginia (0.04)
Africa > Niger (0.04)

Genre: Research Report (0.64)

Industry:

Government > Space Agency (0.95)
Government > Regional Government > North America Government > United States Government (0.95)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)


DSC Data Science Search Engine

#artificialintelligence | Jun-24-2020, 14:06:16 GMT

Productive, Self-Service Data Science - June 30 Data science is a core part of an organization's digital transformation strategy. In this latest DSC webinar discover how American Family Insurance's use of the Alation Data Catalog is enabling more productive data science outcomes with trusted, curated data.

dsc data science search engine, information retrieval, natural language, (9 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)


Evaluating Your Learning to Rank Model: Dos and Don'ts in Offline/Onl…

#artificialintelligence | Jun-24-2020, 12:13:27 GMT

Learning to rank (LTR from now on) is the application of machine learning techniques, typically supervised, in the formulation of ranking models for information retrieval systems. With LTR becoming more and more popular (Apache Solr has supported it since Jan 2017 and Elasticsearch has an open source plugin released in 2018), organizations struggle with the problem of how to evaluate the quality of the models they train. This talk explores all the major points in both Offline and Online evaluation. Setting up correct infrastructures and processes for a fair and effective evaluation of the trained models is vital for measuring the improvements/regressions of an LTR system.

The talk is intended for:
– Product Owners, Search Managers, Business Owners
– Software Engineers, Data Scientists, and Machine Learning Enthusiasts

Expect to learn:
– the importance of Offline testing from a business perspective
– how Offline testing can be done with Open Source libraries
– how to build a realistic test set from the original input data set, avoiding common mistakes in the process
– the importance of Online testing from a business perspective
– A/B testing and Interleaving approaches: details and Pros/Cons
– common mistakes and how they can distort the obtained results

Join us as we explore real-world scenarios and dos and don'ts from the e-commerce industry!
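A metric commonly used in offline evaluation of ranking models is normalized discounted cumulative gain (NDCG). The sketch below is a minimal illustration and is not taken from the talk; it assumes graded relevance labels listed in the order the model ranked the documents, uses a linear gain, and the function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Relevance labels of the documents in the order a model ranked them
# (hypothetical judgments for a single query).
perfect_ranking = [3, 2, 1]
swapped_ranking = [1, 3]

print(ndcg_at_k(perfect_ranking, 3))
print(ndcg_at_k(swapped_ranking, 2))
```

A ranking that already lists documents in descending relevance scores 1.0; placing a less relevant document above a more relevant one lowers the score. In practice the per-query values are averaged over a test set of queries.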

information retrieval, machine learning, natural language, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)


Semantic Linking Maps for Active Visual Object Search

Zeng, Zhen, Röfer, Adrian, Jenkins, Odest Chadwicke

arXiv.org Artificial Intelligence | Jun-18-2020

We aim for mobile robots to function in a variety of common human environments. Such robots need to be able to reason about the locations of previously unseen target objects. Landmark objects can help this reasoning by narrowing down the search space significantly. More specifically, we can exploit background knowledge about common spatial relations between landmark and target objects. For example, seeing a table and knowing that cups can often be found on tables aids the discovery of a cup. Such correlations can be expressed as distributions over possible pairing relationships of objects. In this paper, we propose an active visual object search strategy method through our introduction of the Semantic Linking Maps (SLiM) model. SLiM simultaneously maintains the belief over a target object's location as well as landmark objects' locations, while accounting for probabilistic inter-object spatial relations. Based on SLiM, we describe a hybrid search strategy that selects the next best view pose for searching for the target object based on the maintained belief. We demonstrate the efficiency of our SLiM-based search strategy through comparative experiments in simulated environments. We further demonstrate the real-world applicability of SLiM-based search in scenarios with a Fetch mobile manipulation robot.

artificial intelligence, information retrieval, natural language, (18 more...)

arXiv.org Artificial Intelligence

2006.10807

Country:

North America > United States > Michigan (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Germany > Bremen > Bremen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)


CO-Search: COVID-19 Information Retrieval with Semantic Search, Question Answering, and Abstractive Summarization

Esteva, Andre, Kale, Anuprit, Paulus, Romain, Hashimoto, Kazuma, Yin, Wenpeng, Radev, Dragomir, Socher, Richard

arXiv.org Artificial Intelligence | Jun-16-2020

The COVID-19 global pandemic has resulted in international efforts to understand, track, and mitigate the disease, yielding a significant corpus of COVID-19 and SARS-CoV-2-related publications across scientific disciplines. As of May 2020, 128,000 coronavirus-related publications have been collected through the COVID-19 Open Research Dataset Challenge [23]. Here we present CO-Search, a retriever-ranker semantic search engine designed to handle complex queries over the COVID-19 literature, potentially aiding overburdened health workers in finding scientific answers during a time of crisis. The retriever is built from a Siamese-BERT[18] encoder that is linearly composed with a TF-IDF vectorizer [19], and reciprocal-rank fused [5] with a BM25 vectorizer. The ranker is composed of a multi-hop question-answering module[1], that together with a multi-paragraph abstractive summarizer adjust retriever scores. To account for the domain-specific and relatively limited dataset, we generate a bipartite graph of document paragraphs and citations, creating 1.3 million (citation title, paragraph) tuples for training the encoder. We evaluate our system on the data of the TREC-COVID[22] information retrieval challenge. CO-Search obtains top performance on the datasets of the first and second rounds, across several key metrics: normalized discounted cumulative gain, precision, mean average precision, and binary preference.
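The abstract mentions reciprocal-rank fusion of retriever outputs. As a rough sketch of that general technique (not CO-Search's actual implementation), each document can be scored by summing 1 / (k + rank) over the ranked lists in which it appears. The constant k = 60 is a commonly used default and is an assumption here, as are the function name and the toy document IDs.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    with ranks starting at 1. Ties are broken by document ID for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

# Hypothetical outputs of an embedding-based retriever and a BM25 retriever.
dense_ranking = ["d3", "d1", "d2"]
bm25_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([dense_ranking, bm25_ranking])
print(fused)  # → ['d1', 'd3', 'd4', 'd2']
```

Because the fusion uses only rank positions, it does not require the two retrievers' raw scores to be on comparable scales.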

artificial intelligence, information retrieval, natural language, (14 more...)

arXiv.org Artificial Intelligence

2006.09595

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.05)
North America > Canada (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
