AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

Weikum, Gerhard, Dong, Luna, Razniewski, Simon, Suchanek, Fabian

arXiv.org Artificial Intelligence, Sep-24-2020

Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.

closed-world assumption, knowledge management, machine learning, (25 more...)

2009.11564

Country:

Europe > Germany > Berlin (0.13)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
Asia > Middle East > Jordan (0.04)
(29 more...)

Genre:

Overview (1.00)
Instructional Material (1.00)
Research Report > New Finding (0.67)
Personal > Honors > Award (0.46)

Industry:

Media > News (1.00)
Media > Music (1.00)
Media > Film (1.00)
(11 more...)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
(9 more...)

Archaeological search engine adds a new dimension to 'digging'

AIHub, Sep-23-2020, 09:20:42 GMT

Apps that can precisely identify shards, coins or heel bones: archaeology has embraced artificial intelligence. Alex Brandsen is working on a search engine that scans vast quantities of text from an archaeological viewpoint. An archaeologist by training, he spent time working as a programmer before returning to university to study for a PhD combining the two. "I've noticed at [archaeology] conferences over the last two years that AI has become a real buzzword, and a lot of money and energy are going into it." Brandsen is working on a search engine for archaeologists that can quickly and effectively scan all the excavation reports of Dutch finds. "For example, if you search under burial rites in the Middle Ages, the search engine needs to understand that the term 1200 CE is also relevant. There are thousands of terms that mean Middle Ages and it has to find them all. It must also be able to distinguish between a bill as a bladed weapon and a researcher whose name is Bill."
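To illustrate the "1200 CE is also relevant to the Middle Ages" requirement, here is a minimal sketch of period-aware matching. It is not Brandsen's implementation: the `PERIODS` table, its year boundaries, and the function names are all assumptions made for illustration, and the harder problem of telling a bill (weapon) from Bill (person) is not addressed.

```python
import re

# Illustrative period boundaries in years CE. The exact range is an assumption.
PERIODS = {"middle ages": (500, 1500)}

# Matches year mentions such as "1200 CE" or "300 BCE".
YEAR_PATTERN = re.compile(r"\b(\d{1,4})\s*(BCE|CE|BC|AD)\b", re.IGNORECASE)


def years_in_text(text):
    """Return the years mentioned in text as signed integers (BCE/BC are negative)."""
    years = []
    for number, era in YEAR_PATTERN.findall(text):
        value = int(number)
        if era.upper() in ("BCE", "BC"):
            value = -value
        years.append(value)
    return years


def matches_period(text, period):
    """True if the text names the period or mentions a year that falls inside it."""
    start, end = PERIODS[period.lower()]
    if period.lower() in text.lower():
        return True
    return any(start <= year <= end for year in years_in_text(text))
```

With this table, `matches_period("A burial dated to 1200 CE.", "Middle Ages")` is true even though the phrase "Middle Ages" never appears in the text. A real system would need far more than one period and one date format.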

artificial intelligence, information retrieval, natural language, (14 more...)

Country: Europe > Netherlands > South Holland > Leiden (0.05)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Annotator Rationales for Labeling Tasks in Crowdsourcing

Kutlu, Mucahid (TOBB University of Economics and Technology) | McDonnell, Tyler | Elsayed, Tamer (Qatar University) | Lease, Matthew (University of Texas at Austin)

Journal of Artificial Intelligence Research, Sep-23-2020

When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on Amazon's Mechanical Turk suggests a win-win. Firstly, rationales yield a multitude of benefits: more reliable judgments, greater transparency for evaluating both human raters and their judgments, reduced need for expert gold, the opportunity for dual-supervision from ratings and rationales, and added value from the rationales themselves. Secondly, once experienced in the task, crowd workers provide rationales with almost no increase in task completion time. Consequently, we can realize the above benefits with minimal additional cost.

information retrieval, machine learning, natural language, (20 more...)

doi: 10.1613/jair.1.12012

AI Access Foundation

12012

Country:

North America > United States > Texas > Travis County > Austin (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Minnesota (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology (1.00)
Media (0.92)
Leisure & Entertainment (0.92)
(2 more...)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Useful sites for finding datasets for Data Analysis tasks

#artificialintelligence, Sep-18-2020, 17:31:54 GMT

Let's now look at some useful sites for finding open, publicly available datasets quickly and without much hassle. Google Dataset Search is a search engine dedicated to finding datasets. It searches over metadata supplied by data providers, which means it indexes the description of a dataset rather than its contents. So if a dataset is publicly available, there is a good chance that it will show up in Google Dataset Search.
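The idea of indexing descriptions rather than contents can be sketched with a tiny inverted index. This is only an illustration of metadata search in general, not how Google Dataset Search is built. The function names and the sample descriptions in the usage note are hypothetical.

```python
from collections import defaultdict


def build_metadata_index(descriptions):
    """Map each lowercase word of a dataset's description to the dataset names using it."""
    index = defaultdict(set)
    for name, description in descriptions.items():
        for word in description.lower().split():
            index[word.strip(".,;:()")].add(name)
    return index


def search(index, query):
    """Return the names of datasets whose description contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

For example, after indexing `{"rainfall": "Daily rainfall measurements, Ireland.", "prices": "Daily stock prices."}`, the query `"daily rainfall"` returns only `rainfall`. The rows inside either dataset are never read, so a dataset with a poor description would be hard to find.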

artificial intelligence, information retrieval, natural language, (7 more...)

Technology:

Information Technology > Information Management > Search (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Conditional Image Retrieval

Hamilton, Mark, Fu, Stephanie, Lu, Mindren, Freeman, William T.

arXiv.org Machine Learning, Sep-18-2020

This work introduces Conditional Image Retrieval (CIR) systems: IR methods that can efficiently specialize to specific subsets of images on the fly. These systems broaden the class of queries IR systems support, and eliminate the need for expensive re-fitting to specific subsets of data. Specifically, we adapt tree-based K-Nearest Neighbor (KNN) data-structures to the conditional setting by introducing additional inverted-index data-structures. This speeds conditional queries and does not slow queries without conditioning. We present two new datasets for evaluating the performance of CIR systems and evaluate a variety of design choices. As a motivating application, we present an algorithm that can explore shared semantic content between works of art of vastly different media and cultural origin. Finally, we demonstrate that CIR data-structures can identify Generative Adversarial Network (GAN) "blind spots": areas where GANs fail to properly model the true data distribution.
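As a rough illustration of conditional retrieval, the sketch below pairs a nearest-neighbor search with an inverted index from labels to item ids, so a query can be restricted to a subset without rebuilding anything. It deliberately swaps the paper's tree-based KNN structure for a brute-force scan to stay short. The class name, method names, and label scheme are assumptions, not the authors' API.

```python
import math
from collections import defaultdict


class ConditionalIndex:
    """Brute-force nearest-neighbor search with an inverted index for conditioning."""

    def __init__(self):
        self.vectors = {}                  # item id -> feature vector
        self.inverted = defaultdict(set)   # label -> ids of items carrying that label

    def add(self, item_id, vector, labels):
        self.vectors[item_id] = vector
        for label in labels:
            self.inverted[label].add(item_id)

    def query(self, vector, k, condition=None):
        """Return the k nearest item ids, optionally restricted to one label."""
        if condition is None:
            candidates = self.vectors.keys()
        else:
            candidates = self.inverted.get(condition, set())
        # Sort by Euclidean distance, breaking ties by item id for determinism.
        ranked = sorted(candidates, key=lambda i: (math.dist(vector, self.vectors[i]), i))
        return ranked[:k]


index = ConditionalIndex()
index.add("a", (0.0, 0.0), {"painting"})
index.add("b", (1.0, 0.0), {"sculpture"})
index.add("c", (3.0, 0.0), {"painting"})

nearest_any = index.query((0.9, 0.0), k=1)                             # ["b"]
nearest_painting = index.query((0.9, 0.0), k=1, condition="painting")  # ["a"]
```

The unconditioned query pays no extra cost, and the conditioned query only scans the items listed under the requested label. This is the property the abstract describes, although here it is achieved far less efficiently than with a tree.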

information retrieval, machine learning, natural language, (21 more...)

2007.07177

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Large-Scale Intelligent Microservices

Hamilton, Mark, Gonsalves, Nick, Lee, Christina, Raman, Anand, Walsh, Brendan, Prasad, Siddhartha, Banda, Dalitso, Zhang, Lucy, Zhang, Lei, Freeman, William T.

arXiv.org Artificial Intelligence, Sep-16-2020

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with their own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services across hundreds of machines and takes full advantage of cluster, thread, and asynchronous parallelism. Using this framework, we provide large scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis. This allows users to integrate ready-to-use intelligence into any datastore with an Apache Spark connector. To eliminate the majority of overhead from network communication, we also introduce a low-latency containerized version of our architecture. Finally, we demonstrate that the services we investigate are competitive on a variety of benchmarks, and present two applications of this framework to create intelligent search engines, and real time auto race analytics systems.

data mining, machine learning, programming language, (25 more...)

2009.08044

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
(6 more...)

Revealing Secrets in SPARQL Session Level

Zhang, Xinyue, Wang, Meng, Saleem, Muhammad, Ngomo, Axel-Cyrille Ngonga, Qi, Guilin, Wang, Haofen

arXiv.org Artificial Intelligence, Sep-13-2020

Based on Semantic Web technologies, knowledge graphs help users to discover information of interest by using live SPARQL services. Answer-seekers often examine intermediate results iteratively and modify SPARQL queries repeatedly in a search session. In this context, understanding user behaviors is critical for effective intention prediction and query optimization. However, these behaviors have not yet been researched systematically at the SPARQL session level. This paper reveals the secrets of session-level user search behaviors by conducting a comprehensive investigation over massive real-world SPARQL query logs. In particular, we thoroughly assess query changes made by users w.r.t. structural and data-driven features of SPARQL queries. To illustrate the potentiality of our findings, we employ a proof-of-concept model to predict user intentions, i.e., future directions of the given session, and give reformulation suggestions based on the predicted intention. We hope the results presented here will help to devise efficient SPARQL caching, auto-completion, query suggestion, approximation, and relaxation techniques in the future.

artificial intelligence, information retrieval query processing, natural language, (19 more...)

2009.06625

Country:

Europe > Germany > Saxony > Leipzig (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)

Elastic Transformers

#artificialintelligence, Sep-12-2020, 17:35:35 GMT

The contextual part: as we have seen, keyword search can sometimes be limiting. Context helps return results that are semantically related to what you are looking for: when searching for "virus threat", results for "virus risks" also appear, and so on.

information retrieval, natural language, semantic search, (17 more...)

Industry: Health & Medicine > Therapeutic Area (0.30)

Technology:

Information Technology > Information Management > Search (0.80)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Report on the 2019 Workshop on Smart Farming and Data Analytics (SFDAI)

Kelly, Liadh, van der Burg, Simone, Regan, Aine, Mooney, Peter

arXiv.org Artificial Intelligence, Sep-7-2020

The 1st National Workshop on Smart Farming and Data Analytics took place at Maynooth University in Ireland on June 12, 2019. The workshop included two invited keynote presentations, invited talks and breakout group discussions. It attracted on the order of 50 participants, a mixture of computer scientists, general scientists, farmers, farm advisors, and agricultural business representatives. This allowed for lively discussion and cross-fertilization of ideas, and showed the significant interest in the smart farming domain, the many research challenges faced in the space, and the potential for data analytics and information retrieval here.

artificial intelligence, information retrieval, natural language, (12 more...)

2009.03088

Country:

Europe > Ireland (0.18)
Europe > Netherlands (0.05)
North America > United States (0.04)
Europe > United Kingdom (0.04)

Genre:

Instructional Material (0.47)
Research Report (0.40)

Industry:

Food & Agriculture > Agriculture (0.96)
Law (0.95)
Government (0.94)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.47)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

Active Learning++: Incorporating Annotator's Rationale using Local Model Explanation

Ghai, Bhavya, Liao, Q. Vera, Zhang, Yunfeng, Mueller, Klaus

arXiv.org Artificial Intelligence, Sep-6-2020

We propose a new active learning (AL) framework, Active Learning++, which can utilize an annotator's labels as well as their rationale. Annotators can provide their rationale for choosing a label by ranking input features based on their importance for a given query. To incorporate this additional input, we modified the disagreement measure for a bagging-based Query by Committee (QBC) sampling strategy. Instead of weighting all committee models equally to select the next instance, we assign higher weight to the committee model with higher agreement with the annotator's ranking. Specifically, we generated a feature importance-based local explanation for each committee model. The similarity score between feature rankings provided by the annotator and the local model explanation is used to assign a weight to each corresponding committee model. This approach is applicable to any kind of ML model using model-agnostic techniques to generate local explanations, such as LIME. With a simulation study, we show that our framework significantly outperforms a QBC-based vanilla AL framework.
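The weighting idea can be sketched as follows. The abstract does not state which similarity score or disagreement measure the authors use, so this example assumes a Spearman rank correlation rescaled to [0, 1] as the weight and a weighted vote entropy as the disagreement. Both choices and both function names are illustrative, not the paper's definitions.

```python
import math
from collections import defaultdict


def rank_similarity(ranking_a, ranking_b):
    """Spearman correlation between two rankings of the same features, rescaled to [0, 1].

    Assumes both rankings contain the same features, with at least two of them.
    """
    n = len(ranking_a)
    position_b = {feature: i for i, feature in enumerate(ranking_b)}
    d2 = sum((i - position_b[f]) ** 2 for i, f in enumerate(ranking_a))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (rho + 1) / 2


def weighted_vote_entropy(votes, weights):
    """Entropy of the committee's weighted label distribution for one instance."""
    total = sum(weights)
    mass = defaultdict(float)
    for label, weight in zip(votes, weights):
        mass[label] += weight / total
    return -sum(p * math.log(p) for p in mass.values() if p > 0)
```

In use, each committee model's weight would be `rank_similarity(annotator_ranking, model_explanation_ranking)`, and the instance with the highest weighted entropy would be queried next. A model whose explanation contradicts the annotator contributes little to the disagreement score, which is the behavior the abstract describes.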

information retrieval, machine learning, natural language, (17 more...)

2009.04568

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.06)
North America > United States > Wisconsin > Dane County > Madison (0.05)
North America > United States > New York > New York County > New York City (0.05)

Genre: Research Report > Experimental Study (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
