Information Retrieval


Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases

arXiv.org Artificial Intelligence

Equipping machines with comprehensive knowledge of the world's entities and their relationships has been a long-standing goal of AI. Over the last decade, large-scale knowledge bases, also known as knowledge graphs, have been automatically constructed from web contents and text sources, and have become a key asset for search engines. This machine knowledge can be harnessed to semantically interpret textual phrases in news, social media and web tables, and contributes to question answering, natural language processing and data analytics. This article surveys fundamental concepts and practical methods for creating and curating large knowledge bases. It covers models and methods for discovering and canonicalizing entities and their semantic types and organizing them into clean taxonomies. On top of this, the article discusses the automatic extraction of entity-centric properties. To support the long-term life-cycle and the quality assurance of machine knowledge, the article presents methods for constructing open schemas and for knowledge curation. Case studies on academic projects and industrial knowledge graphs complement the survey of concepts and methods.


Archaeological search engine adds a new dimension to 'digging'

AIHub

Apps that can precisely identify shards, coins or heel bones: archaeology has embraced artificial intelligence. Alex Brandsen is working on a search engine that scans vast quantities of text from an archaeological viewpoint. An archaeologist by training, he spent time working as a programmer before returning to university to study for a PhD combining the two. "I've noticed at [archaeology] conferences over the last two years that AI has become a real buzzword, and a lot of money and energy are going into it." Brandsen is working on a search engine for archaeologists that can quickly and effectively scan all the excavation reports of Dutch finds. "For example, if you search under burial rites in the Middle Ages, the search engine needs to understand that the term 1200 CE is also relevant. There are thousands of terms that mean Middle Ages and it has to find them all. It must also be able to distinguish between a bill as a bladed weapon and a researcher whose name is Bill."


Annotator Rationales for Labeling Tasks in Crowdsourcing

Journal of Artificial Intelligence Research

When collecting item ratings from human judges, it can be difficult to measure and enforce data quality due to task subjectivity and lack of transparency into how judges make each rating decision. To address this, we investigate asking judges to provide a specific form of rationale supporting each rating decision. We evaluate this approach on an information retrieval task in which human judges rate the relevance of Web pages for different search topics. Cost-benefit analysis over 10,000 judgments collected on Amazon's Mechanical Turk suggests a win-win. Firstly, rationales yield a multitude of benefits: more reliable judgments, greater transparency for evaluating both human raters and their judgments, reduced need for expert gold, the opportunity for dual-supervision from ratings and rationales, and added value from the rationales themselves. Secondly, once experienced in the task, crowd workers provide rationales with almost no increase in task completion time. Consequently, we can realize the above benefits with minimal additional cost.


Useful sites for finding datasets for Data Analysis tasks

#artificialintelligence

Let's now look at some useful sites for finding open, publicly available datasets quickly and without much hassle. Google Dataset Search is a search engine dedicated to finding datasets. It searches over metadata supplied by data providers, which means it indexes the description of a dataset rather than its content. So if a dataset is publicly available, there is a good chance that it will show up in Google Dataset Search.


Conditional Image Retrieval

arXiv.org Machine Learning

This work introduces Conditional Image Retrieval (CIR) systems: IR methods that can efficiently specialize to specific subsets of images on the fly. These systems broaden the class of queries IR systems support, and eliminate the need for expensive re-fitting to specific subsets of data. Specifically, we adapt tree-based K-Nearest Neighbor (KNN) data-structures to the conditional setting by introducing additional inverted-index data-structures. This speeds conditional queries and does not slow queries without conditioning. We present two new datasets for evaluating the performance of CIR systems and evaluate a variety of design choices. As a motivating application, we present an algorithm that can explore shared semantic content between works of art of vastly different media and cultural origin. Finally, we demonstrate that CIR data-structures can identify Generative Adversarial Network (GAN) "blind spots": areas where GANs fail to properly model the true data distribution.
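The idea of pairing a nearest-neighbor structure with an inverted index can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps the tree-based KNN structure for a plain brute-force scan, and the class name `ConditionalKNN`, the `tags` argument, and the example data are all hypothetical. It only shows how an inverted index narrows the candidate set before the distance search, while unconditioned queries still run over all items.

```python
import heapq
from collections import defaultdict


class ConditionalKNN:
    """Brute-force KNN with an inverted index from tag to item ids (illustrative only)."""

    def __init__(self):
        self.vectors = {}              # item id -> feature vector
        self.index = defaultdict(set)  # tag -> ids of items carrying that tag

    def add(self, item_id, vector, tags):
        self.vectors[item_id] = vector
        for tag in tags:
            self.index[tag].add(item_id)

    def query(self, vector, k, condition=None):
        # Without a condition, search every item; with one, search only the
        # ids listed under that tag in the inverted index.
        if condition is None:
            candidates = self.vectors.keys()
        else:
            candidates = self.index.get(condition, set())

        def squared_distance(item_id):
            return sum((a - b) ** 2 for a, b in zip(self.vectors[item_id], vector))

        return heapq.nsmallest(k, candidates, key=squared_distance)


# Hypothetical data: three items tagged by medium.
knn = ConditionalKNN()
knn.add("a", (0.0, 0.0), ["painting"])
knn.add("b", (1.0, 1.0), ["sculpture"])
knn.add("c", (5.0, 5.0), ["painting"])
```

Here `knn.query((1.0, 1.0), 1)` searches all items, whereas `knn.query((1.0, 1.0), 1, condition="painting")` restricts the search to the items tagged as paintings. A tree-based structure, as described in the abstract, would replace the linear scan.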


Large-Scale Intelligent Microservices

arXiv.org Artificial Intelligence

Deploying Machine Learning (ML) algorithms within databases is a challenge due to the varied computational footprints of modern ML algorithms and the myriad of database technologies each with their own restrictive syntax. We introduce an Apache Spark-based micro-service orchestration framework that extends database operations to include web service primitives. Our system can orchestrate web services across hundreds of machines and takes full advantage of cluster, thread, and asynchronous parallelism. Using this framework, we provide large scale clients for intelligent services such as speech, vision, search, anomaly detection, and text analysis. This allows users to integrate ready-to-use intelligence into any datastore with an Apache Spark connector. To eliminate the majority of overhead from network communication, we also introduce a low-latency containerized version of our architecture. Finally, we demonstrate that the services we investigate are competitive on a variety of benchmarks, and present two applications of this framework to create intelligent search engines, and real time auto race analytics systems.


Revealing Secrets in SPARQL Session Level

arXiv.org Artificial Intelligence

Based on Semantic Web technologies, knowledge graphs help users to discover information of interest by using live SPARQL services. Answer-seekers often examine intermediate results iteratively and modify SPARQL queries repeatedly in a search session. In this context, understanding user behaviors is critical for effective intention prediction and query optimization. However, these behaviors have not yet been researched systematically at the SPARQL session level. This paper reveals the secrets of session-level user search behaviors by conducting a comprehensive investigation over massive real-world SPARQL query logs. In particular, we thoroughly assess query changes made by users w.r.t. structural and data-driven features of SPARQL queries. To illustrate the potentiality of our findings, we employ a proof-of-concept model to predict user intentions, i.e., future directions of the given session, and give reformulation suggestions based on the predicted intention. We hope the results presented here will help to devise efficient SPARQL caching, auto-completion, query suggestion, approximation, and relaxation techniques in the future.


Elastic Transformers

#artificialintelligence

Contextual bit: as we have seen, keyword search can sometimes be limiting. Context is highly beneficial for retrieving results that are semantically related to what you are looking for: when searching for "virus threat", results for "virus risks" also appear, and so on.


Report on the 2019 Workshop on Smart Farming and Data Analytics (SFDAI)

arXiv.org Artificial Intelligence

The 1st National Workshop on Smart Farming and Data Analytics took place at Maynooth University in Ireland on June 12, 2019. The workshop included two invited keynote presentations, invited talks and breakout group discussions. It attracted on the order of 50 participants, a mixture of computer scientists, general scientists, farmers, farm advisors, and agricultural business representatives. This allowed for lively discussion and cross-fertilization of ideas, and it showed the significant interest in the smart farming domain, the many research challenges faced in the space, and the potential for data analytics and information retrieval here.


Active Learning++: Incorporating Annotator's Rationale using Local Model Explanation

arXiv.org Artificial Intelligence

We propose a new active learning (AL) framework, Active Learning++, which can utilize an annotator's labels as well as their rationale. Annotators can provide their rationale for choosing a label by ranking input features based on their importance for a given query. To incorporate this additional input, we modified the disagreement measure for a bagging-based Query by Committee (QBC) sampling strategy. Instead of weighing all committee models equally to select the next instance, we assign higher weight to the committee model with higher agreement with the annotator's ranking. Specifically, we generated a feature importance-based local explanation for each committee model. The similarity score between feature rankings provided by the annotator and the local model explanation is used to assign a weight to each corresponding committee model. This approach is applicable to any kind of ML model using model-agnostic techniques to generate local explanations, such as LIME. With a simulation study, we show that our framework significantly outperforms a QBC-based vanilla AL framework.
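The weighting step described above can be sketched roughly as follows. The abstract does not specify the similarity score or the disagreement measure, so this sketch assumes Spearman rank correlation (rescaled to [0, 1]) as the similarity and weighted vote entropy as the disagreement; the function names and example inputs are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def rank_similarity(annotator_rank, model_rank):
    """Spearman correlation between two rankings of the same features, rescaled to [0, 1].

    Assumes both lists contain the same features, with at least two entries.
    """
    n = len(annotator_rank)
    position = {feature: i for i, feature in enumerate(model_rank)}
    d2 = sum((i - position[f]) ** 2 for i, f in enumerate(annotator_rank))
    rho = 1 - 6 * d2 / (n * (n * n - 1))
    return (rho + 1) / 2


def weighted_vote_entropy(votes, weights):
    """Committee disagreement where each model's vote counts in proportion to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one committee weight must be positive")
    mass = defaultdict(float)
    for label, weight in zip(votes, weights):
        mass[label] += weight / total
    return -sum(p * math.log(p) for p in mass.values() if p > 0)


# Hypothetical example: one annotator ranking, three committee explanations.
annotator = ["f1", "f2", "f3"]
explanations = [["f1", "f2", "f3"], ["f1", "f3", "f2"], ["f3", "f2", "f1"]]
weights = [rank_similarity(annotator, e) for e in explanations]
votes = ["pos", "pos", "neg"]  # each committee model's predicted label
disagreement = weighted_vote_entropy(votes, weights)
```

In this example the third model's explanation reverses the annotator's ranking, so its weight is zero and its dissenting vote no longer contributes to the disagreement score. An unlabeled instance would then be selected by maximizing this weighted disagreement across the pool.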