Information Retrieval: Overviews


Optimization of Retrieval Algorithms on Large Scale Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge graphs have been shown to play an important role in recent knowledge mining and discovery, for example in the life sciences and bioinformatics. Although a lot of research has been done on query optimization, query transformation and, of course, on storing and retrieving large scale knowledge graphs, algorithmic optimization is still a major challenge and a vital factor in using graph databases. Few researchers have addressed the problem of optimizing algorithms on large scale labeled property graphs. Here, we present two optimization approaches and compare them with a naive approach of directly querying the graph database. The aim of our work is to determine limiting factors of graph databases like Neo4j, and we describe a novel solution to tackle these challenges. For this, we suggest a classification schema to differentiate the complexity of problems on a graph database. We evaluate our optimization approaches on a test system containing a knowledge graph derived from biomedical publication data enriched with text mining data. This dense graph has more than 71M nodes and 850M relationships. The results are very encouraging and, depending on the problem, we were able to show speedups by factors between 44 and 3839.


Conversational Search for Learning Technologies

arXiv.org Artificial Intelligence

Arguably, the most important scenario for search technology is lifelong learning and education, both for students and all citizens. Human learning is a complex multidimensional activity, which includes procedural learning (e.g., activity patterns associated with cooking, sports) and knowledge-based learning (e.g., mathematics, genetics). It also includes different levels of learning, such as the ability to solve an individual math problem correctly. It also includes the development of meta-cognitive self-regulatory abilities, such as recognizing the type of problem being solved and whether one is in an error state. These latter types of awareness enable correctly regulating one's approach to solving a problem, recognizing when one is off track, and repairing momentary errors as needed. Later stages of learning enable the generalization of learned skills or information from one context or domain to others, such as applying math problem solving to calculations in the wild (e.g., calculation of garden space, engineering calculations required for a structurally sound building).


Large expert-curated database for benchmarking document similarity detection in biomedical literature search

#artificialintelligence

Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations.


Temporarily Unavailable: Memory Inhibition in Cognitive and Computer Science

arXiv.org Artificial Intelligence

Inhibition can take place at the level of neurotransmitters in the synaptic cleft, neurons can inhibit each other's fire rate, it can be shown at a physiological level - for instance by measuring the EEG, and finally it can be investigated on a purely behavioral level. Behavioral inhibition typically means something like 'making a content/action less accessible or suppressing it altogether' in order to enhance processing of relevant information. In cognition, thus, the concept of inhibition implies cognitive mechanisms that actively lower currently irrelevant or interfering information. Psychological theories that posit the existence of inhibitory mechanisms in our mind have elicited much research across diverse fields of Cognitive Psychology like perception, attention, action control, and memory but have also been transferred to other research fields like Developmental Psychology as, for instance, understanding the aging brain or the developing brain is closely linked to understanding how the brain handles irrelevant or interfering information - that is how or whether the brain can inhibit such information. The two areas in Cognitive Psychology in which inhibition is traditionally investigated to the largest extent are the research fields of attention and memory. In attention research, typically the interference due to distracting stimuli or actions is analyzed in experimental paradigms that try to tap a specific form of cognitive inhibition. For example, in the Negative Priming task (for a review, Frings, Schneider, & Fox, 2015) it is typically analyzed how an irrelevant distractor stimulus is inhibited. In the cuing task that elicits the inhibition of return effect (Posner, Choate, Rafal, & Vaughn, 1985) it is typically analyzed how an irrelevant location is inhibited. In task switching (Kiesel et al., 2010) lowering competition by a just previously performed task while currently executing a novel task is achieved by inhibiting that previous task.


Query Complexity of Bayesian Private Learning

arXiv.org Machine Learning

We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient in order for the learner to accurately estimate the target, while simultaneously concealing the target from the adversary? Our main result is a query complexity lower bound that is tight up to the first order. We show that if the learner wants to estimate the target within an error of $\varepsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\log(1/\varepsilon)$, as $\varepsilon \to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a multiplicative increase in query complexity. Our proof method builds on Fano's inequality and a family of proportional-sampling estimators. As an illustration of the method's wider applicability, we generalize the complexity lower bound to settings involving high-dimensional linear query learning and partial adversary observation.
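
For intuition about the $\log(1/\varepsilon)$ term, the following is a minimal sketch of plain, non-private bisection on the unit interval. It is not the paper's method: the left/right response model, the interval $[0, 1)$, and the name `bisection_locate` are assumptions made for illustration. The sketch needs roughly $\log_2(1/\varepsilon)$ queries, but the query sequence itself converges on the target, so an adversary who sees only the queries could estimate the target as well.

```python
def bisection_locate(target, epsilon):
    """Estimate `target` in [0, 1) to within `epsilon` by bisection.

    Assumption: each query q gets a noiseless answer to "is target < q?".
    Returns the estimate and the list of submitted queries.
    """
    lo, hi = 0.0, 1.0
    queries = []
    # Stop once the midpoint of [lo, hi] is guaranteed to be within epsilon.
    while hi - lo > 2 * epsilon:
        q = (lo + hi) / 2
        queries.append(q)
        if target < q:
            hi = q  # target lies in the left half
        else:
            lo = q  # target lies in the right half
    return (lo + hi) / 2, queries


if __name__ == "__main__":
    estimate, queries = bisection_locate(0.3, 2 ** -11)
    print(f"estimate={estimate:.6f} after {len(queries)} queries")
    # The last query is already close to the target: this is the privacy leak.
    print(f"last query={queries[-1]:.6f}")
```

A private strategy has to spend additional queries to obscure where the search is converging, which is consistent with the multiplicative factor $L$ stated in the abstract.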


Content-Based Features to Rank Influential Hidden Services of the Tor Darknet

arXiv.org Machine Learning

The uneven importance of criminal activities in the onion domains of the Tor Darknet and the different levels of their appeal to the end-user make it difficult to measure their influence. To this end, this paper presents a novel content-based ranking framework to detect the most influential onion domains. Our approach comprises a modeling unit that represents an onion domain using forty features extracted from five different resources: user-visible text, HTML markup, Named Entities, network topology, and visual content. It also comprises a ranking unit that, using the Learning-to-Rank (LtR) approach, automatically learns a ranking function by integrating the previously obtained features. Using a case study based on drug-related onion domains, we obtained the following results. (1) Among the explored LtR schemes, the listwise approach outperforms the benchmarked methods with an NDCG of 0.95 for the top-10 ranked domains. (2) We showed quantitatively that our framework surpasses the link-based ranking techniques. (3) With the selected features, we observed that the textual content, composed of text, NER, and HTML features, is the most balanced approach in terms of efficiency and score obtained. The proposed framework might support Law Enforcement Agencies in detecting the most influential domains related to possible suspicious activities.
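
To make the NDCG figure above easier to read, here is a minimal sketch of how NDCG at a cutoff k can be computed. It assumes graded relevance labels and a linear gain (some formulations use an exponential gain instead); the function names and the example labels are hypothetical and are not taken from the paper.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain)."""
    # Position i (0-based) is discounted by log2(i + 2), so rank 1 is undiscounted.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Relevance labels listed in the order a ranker returned the items.
    ranked_labels = [3, 2, 3, 0, 1, 2]
    print(f"NDCG@5 = {ndcg_at_k(ranked_labels, 5):.3f}")
```

A value of 1.0 means the ranking orders the items exactly as the relevance labels would; lower values indicate that relevant items appear further down the list.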


Identifying Supporting Facts for Multi-hop Question Answering with Document Graph Networks

arXiv.org Artificial Intelligence

Recent advances in reading comprehension have resulted in models that surpass human performance when the answer is contained in a single, continuous passage of text. However, complex Question Answering (QA) typically requires multi-hop reasoning - i.e. the integration of supporting facts from different sources, to infer the correct answer. This paper proposes Document Graph Network (DGN), a message passing architecture for the identification of supporting facts over a graph-structured representation of text. The evaluation on HotpotQA shows that DGN obtains competitive results when compared to a reading comprehension baseline operating on raw text, confirming the relevance of structured representations for supporting multi-hop reasoning.


Dependency-based Text Graphs for Keyphrase and Summary Extraction with Applications to Interactive Content Retrieval

arXiv.org Artificial Intelligence

We build a bridge between neural network-based machine learning and graph-based natural language processing and introduce a unified approach to keyphrase, summary and relation extraction by aggregating dependency graphs from links provided by a deep-learning based dependency parser. We reorganize dependency graphs to focus on the most relevant content elements of a sentence, integrate sentence identifiers as graph nodes and after ranking the graph, we extract our keyphrases and summaries from its largest strongly-connected component. We take advantage of the implicit structural information that dependency links bring to extract subject-verb-object, is-a and part-of relations. We put it all together into a proof-of-concept dialog engine that specializes the text graph with respect to a query and reveals interactively the document's most relevant content elements. The open-source code of the integrated system is available at https://github.com/ptarau/DeepRank.
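
The step of taking the largest strongly-connected component of a directed graph can be sketched as follows. This is a generic illustration using Kosaraju's algorithm on a plain adjacency dictionary, not the DeepRank code; the function name `largest_scc` and the toy graph are hypothetical.

```python
def largest_scc(graph):
    """Return the largest strongly-connected component of a directed graph.

    `graph` maps each node to an iterable of its successors.
    """
    nodes = set(graph) | {v for succ in graph.values() for v in succ}

    # Pass 1: iterative depth-first search recording nodes in finishing order.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All successors explored: the node is finished.
                order.append(node)
                stack.pop()

    # Build the reversed graph.
    reverse = {node: [] for node in nodes}
    for u, succ in graph.items():
        for v in succ:
            reverse[v].append(u)

    # Pass 2: explore the reversed graph in decreasing finishing order;
    # each exploration yields exactly one strongly-connected component.
    assigned, best = set(), set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        if len(component) > len(best):
            best = component
    return best


if __name__ == "__main__":
    toy = {"s1": ["cat"], "cat": ["sat"], "sat": ["cat", "mat"], "mat": []}
    print(sorted(largest_scc(toy)))
```

In the toy graph only `cat` and `sat` can reach each other, so they form the largest component; nodes such as `s1` and `mat` that are connected in one direction only are left out.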


A review on ranking problems in statistical learning

arXiv.org Machine Learning

Search engines like Google provide a list of websites that are suitable for the user's query in the sense that the first websites that are displayed are expected to be the most relevant ones. Mathematically speaking, the search engine has to solve a ranking problem, which for Google is done by the PageRank algorithm (Page et al. [1999]). In their seminal paper (Clémençon et al. [2008]), Clémençon and coauthors proposed a statistical framework for ranking problems and proved that the common approach of empirical risk minimization (ERM) is indeed suitable for ranking problems. Although ranking techniques already existed, most of them do follow the ERM principle and can be embedded directly into the framework of Clémençon et al. [2008].
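
As a concrete illustration of a ranking computed from link structure, the following is a textbook power-iteration sketch of PageRank on a tiny link graph. It is a simplified illustration, not a description of any production search engine; the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the example graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to.
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every page passes its rank in equal shares to the pages it links to.
        for node in nodes:
            targets = links.get(node, [])
            for target in targets:
                new_rank[target] += damping * rank[node] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores = pagerank(web)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 4))
```

Sorting pages by the resulting scores gives the ranked list; in this example page `c`, which receives links from three other pages, comes out on top.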


Real-world Conversational AI for Hotel Bookings

arXiv.org Machine Learning

Hussein Fazal, SnapTravel, Toronto, Canada (hussein@snaptravel.com). Abstract: In this paper, we present a real-world conversational AI system to search for and book hotels through text messaging. Our architecture consists of a frame-based dialogue management system, which calls machine learning models for intent classification, named entity recognition, and information retrieval subtasks. Our chatbot has been deployed on a commercial scale, handling tens of thousands of hotel searches every day. We describe the various opportunities and challenges of developing a chatbot in the travel industry. Index Terms: conversational AI, task-oriented chatbot, named entity recognition, information retrieval. I. Introduction: Task-oriented chatbots have recently been applied to many areas in e-commerce.