Query Processing
Query Complexity of k-NN based Mode Estimation
Singhal, Anirudh, Pirojiwala, Subham, Karamchandani, Nikhil
Motivated by the mode estimation problem of an unknown multivariate probability density function, we study the problem of identifying the point with the minimum k-th nearest neighbor distance for a given dataset of n points. We study the case where the pairwise distances are a priori unknown, but we have access to an oracle which we can query to get noisy information about the distance between any pair of points. For two natural oracle models, we design a sequential learning algorithm, based on the idea of confidence intervals, which adaptively decides which queries to send to the oracle and is able to correctly solve the problem with high probability. We derive instance-dependent upper bounds on the query complexity of our proposed scheme and also demonstrate significant improvement over the performance of other baselines via extensive numerical evaluations.
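The abstract does not spell out the algorithm, so the following is a minimal sketch of the confidence-interval idea under one plausible oracle model: each query (i, j) returns the true distance plus independent noise supported on an interval of length 1, so repeated queries shrink a Hoeffding interval around each pairwise distance. All identifiers (noisy_oracle, delta, max_rounds) and the refinement rule are illustrative assumptions, not the paper's scheme.

```python
# Sketch: adaptively query noisy pairwise distances until one point's upper
# confidence bound on its k-th NN distance beats every other point's lower bound.
import math

def find_min_knn_point(n, k, noisy_oracle, delta=0.05, max_rounds=10000):
    sums = {(i, j): 0.0 for i in range(n) for j in range(i + 1, n)}
    cnts = {p: 0 for p in sums}

    def half_width(p):
        # Hoeffding half-width for noise supported on an interval of length 1
        return math.sqrt(math.log(4 * n * n * max_rounds / delta) / (2 * cnts[p]))

    for p in sums:                                  # one query per pair to start
        sums[p] += noisy_oracle(*p); cnts[p] += 1

    for _ in range(max_rounds):
        lo, hi = {}, {}
        for (i, j), s in sums.items():
            m, w = s / cnts[(i, j)], half_width((i, j))
            lo[(i, j)] = lo[(j, i)] = max(m - w, 0.0)
            hi[(i, j)] = hi[(j, i)] = m + w
        # bounds on each point's k-th nearest neighbor distance
        klo = [sorted(lo[(i, j)] for j in range(n) if j != i)[k - 1] for i in range(n)]
        khi = [sorted(hi[(i, j)] for j in range(n) if j != i)[k - 1] for i in range(n)]
        best = min(range(n), key=lambda i: khi[i])
        if all(khi[best] <= klo[i] for i in range(n) if i != best):
            return best                             # confidently separated: stop
        # refine the most ambiguous point by re-querying all its pairs
        amb = max(range(n), key=lambda i: khi[i] - klo[i])
        for j in range(n):
            if j != amb:
                p = (min(amb, j), max(amb, j))
                sums[p] += noisy_oracle(*p); cnts[p] += 1
    return best
```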
Knowledge Graph-based Question Answering with Electronic Health Records
Park, Junwoo, Cho, Youngwoo, Lee, Haneol, Choo, Jaegul, Choi, Edward
Question Answering (QA) on Electronic Health Records (EHR), namely EHR QA, can serve as a crucial milestone towards developing an intelligent agent in healthcare. EHR data are typically stored in a relational database, which can also be converted to a Directed Acyclic Graph (DAG), allowing two approaches for EHR QA: Table-based QA and Knowledge Graph-based QA. We hypothesize that the graph-based approach is more suitable for EHR QA, as graphs can represent relations between entities and values more naturally than tables, which essentially require JOIN operations. To validate our hypothesis, we first construct EHR QA datasets based on MIMIC-III, where the same question-answer pairs are represented in SQL (table-based) and SPARQL (graph-based), respectively. We then test a state-of-the-art EHR QA model on both datasets, where the model demonstrates superior QA performance on the SPARQL version. Finally, we open-source both MIMICSQL* and MIMIC-SPARQL* to encourage further EHR QA research in both directions.
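To make the table-vs-graph contrast concrete, here is a purely illustrative example of one hypothetical question ("what is the diagnosis of patient 123?") phrased both ways; the simplified schema and predicate names are made up and are not the actual MIMICSQL*/MIMIC-SPARQL* schemas.

```python
# Illustration only: the same hypothetical question against a simplified
# relational schema (SQL) and against a graph view of the same records (SPARQL).

sql_query = """
SELECT d.short_title
FROM diagnoses AS d
JOIN admissions AS a ON d.hadm_id = a.hadm_id   -- JOIN needed to relate tables
WHERE a.subject_id = 123;
"""

sparql_query = """
SELECT ?title
WHERE {
  ?adm  <has_subject_id>  123 .      # edges traverse relations directly,
  ?adm  <has_diagnosis>   ?diag .    # no explicit JOIN required
  ?diag <has_short_title> ?title .
}
"""
```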
Mastering Presto: Hands-On Learning
Learn Presto - distributed SQL Query Engine for Big Data! Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organisations like Facebook. In the first part of the course I will cover Presto's theory, including its architecture and components: coordinator, worker, connector, query execution model, etc. Additionally, I will explain how Kafka, Cassandra, Hive, PostgreSQL and Redshift work before covering the specifics of their connectors.
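As a flavor of what the connector model buys you, here is a hedged sketch of running a federated Presto query from Python with the `prestodb` (presto-python-client) DB-API package; the coordinator host, catalogs, and table names are placeholders. The point is that a single SQL statement can join data served by different connectors.

```python
import prestodb

conn = prestodb.dbapi.connect(
    host="presto-coordinator.example.com",  # placeholder coordinator
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()
cur.execute("""
    SELECT u.name, count(*) AS events
    FROM hive.web.click_events e          -- served by the Hive connector
    JOIN postgresql.public.users u        -- served by the PostgreSQL connector
      ON e.user_id = u.id
    GROUP BY u.name
    ORDER BY events DESC
    LIMIT 10
""")
print(cur.fetchall())
```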
Query complexity of adversarial attacks
Głuch, Grzegorz, Urbanke, Rüdiger
The decision boundary of a learning algorithm applied to a given task can be viewed as the outcome of a random process: (i) generate a training set and (ii) apply the, potentially randomized, learning algorithm to it. Recall (see Definitions 4 and 5) that a query-bounded adversary knows neither the sample on which the model was trained nor the randomness used by the learner. This means that if the decision boundary has high entropy, then the adversary needs to ask many questions to recover the boundary to a high degree of precision. This suggests that high-entropy decision boundaries are robust against query-bounded adversaries, since intuitively an approximate knowledge of the decision boundary is a prerequisite for a successful attack. Following this reasoning, we present two instances where high entropy of the decision boundary leads to security.
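A toy example (not the paper's construction) makes the entropy-vs-queries tradeoff tangible: for a 1-D threshold classifier with an unknown threshold t, a label-query adversary recovers t by binary search, needing about log2(1/eps) queries to reach precision eps, which is roughly the entropy of t discretized at scale eps when t is uniform.

```python
import random

def recover_threshold(f, eps):
    # binary search using only label queries f(x) to the deployed model
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        queries += 1
        if f(mid):       # positive label: threshold lies to the left
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, queries

t = random.random()                      # high-entropy boundary: uniform on [0, 1]
est, q = recover_threshold(lambda x: x >= t, eps=1e-6)
print(f"recovered {est:.6f} vs true {t:.6f} using {q} label queries")
```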
Revealing Secrets in SPARQL Session Level
Zhang, Xinyue, Wang, Meng, Saleem, Muhammad, Ngomo, Axel-Cyrille Ngonga, Qi, Guilin, Wang, Haofen
Based on Semantic Web technologies, knowledge graphs help users to discover information of interest by using live SPARQL services. Answer-seekers often examine intermediate results iteratively and modify SPARQL queries repeatedly in a search session. In this context, understanding user behaviors is critical for effective intention prediction and query optimization. However, these behaviors have not yet been researched systematically at the SPARQL session level. This paper reveals the secrets of session-level user search behaviors by conducting a comprehensive investigation over massive real-world SPARQL query logs. In particular, we thoroughly assess query changes made by users w.r.t. structural and data-driven features of SPARQL queries. To illustrate the potential of our findings, we employ a proof-of-concept model to predict user intentions, i.e., future directions of the given session, and give reformulation suggestions based on the predicted intention. We hope the results presented here will help to devise efficient SPARQL caching, auto-completion, query suggestion, approximation, and relaxation techniques in the future.
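As a rough illustration of what "structural features of SPARQL queries" might look like in code, here is a minimal sketch that counts triple patterns and keyword usage per query and diffs consecutive queries in a session; the crude string-based counting and the feature set are stand-ins, not the paper's actual methodology.

```python
import re

KEYWORDS = ("OPTIONAL", "FILTER", "UNION", "LIMIT", "ORDER BY", "GROUP BY")

def features(query: str) -> dict:
    body = re.search(r"\{(.*)\}", query, re.S)
    triples = body.group(1).count(".") if body else 0  # crude triple-pattern count
    return {"triples": triples,
            **{kw: query.upper().count(kw) for kw in KEYWORDS}}

def session_deltas(session: list[str]) -> list[dict]:
    # per-feature change between consecutive queries in one session
    feats = [features(q) for q in session]
    return [{k: b[k] - a[k] for k in a} for a, b in zip(feats, feats[1:])]
```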
HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings
Fischl, Wolfgang, Gottlob, Georg, Longo, Davide Mario, Pichler, Reinhard
To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed -- giving rise to different notions of width, notably, plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analyzing, and retrieving hypergraphs are called for. We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web-interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.
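For readers unfamiliar with the CQ-to-hypergraph correspondence behind these width notions, here is a small sketch: the vertices are the query's variables and each atom contributes the hyperedge of variables it mentions. The query and relation names are made up.

```python
# CQ:  ans(x, z) :- R(x, y), S(y, z), T(z, x)
atoms = {
    "R": ("x", "y"),
    "S": ("y", "z"),
    "T": ("z", "x"),
}

vertices = sorted({v for vs in atoms.values() for v in vs})
hyperedges = [frozenset(vs) for vs in atoms.values()]

print("V =", vertices)                    # ['x', 'y', 'z']
print("E =", [set(e) for e in hyperedges])
# Decomposition methods then bound how "tree-like" (V, E) is; hw, ghw, and fhw
# are different ways of charging hyperedges to the bags of a tree decomposition.
```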
Hasura introduces a GraphQL-based data virtualization cloud
GraphQL, which has nothing to do with graph databases but has everything to do with JSON queries, is a technology that originated at Facebook to simplify access to data. Now Hasura, a two-year-old company, is building a data virtualization tool around it and last week introduced a cloud service. At the core of Hasura's offerings is its own implementation of a GraphQL query engine, which it has open sourced. Currently designed to work against PostgreSQL-compatible databases, the engine scans the metadata of specified tables on the target, builds a GraphQL endpoint, represents the data in JSON format, and automatically generates a selection of possible queries. Atop its GraphQL query engine, the original Hasura enterprise on-premises offering includes adapters for PostgreSQL databases; other relational targets are planned for the future.
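To show what querying such an auto-generated endpoint looks like, here is a hedged sketch using Python's `requests` library against a Hasura-style GraphQL API; the endpoint URL, the `users` table, and the credential are hypothetical placeholders.

```python
import requests

query = """
query {
  users(where: {active: {_eq: true}}, limit: 10) {
    id
    name
  }
}
"""

resp = requests.post(
    "https://my-app.example.com/v1/graphql",        # hypothetical endpoint
    json={"query": query},
    headers={"x-hasura-admin-secret": "<secret>"},  # placeholder credential
)
print(resp.json())   # rows come back as JSON under data.users
```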
NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets
Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA - a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM) as well as relevant snippets (RelSnip) - a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large scale search deployment. Code and documentation for NeuralQA are available as open source on Github.
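Below is a minimal sketch of contextual query expansion with a masked language model in the spirit of CQE, not NeuralQA's actual implementation: append a mask token to the query and take the MLM's top completions as candidate expansion terms, using the HuggingFace fill-mask pipeline.

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

def expand_query(query: str, top_k: int = 5) -> list[str]:
    # ask the MLM which terms plausibly continue the query in context
    candidates = fill(f"{query} {fill.tokenizer.mask_token}", top_k=top_k)
    return [c["token_str"].strip() for c in candidates]

print(expand_query("what causes high blood"))   # e.g. ['pressure', ...]
```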
Query complexity of heavy hitter estimation
Sarmasarkar, Sahasrajit, Reddy, Kota Srinivas, Karamchandani, Nikhil
We consider the problem of identifying the subset $\mathcal{S}^{\gamma}_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$ whose probability value is larger than a given threshold $\gamma$, by actively querying an oracle to gain information about a sequence $X_1, X_2, \ldots$ of i.i.d. samples drawn from $\mathcal{P}$. We consider two query models: $(a)$ each query is an index $i$ and the oracle returns the value $X_i$ and $(b)$ each query is a pair $(i,j)$ and the oracle gives a binary answer confirming if $X_i = X_j$ or not. For each of these query models, we design sequential estimation algorithms which, at each round, either decide what query to send to the oracle depending on the entire history of responses or decide to stop and output an estimate of $\mathcal{S}^{\gamma}_{\mathcal{P}}$, which is required to be correct with some pre-specified large probability. We provide upper bounds on the query complexity of the algorithms for any distribution $\mathcal{P}$ and also derive lower bounds on the optimal query complexity under the two query models. We also consider noisy versions of the two query models and propose robust estimators which can effectively counter the noise in the oracle responses.
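Here is a minimal sketch of a sequential estimator under query model $(a)$: query indices $i = 1, 2, \ldots$ to observe samples, keep Hoeffding-style confidence intervals on each element's empirical probability, and stop once no element's interval straddles the threshold $\gamma$. The names and the stopping rule are illustrative, not the paper's exact scheme.

```python
import math
from collections import Counter

def estimate_heavy_hitters(sample_oracle, gamma, delta=0.05, max_queries=10**6):
    counts, t = Counter(), 0
    while t < max_queries:
        counts[sample_oracle(t)] += 1   # query model (a): ask for X_t
        t += 1
        w = math.sqrt(math.log(2 * t * t / delta) / (2 * t))  # anytime CI half-width
        undecided = [x for x, c in counts.items() if abs(c / t - gamma) <= w]
        # unseen elements have empirical mass 0, so they are settled once
        # w < gamma; stop when nothing straddles the threshold
        if w < gamma and not undecided:
            break
    return {x for x, c in counts.items() if c / t > gamma}
```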
Query Reformulation using Query History for Passage Retrieval in Conversational Search
Lin, Sheng-Chieh, Yang, Jheng-Hong, Nogueira, Rodrigo, Tsai, Ming-Feng, Wang, Chuan-Ju, Lin, Jimmy
Passage retrieval in a conversational context is essential for many downstream applications; it is however extremely challenging due to limited data resources. To address this problem, we present an effective multi-stage pipeline for passage ranking in conversational search that integrates a widely-used IR system with a conversational query reformulation module. Along these lines, we propose two simple yet effective query reformulation approaches: historical query expansion (HQE) and neural transfer reformulation (NTR). Whereas HQE applies query expansion, a traditional IR query reformulation technique, NTR transfers human knowledge of conversational query understanding to a neural query reformulation model. The proposed HQE method was the top-performing submission of automatic systems in CAsT Track at TREC 2019. Building on this, our NTR approach improves by an additional 18% over that best entry in terms of NDCG@3. We further analyze the distinct behaviors of the two approaches, and show that fusing their outputs reduces the performance gap (measured in NDCG@3) between manually rewritten and automatically generated queries from 22 points to 4 points when compared with the best CAsT submission.
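To give a feel for the HQE idea, here is a rough sketch: score terms from earlier turns in the session and append the highest-scoring ones to the current query before retrieval. The scoring used here (plain term frequency with a stopword filter) is a stand-in for the paper's actual keyword-selection heuristics.

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "is", "what", "how", "about", "it", "are"}

def hqe_reformulate(history: list[str], current: str, num_terms: int = 3) -> str:
    # count candidate expansion terms from the session history that are
    # not stopwords and not already present in the current turn
    tf = Counter(w for q in history for w in q.lower().split()
                 if w not in STOPWORDS and w not in current.lower())
    expansion = [w for w, _ in tf.most_common(num_terms)]
    return current + " " + " ".join(expansion)

session = ["what is throat cancer", "is it treatable"]
print(hqe_reformulate(session, "what are the common symptoms"))
# -> "what are the common symptoms throat cancer treatable" (tie order may vary)
```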