AITopics | Query Processing

Query Processing

Mastering Presto: Hands-On Learning

#artificialintelligence, Oct-14-2020, 23:18:59 GMT

Learn Presto, a distributed SQL query engine for big data! Presto is an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes, ranging from gigabytes to petabytes. Presto was designed and written from the ground up for interactive analytics and approaches the speed of commercial data warehouses while scaling to the size of organisations like Facebook. In the first part of the course I will talk about Presto's theory, including its architecture and components: coordinator, worker, connector, query execution model, etc. Additionally, I will explain how Kafka, Cassandra, Hive, PostgreSQL and Redshift work before covering the specifics of their connectors.

artificial intelligence, natural language, presto, (8 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.41)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.84)
Information Technology > Communications > Social Media (0.51)

Query complexity of adversarial attacks

Głuch, Grzegorz, Urbanke, Rüdiger

arXiv.org Machine Learning, Oct-2-2020

The decision boundary of a learning algorithm applied to a given task can be viewed as the outcome of a random process: (i) generate a training set and (ii) apply the (potentially randomized) learning algorithm to it. Recall (see Definitions 4 and 5) that a query-bounded adversary knows neither the sample on which the model was trained nor the randomness used by the learner. This means that if the decision boundary has high entropy, then the adversary needs to ask many questions to recover the boundary to a high degree of precision. This suggests that high-entropy decision boundaries are robust against query-bounded adversaries, since intuitively an approximate knowledge of the decision boundary is a prerequisite for a successful attack. Following this reasoning, we present two instances where high entropy of the decision boundary leads to security.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2010.01039

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.50)
Government > Military (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.41)

Revealing Secrets in SPARQL Session Level

Zhang, Xinyue, Wang, Meng, Saleem, Muhammad, Ngomo, Axel-Cyrille Ngonga, Qi, Guilin, Wang, Haofen

arXiv.org Artificial Intelligence, Sep-13-2020

Based on Semantic Web technologies, knowledge graphs help users to discover information of interest by using live SPARQL services. Answer-seekers often examine intermediate results iteratively and modify SPARQL queries repeatedly in a search session. In this context, understanding user behaviors is critical for effective intention prediction and query optimization. However, these behaviors have not yet been researched systematically at the SPARQL session level. This paper reveals the secrets of session-level user search behaviors by conducting a comprehensive investigation over massive real-world SPARQL query logs. In particular, we thoroughly assess query changes made by users w.r.t. structural and data-driven features of SPARQL queries. To illustrate the potential of our findings, we employ a proof-of-concept model to predict user intentions, i.e., future directions of the given session, and give reformulation suggestions based on the predicted intention. We hope the results presented here will help to devise efficient SPARQL caching, auto-completion, query suggestion, approximation, and relaxation techniques in the future.

artificial intelligence, information retrieval query processing, natural language, (19 more...)

arXiv.org Artificial Intelligence

2009.06625

Country:

Europe > Germany > Saxony > Leipzig (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)

HyperBench: A Benchmark and Tool for Hypergraphs and Empirical Findings

Fischl, Wolfgang, Gottlob, Georg, Longo, Davide Mario, Pichler, Reinhard

arXiv.org Artificial Intelligence, Sep-2-2020

To cope with the intractability of answering Conjunctive Queries (CQs) and solving Constraint Satisfaction Problems (CSPs), several notions of hypergraph decompositions have been proposed, giving rise to different notions of width, notably plain, generalized, and fractional hypertree width (hw, ghw, and fhw). Given the increasing interest in using such decomposition methods in practice, a publicly accessible repository of decomposition software, as well as a large set of benchmarks, and a web-accessible workbench for inserting, analyzing, and retrieving hypergraphs are called for. We address this need by providing (i) concrete implementations of hypergraph decompositions (including new practical algorithms), (ii) a new, comprehensive benchmark of hypergraphs stemming from disparate CQ and CSP collections, and (iii) HyperBench, our new web interface for accessing the benchmark and the results of our analyses. In addition, we describe a number of actual experiments we carried out with this new infrastructure.
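
As a rough illustration of how a conjunctive query gives rise to a hypergraph, the sketch below maps each variable to a vertex and each atom to a hyperedge, then computes two simple structural statistics. The function names and the atom encoding are hypothetical and are not taken from HyperBench; computing hw, ghw, or fhw is a much harder problem and is not attempted here.

```python
def cq_to_hypergraph(atoms):
    """Build a hypergraph from a conjunctive query.

    atoms: list of (relation_name, tuple_of_variables) pairs (assumed encoding).
    Returns (vertices, edges), where edges is a list of (name, frozenset of variables).
    """
    vertices = set()
    edges = []
    for name, variables in atoms:
        edge = frozenset(variables)
        edges.append((name, edge))
        vertices |= edge
    return vertices, edges


def max_degree(vertices, edges):
    """Largest number of hyperedges that any single vertex occurs in."""
    return max(sum(1 for _, e in edges if v in e) for v in vertices)


def max_edge_size(edges):
    """Largest number of distinct variables in any single atom."""
    return max(len(e) for _, e in edges)


# Triangle query: R(x, y), S(y, z), T(z, x)
triangle = [("R", ("x", "y")), ("S", ("y", "z")), ("T", ("z", "x"))]
vertices, edges = cq_to_hypergraph(triangle)
print(sorted(vertices), max_degree(vertices, edges), max_edge_size(edges))
```

A list of pairs is used rather than a dictionary keyed by relation name so that a query mentioning the same relation twice (a self-join) keeps both hyperedges.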

artificial intelligence, hypergraph, natural language, (17 more...)

arXiv.org Artificial Intelligence

2009.01769

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(15 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Software (1.00)
Information Technology > Databases (1.00)
Information Technology > Communications (1.00)
(2 more...)

NeuralQA: A Usable Library for Question Answering (Contextual Query Expansion + BERT) on Large Datasets

Dibia, Victor

arXiv.org Artificial Intelligence, Jul-29-2020

Existing tools for Question Answering (QA) have challenges that limit their use in practice. They can be complex to set up or integrate with existing infrastructure, do not offer configurable interactive interfaces, and do not cover the full set of subtasks that frequently comprise the QA pipeline (query expansion, retrieval, reading, and explanation/sensemaking). To help address these issues, we introduce NeuralQA - a usable library for QA on large datasets. NeuralQA integrates well with existing infrastructure (e.g., ElasticSearch instances and reader models trained with the HuggingFace Transformers API) and offers helpful defaults for QA subtasks. It introduces and implements contextual query expansion (CQE) using a masked language model (MLM) as well as relevant snippets (RelSnip) - a method for condensing large documents into smaller passages that can be speedily processed by a document reader model. Finally, it offers a flexible user interface to support workflows for research explorations (e.g., visualization of gradient-based explanations to support qualitative inspection of model behaviour) and large scale search deployment. Code and documentation for NeuralQA are available as open source on GitHub.

interface, neuralqa, representation, (15 more...)

arXiv.org Artificial Intelligence

2007.15211

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Query complexity of heavy hitter estimation

Sarmasarkar, Sahasrajit, Reddy, Kota Srinivas, Karamchandani, Nikhil

arXiv.org Machine Learning, May-29-2020

We consider the problem of identifying the subset $\mathcal{S}^{\gamma}_{\mathcal{P}}$ of elements in the support of an underlying distribution $\mathcal{P}$ whose probability value is larger than a given threshold $\gamma$, by actively querying an oracle to gain information about a sequence $X_1, X_2, \ldots$ of i.i.d. samples drawn from $\mathcal{P}$. We consider two query models: (a) each query is an index $i$ and the oracle returns the value $X_i$, and (b) each query is a pair $(i,j)$ and the oracle gives a binary answer confirming whether $X_i = X_j$ or not. For each of these query models, we design sequential estimation algorithms which, at each round, either decide what query to send to the oracle depending on the entire history of responses or decide to stop and output an estimate of $\mathcal{S}^{\gamma}_{\mathcal{P}}$, which is required to be correct with some pre-specified large probability. We provide upper bounds on the query complexity of the algorithms for any distribution $\mathcal{P}$ and also derive lower bounds on the optimal query complexity under the two query models. We also consider noisy versions of the two query models and propose robust estimators which can effectively counter the noise in the oracle responses.
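
To make the two query models concrete, here is a minimal sketch in which a fixed list stands in for the sample sequence $X_1, X_2, \ldots$. It uses a naive fixed-budget estimator that thresholds empirical frequencies; this is an assumption made for illustration and is not the sequential, adaptive algorithm from the paper, and it carries no correctness guarantee. All function names are hypothetical.

```python
from collections import Counter


def make_oracles(samples):
    """Wrap a fixed sample sequence in the two query models."""
    def value_oracle(i):
        # Model (a): query an index i, receive the value X_i.
        return samples[i]

    def pair_oracle(i, j):
        # Model (b): query a pair (i, j), receive whether X_i == X_j.
        return samples[i] == samples[j]

    return value_oracle, pair_oracle


def heavy_hitters_value(value_oracle, n, gamma):
    """Model (a): values whose empirical frequency over n queries exceeds gamma."""
    counts = Counter(value_oracle(i) for i in range(n))
    return {x for x, c in counts.items() if c / n > gamma}


def heavy_hitters_pair(pair_oracle, n, gamma):
    """Model (b): group indices into bins using only equality answers.

    Values are never observed, so each heavy bin is reported by the
    index of its first member.
    """
    bins = []  # each bin holds indices whose samples are all equal
    for i in range(n):
        for b in bins:
            if pair_oracle(b[0], i):
                b.append(i)
                break
        else:
            bins.append([i])
    return [b[0] for b in bins if len(b) / n > gamma]


samples = ["a"] * 6 + ["b"] * 3 + ["c"]
value_oracle, pair_oracle = make_oracles(samples)
print(heavy_hitters_value(value_oracle, 10, 0.25))
print(heavy_hitters_pair(pair_oracle, 10, 0.25))
```

Note that under model (b) the number of oracle calls depends on how many distinct bins have been seen so far, which hints at why the two models can have different query complexities.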

artificial intelligence, information retrieval query processing, natural language, (18 more...)

arXiv.org Machine Learning

2005.14425

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.83)

Query Reformulation using Query History for Passage Retrieval in Conversational Search

Lin, Sheng-Chieh, Yang, Jheng-Hong, Nogueira, Rodrigo, Tsai, Ming-Feng, Wang, Chuan-Ju, Lin, Jimmy

arXiv.org Artificial Intelligence, May-5-2020

Passage retrieval in a conversational context is essential for many downstream applications; it is, however, extremely challenging due to limited data resources. To address this problem, we present an effective multi-stage pipeline for passage ranking in conversational search that integrates a widely-used IR system with a conversational query reformulation module. Along these lines, we propose two simple yet effective query reformulation approaches: historical query expansion (HQE) and neural transfer reformulation (NTR). Whereas HQE applies query expansion, a traditional IR query reformulation technique, NTR transfers human knowledge of conversational query understanding to a neural query reformulation model. The proposed HQE method was the top-performing submission of automatic systems in the CAsT Track at TREC 2019. Building on this, our NTR approach improves by an additional 18% over that best entry in terms of NDCG@3. We further analyze the distinct behaviors of the two approaches, and show that fusing their output reduces the performance gap (measured in NDCG@3) between the manually-rewritten and automatically-generated queries from 22 points to 4 when compared with the best CAsT submission.
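
Since the results above are reported in NDCG@3, the following sketch shows one common way to compute NDCG at a cutoff. It assumes linear gain (the relevance grade itself) and a log2 rank discount; other formulations use an exponential gain, and the exact variant used in the evaluation is not stated in the abstract. The function names and the example grades are hypothetical.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k relevance grades."""
    # rank is 0-based, so the discount for the top result is log2(2) = 1.
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades[:k]))


def ndcg_at_k(ranked_grades, judged_grades, k=3):
    """NDCG@k for one query.

    ranked_grades: grades of the returned passages, in ranked order.
    judged_grades: grades of all judged passages for the query, any order.
    """
    ideal = dcg_at_k(sorted(judged_grades, reverse=True), k)
    if ideal == 0:
        return 0.0
    return dcg_at_k(ranked_grades, k) / ideal


# A ranking that places a grade-1 passage above the grade-3 and grade-2 ones.
print(round(ndcg_at_k([1, 3, 2], [3, 2, 1, 0], k=3), 4))
```

The ideal ranking is built from all judged passages rather than only the retrieved ones, so a system cannot score 1.0 simply by failing to retrieve the best passages.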

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2005.0223

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > Canada (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

A Methodology for Creating Question Answering Corpora Using Inverse Data Annotation

Deriu, Jan, Mlynchyk, Katsiaryna, Schläpfer, Philippe, Rodrigo, Alvaro, von Grünigen, Dirk, Kaiser, Nicolas, Stockinger, Kurt, Agirre, Eneko, Cieliebak, Mark

arXiv.org Artificial Intelligence, Apr-16-2020

In this paper, we introduce a novel methodology to efficiently construct a corpus for question answering over structured data. For this, we introduce an intermediate representation, called Operation Trees (OT), that is based on the logical query plan in a database. This representation allows us to invert the annotation process without losing flexibility in the types of queries that we generate. Furthermore, it allows for fine-grained alignment of query tokens to OT operations. In our method, we randomly generate OTs from a context-free grammar. Afterwards, annotators write the natural language question that is represented by each OT. Finally, the annotators assign the tokens to the OT operations. We apply the method to create a new corpus, OTTA (Operation Trees and Token Assignment), a large semantic parsing corpus for evaluating natural language interfaces to databases. We compare OTTA to Spider and LC-QuaD 2.0 and show that our methodology more than triples the annotation speed while maintaining the complexity of the queries. Finally, we train a state-of-the-art semantic parsing model on our data and show that our corpus is a challenging dataset and that the token alignment can be leveraged to increase the performance significantly.

database, operation, query, (16 more...)

arXiv.org Artificial Intelligence

2004.07633

Country:

Europe > France (0.04)
South America > Argentina (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment (1.00)
Media > Film (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.49)

Complaint-driven Training Data Debugging for Query 2.0

Wu, Weiyuan, Flokas, Lampros, Wu, Eugene, Wang, Jiannan

arXiv.org Artificial Intelligence, Apr-12-2020

As the need for machine learning (ML) increases rapidly across all industry sectors, there is significant interest among commercial database providers in supporting "Query 2.0", which integrates model inference into SQL queries. Debugging Query 2.0 is very challenging since an unexpected query result may be caused by bugs in the training data (e.g., wrong labels, corrupted features). In response, we propose Rain, a complaint-driven training data debugging system. Rain allows users to specify complaints over the query's intermediate or final output, and aims to return a minimum set of training examples so that if they were removed, the complaints would be resolved. To the best of our knowledge, we are the first to study this problem. A naive solution requires retraining an exponential number of ML models. We propose two novel heuristic approaches based on influence functions which both require linear retraining steps. We provide an in-depth analytical and empirical analysis of the two approaches and conduct extensive experiments to evaluate their effectiveness using four real-world datasets. Results show that Rain achieves the highest recall@k among all the baselines while still returning results interactively.

query, query 2, training record, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3318464.3389696

2004.05722

Country:

North America > United States > California > San Francisco County > San Francisco (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.05)
(22 more...)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.88)

Answering Complex Queries in Knowledge Graphs with Bidirectional Sequence Encoders

Kotnis, Bhushan, Lawrence, Carolin, Niepert, Mathias

arXiv.org Artificial Intelligence, Apr-6-2020

Representation learning for knowledge graphs (KGs) has focused on the problem of answering simple link prediction queries. In this work we address the more ambitious challenge of predicting the answers of conjunctive queries with multiple missing entities. We propose Bi-Directional Query Embedding (BiQE), a method that embeds conjunctive queries with models based on bi-directional attention mechanisms. Contrary to prior work, bidirectional self-attention can capture interactions among all the elements of a query graph. We introduce a new dataset for predicting the answers of conjunctive queries and conduct experiments that show BiQE significantly outperforming state-of-the-art baselines.
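
For readers unfamiliar with conjunctive queries over a knowledge graph, the sketch below evaluates one symbolically on a toy set of triples by backtracking over variable bindings. This is a plain exact-match baseline under the assumption that every needed edge is present; it is not BiQE, which instead predicts answers with a learned embedding model when edges may be missing. The triples, the `?`-prefix convention for variables, and the function names are all hypothetical.

```python
def is_var(term):
    """Variables are written with a leading '?', e.g. '?person' (assumed convention)."""
    return term.startswith("?")


def answer_conjunctive_query(triples, atoms, binding=None):
    """Return every variable binding that satisfies all atoms.

    triples: iterable of (subject, predicate, object) facts.
    atoms:   list of (subject, predicate, object) patterns; subject and object
             may be constants or variables, the predicate is a constant.
    """
    binding = binding or {}
    if not atoms:
        return [dict(binding)]
    (s, p, o), rest = atoms[0], atoms[1:]
    results = []
    for ts, tp, to in triples:
        if tp != p:
            continue
        new = dict(binding)
        consistent = True
        for term, value in ((s, ts), (o, to)):
            if is_var(term):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    consistent = False
                    break
            elif term != value:
                consistent = False
                break
        if consistent:
            results.extend(answer_conjunctive_query(triples, rest, new))
    return results


kg = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "acme"),
    ("carol", "works_at", "globex"),
    ("acme", "located_in", "berlin"),
    ("globex", "located_in", "paris"),
]
# Who works at a company located in Berlin? Two variables: ?p and ?c.
query = [("?p", "works_at", "?c"), ("?c", "located_in", "berlin")]
print(answer_conjunctive_query(kg, query))
```

If the `("acme", "located_in", "berlin")` triple were removed, this evaluator would return no answers at all, which is the gap that embedding-based approaches aim to address.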

dataset, graph, query, (15 more...)

arXiv.org Artificial Intelligence

2004.02596

Country: Europe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.65)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.36)
