AITopics | Query Processing


Query Processing


Hadoop vs Teradata - PHP Hadoop Articles

#artificialintelligence | Jul-14-2016, 10:16:14 GMT

Hadoop, therefore, doesn't have what it takes to be considered a data warehouse. Hadoop vendors will probably work harder to improve the security of information access, restrict permissions, and address a broader array of data protection issues. The two major goals of the initiative are to increase performance and provide a rich set of SQL features such as analytic functions, query optimization, and standard data types including timestamp. A growing community of Hadoop vendors provides a byzantine selection of solutions. The opportunity is to monetise huge volumes of data using tools that weren't previously available.

data mining, hadoop vs teradata, natural language, (5 more...)

#artificialintelligence

Industry: Information Technology > Security & Privacy (0.61)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.41)


Microsoft announces general availability of Azure SQL Data Warehouse - MSPoweruser

#artificialintelligence | Jul-14-2016, 01:35:34 GMT

Microsoft today announced the general availability of the Azure SQL Data Warehouse, an elastic data warehouse as a service with enterprise-class features. It is a fully managed DW as a Service that you can provision in minutes and scale up to 60 times larger in seconds. With Azure SQL Data Warehouse, storage and compute scale independently. You can dynamically deploy, grow, shrink, and even pause compute, taking advantage of best-in-class price/performance. Also, SQL Data Warehouse uses the power and familiarity of T-SQL to let you easily integrate query results across relational data in your data warehouse and non-relational data in Azure blob storage.

azure sql data warehouse, data mining, natural language, (9 more...)

#artificialintelligence

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.41)
Information Technology > Data Science > Data Mining (0.39)


Dissociation and Propagation for Approximate Lifted Inference with Standard Relational Database Management Systems

Gatterbauer, Wolfgang, Suciu, Dan

arXiv.org Artificial Intelligence | Jun-14-2016

Probabilistic inference over large data sets is a challenging data management problem since exact inference is generally #P-hard and is most often solved approximately with sampling-based methods today. This paper proposes an alternative approach for approximate evaluation of conjunctive queries with standard relational databases: In our approach, every query is evaluated entirely in the database engine by evaluating a fixed number of query plans, each providing an upper bound on the true probability, then taking their minimum. We provide an algorithm that takes into account important schema information to enumerate only the minimal necessary plans among all possible plans. Importantly, this algorithm is a strict generalization of all known PTIME self-join-free conjunctive queries: A query is in PTIME if and only if our algorithm returns one single plan. Furthermore, our approach is a generalization of a family of efficient ranking methods from graphs to hypergraphs. We also adapt three relational query optimization techniques to evaluate all necessary plans very fast. We give a detailed experimental evaluation of our approach and, in the process, provide a new way of thinking about the value of probabilistic methods over non-probabilistic methods for ranking query answers. We also note that the techniques developed in this paper apply immediately to lifted inference from statistical relational models since lifted inference corresponds to PTIME plans in probabilistic databases.
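
The idea in the abstract above (evaluate several plans that each give an upper bound, then take the minimum) can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes a tiny tuple-independent probabilistic database and the Boolean query q :- R(x), S(x,y), T(y), and the relation contents, probabilities, and function names are all hypothetical. Each plan treats one relation as if it had an independent copy per join partner, which can only raise the computed probability, so the minimum over both plans is still an upper bound on the exact value. The exact value is computed by brute force purely for comparison.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent relations: key -> marginal probability.
R = {1: 0.5, 2: 0.4}
S = {(1, 1): 0.9, (1, 2): 0.3, (2, 1): 0.6, (2, 2): 0.8}
T = {1: 0.7, 2: 0.2}


def plan_dissociate_T(R, S, T):
    """Plan 1: group by x first, treating T(y) as an independent copy per x."""
    none_true = 1.0
    for x, pr in R.items():
        no_y = prod(1.0 - ps * T[y]
                    for (sx, y), ps in S.items() if sx == x and y in T)
        none_true *= 1.0 - pr * (1.0 - no_y)
    return 1.0 - none_true


def plan_dissociate_R(R, S, T):
    """Plan 2: group by y first, treating R(x) as an independent copy per y."""
    none_true = 1.0
    for y, pt in T.items():
        no_x = prod(1.0 - ps * R[x]
                    for (x, sy), ps in S.items() if sy == y and x in R)
        none_true *= 1.0 - pt * (1.0 - no_x)
    return 1.0 - none_true


def exact_probability(R, S, T):
    """Brute force over all possible worlds (exponential; comparison only)."""
    facts = ([("R", k, p) for k, p in R.items()]
             + [("S", k, p) for k, p in S.items()]
             + [("T", k, p) for k, p in T.items()])
    total = 0.0
    for world in product([False, True], repeat=len(facts)):
        weight = 1.0
        present = set()
        for keep, (rel, key, p) in zip(world, facts):
            weight *= p if keep else 1.0 - p
            if keep:
                present.add((rel, key))
        if any(("R", x) in present and ("S", (x, y)) in present
               and ("T", y) in present for (x, y) in S):
            total += weight
    return total


p1 = plan_dissociate_T(R, S, T)
p2 = plan_dissociate_R(R, S, T)
approx = min(p1, p2)          # best (smallest) upper bound among the plans
exact = exact_probability(R, S, T)
print(f"plan 1: {p1:.4f}  plan 2: {p2:.4f}  min: {approx:.4f}  exact: {exact:.4f}")
```

Both plans use only products and complements over grouped tuples, which is why, as the abstract notes, this kind of evaluation can run entirely inside a standard relational engine; the sketch uses plain Python dictionaries only to stay self-contained.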

artificial intelligence, dissociation, natural language, (17 more...)

arXiv.org Artificial Intelligence

1310.6257

Country: North America > United States (0.92)

Genre: Research Report (0.63)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)


Drill Data with Apache Drill

@machinelearnbot | May-8-2016, 17:40:04 GMT

Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google's Dremel, Drill is designed to scale to several thousand nodes and query petabytes of data at the interactive speeds that BI/analytics environments require. Apache Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Apache Drill is the "Drillbit" service, which is responsible for accepting requests from the client, processing the queries, and returning results to the client. When a Drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes.

artificial intelligence, information retrieval query processing, natural language, (12 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.63)


Approximations and Refinements of Certain Answers via Many-Valued Logics

Console, Marco (Sapienza Università di Roma) | Guagliardo, Paolo (University of Edinburgh) | Libkin, Leonid (University of Edinburgh)

AAAI Conferences | Apr-19-2016

Computing certain answers is the preferred way of answering queries in scenarios involving incomplete data. This, however, is computationally expensive, so practical systems use efficient techniques based on a particular three-valued logic, even though this often leads to incorrect results. Our goal is to provide a general many-valued framework for correctly approximating certain answers. We do so by defining the semantics of many-valued answers and queries, following the principle that additional knowledge about the input must translate into additional knowledge about the output. This framework lets us compare query outputs and evaluation procedures in terms of their informativeness. For each many-valued logic with a knowledge ordering on its truth values, one can build a syntactic evaluation procedure for all first-order queries that correctly approximates certain answers; additional truth values are used to refine information about certain answers. For concrete examples, we show that a recently proposed approach fixing some of the inconsistencies of SQL query evaluation is an immediate consequence of our framework, and we further refine it by adding a fourth truth value. We show that no evaluation procedure based on Boolean logic delivers correctness guarantees. Finally, we study the relative power of evaluation procedures based on the informativeness of the answers they produce.
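
To make the "knowledge ordering" in the abstract above concrete, here is a small Python sketch. It assumes that the three-valued logic in question is Kleene's logic as used by SQL (true, false, unknown), with unknown modelled as `None`; the function names are hypothetical and this is not the paper's evaluation procedure. The check at the end tests the stated principle on the connectives: replacing an unknown input by a definite value may refine an unknown output, but never contradicts a definite one.

```python
VALUES = (True, False, None)  # None stands for "unknown"


def k_not(a):
    """Kleene negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a, b):
    """Kleene conjunction: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a, b):
    """Kleene disjunction: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def at_most_as_informative(a, b):
    """Knowledge ordering: unknown is below both definite values."""
    return a is None or a == b


def is_knowledge_monotone(op):
    """More knowledge about the inputs never yields less (or conflicting)
    knowledge about the output of the binary connective `op`."""
    pairs = [(a, a2) for a in VALUES for a2 in VALUES
             if at_most_as_informative(a, a2)]
    return all(at_most_as_informative(op(a, b), op(a2, b2))
               for a, a2 in pairs for b, b2 in pairs)


print(is_knowledge_monotone(k_and), is_knowledge_monotone(k_or))
```

The connectives alone behave well under this ordering; the incorrect results mentioned in the abstract arise at the level of whole query evaluation, which this sketch does not model.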

database, evaluation procedure, information, (14 more...)

AAAI Conferences

Fifteenth International Conference on the Principles of Knowledge Representation and Reasoning

Genre: Research Report (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)


Factorized Databases: A Knowledge Compilation Perspective

Olteanu, Dan (University of Oxford)

AAAI Conferences | Apr-12-2016

This paper overviews recent work on compilation of relational queries into lossless factorized representations. The primary motivation for this compilation is to avoid redundancy in the representation of query results and speed up their computation and subsequent analytics.
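
As a rough illustration of what a lossless factorized representation can look like, the following Python sketch stores the result of a two-relation join as a union, over join values, of a product of two value lists, instead of listing every result tuple. The relations, attribute layout, and function names are hypothetical, and this is only one simple form of factorization, not the paper's construction.

```python
from collections import defaultdict
from itertools import product


def flat_join(r, s):
    """Ordinary join of r(a, b) and s(b, c), listing every result tuple."""
    return [(a, b, c) for a, b in r for b2, c in s if b == b2]


def factorized_join(r, s):
    """Factorized result: for each join value b, keep the a-values and the
    c-values separately; their product is implied, not materialized."""
    left, right = defaultdict(list), defaultdict(list)
    for a, b in r:
        left[b].append(a)
    for b, c in s:
        right[b].append(c)
    return {b: (left[b], right[b]) for b in left if b in right}


def enumerate_tuples(fact):
    """Expand the factorization back into flat tuples (shows it is lossless)."""
    return [(a, b, c) for b, (a_vals, c_vals) in fact.items()
            for a, c in product(a_vals, c_vals)]


r = [("a1", "b1"), ("a2", "b1"), ("a3", "b1")]
s = [("b1", "c1"), ("b1", "c2"), ("b1", "c3")]

flat = flat_join(r, s)
fact = factorized_join(r, s)

flat_size = 3 * len(flat)  # number of values stored in the flat result
fact_size = sum(1 + len(a_vals) + len(c_vals) for a_vals, c_vals in fact.values())
print(flat_size, fact_size)
```

On this toy input the flat result stores 27 values while the factorized form stores 7, and expanding the factorized form reproduces exactly the same tuples.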

artificial intelligence, information retrieval query processing, natural language, (19 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.51)


700 SQL Queries per Second in Apache Spark with FiloDB

#artificialintelligence | Apr-10-2016, 18:44:20 GMT

Apache Spark is increasingly thought of as the new jack-of-all-trades distributed platform for big data crunching – what with everything from traditional MapReduce-like workloads, streaming, graph computation, statistics, and machine learning all in one package. Except for Spark Streaming, with its micro-batches, Spark is focused for the most part on higher-latency, rich/complex analytics workloads. What about using Spark as an embedded, web-speed / low-latency query engine? This post will dive into using Apache Spark for low-latency, higher concurrency reporting / dashboard / SQL-like applications - up to hundreds of queries a second! Launching Spark applications on a cluster, or even on localhost, has a pretty high overhead.

artificial intelligence, machine learning, natural language, (16 more...)

#artificialintelligence

Technology:

Information Technology > Databases (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)
Information Technology > Artificial Intelligence > Machine Learning (0.35)


Knowledge Representation in Probabilistic Spatio-Temporal Knowledge Bases

Parisi, Francesco, Grant, John

Journal of Artificial Intelligence Research | Mar-28-2016

We represent knowledge as integrity constraints in a formalization of probabilistic spatio-temporal knowledge bases. We start by defining the syntax and semantics of a formalization called PST knowledge bases. This definition generalizes an earlier version, called SPOT, which is a declarative framework for the representation and processing of probabilistic spatio-temporal data where probability is represented as an interval because the exact value is unknown. We augment the previous definition by adding a type of non-atomic formula that expresses integrity constraints. The result is a highly expressive formalism for knowledge representation dealing with probabilistic spatio-temporal data. We obtain complexity results both for checking the consistency of PST knowledge bases and for answering queries in PST knowledge bases, and also specify tractable cases. All the domains in the PST framework are finite, but we extend our results also to arbitrarily large finite domains.

loc, probability, pst kb, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4883

AI Access Foundation

10992


Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
North America > United States > Massachusetts > Norfolk County > Norwood (0.04)
(4 more...)

Genre: Research Report (0.48)

Industry: Transportation > Infrastructure & Services (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
(4 more...)


Utilisation of Metadata Fields and Query Expansion in Cross-Lingual Search of User-Generated Internet Video

Khwileh, Ahmad, Ganguly, Debasis, Jones, Gareth J. F.

Journal of Artificial Intelligence Research | Jan-27-2016

Recent years have seen significant efforts in the area of Cross-Language Information Retrieval (CLIR) for text retrieval. This work initially focused on formally published content, but more recently research has begun to concentrate on CLIR for informal social media content. However, despite the current expansion in online multimedia archives, there has been little work on CLIR for this content. While there has been some limited work on Cross-Language Video Retrieval (CLVR) for professional videos, such as documentaries or TV news broadcasts, there has, to date, been no significant investigation of CLVR for the rapidly growing archives of informal user-generated content (UGC). Key differences between such UGC and professionally produced content are the nature and structure of the textual UGC metadata associated with it, as well as the form and quality of the content itself. In this setting, retrieval effectiveness may suffer not only from translation errors common to all CLIR tasks, but also from recognition errors associated with the automatic speech recognition (ASR) systems used to transcribe the spoken content of the video and from the informality and inconsistency of the associated user-created metadata for each video. This work proposes and evaluates techniques to improve CLIR effectiveness for such noisy UGC content. Our experimental investigation shows that different sources of evidence, e.g. the content from different fields of the structured metadata, significantly affect CLIR effectiveness. Results from our experiments also show that each metadata field has a varying robustness to query expansion (QE) and hence can have a negative impact on CLIR effectiveness. Our work proposes a novel adaptive QE technique that predicts the most reliable source for expansion and shows how this technique can be effective for improving CLIR effectiveness for UGC content.

effectiveness, query, retrieval, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4775

AI Access Foundation

10979


Country:

North America > United States > Maryland (0.04)
Europe > Ireland (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law > Statutes (0.92)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.71)


An Active Learning Framework using Sparse-Graph Codes for Sparse Polynomials and Graph Sketching

Li, Xiao, Ramchandran, Kannan

Neural Information Processing Systems | Dec-31-2015

Let $f: \{-1,1\}^n \rightarrow \mathbb{R}$ be an $n$-variate polynomial consisting of $2^n$ monomials, in which only $s \ll 2^n$ coefficients are non-zero. The goal is to learn the polynomial by querying the values of $f$. We introduce an active learning framework that is associated with a low query cost and computational runtime. The significant savings are enabled by leveraging sampling strategies based on modern coding theory, specifically, the design and analysis of sparse-graph codes, such as Low-Density-Parity-Check (LDPC) codes, which represent the state of the art in modern packet communications. More significantly, we show how this design perspective leads to exciting, and to the best of our knowledge, largely unexplored intellectual connections between learning and coding. The key is to relax the worst-case assumption with an ensemble-average setting, where the polynomial is assumed to be drawn uniformly at random from the ensemble of all polynomials (of a given size $n$ and sparsity $s$). Our framework succeeds with high probability with respect to the polynomial ensemble with sparsity up to $s = O(2^{\delta n})$ for any $\delta \in (0,1)$, where $f$ is exactly learned using $O(ns)$ queries in time $O(n s \log s)$, even if the queries are perturbed by Gaussian noise. We further apply the proposed framework to graph sketching, which is the problem of inferring sparse graphs by querying graph cuts. By writing the cut function as a polynomial and exploiting the graph structure, we propose a sketching algorithm to learn an arbitrary $n$-node unknown graph using only a few cut queries, which scales almost linearly in the number of edges and sub-linearly in the graph size $n$. Experiments on real datasets show significant reductions in the runtime and query complexity compared with competitive schemes.
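
The step "writing the cut function as a polynomial" can be illustrated with a short Python sketch. The abstract does not spell out the exact form, so this uses one standard encoding as an assumption: label each node with $x_i \in \{-1, 1\}$ according to its side of the cut, so an edge $(i, j)$ is cut exactly when $x_i x_j = -1$, and the cut value is $\sum_{(i,j) \in E} (1 - x_i x_j)/2$. That polynomial has at most $|E| + 1$ non-zero coefficients, which is where sparsity comes from. The example graph and function names are hypothetical, and the sketch only verifies the identity; it does not implement the learning algorithm.

```python
from itertools import product


def cut_direct(edges, x):
    """Count edges whose endpoints lie on different sides of the cut."""
    return sum(1 for i, j in edges if x[i] != x[j])


def cut_polynomial(edges, x):
    """Same quantity as a polynomial over {-1, 1}^n:
    |E|/2 - (1/2) * sum of x_i * x_j over the edges."""
    return len(edges) / 2 - sum(x[i] * x[j] for i, j in edges) / 2


# Hypothetical 4-node graph given as an edge list.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (0, 3), (0, 2)]

# The two definitions agree on every assignment in {-1, 1}^n.
agree = all(cut_direct(edges, x) == cut_polynomial(edges, x)
            for x in product((-1, 1), repeat=n))
print(agree)
```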

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)
