Query Processing
Learning Concept Embeddings for Query Expansion by Quantum Entropy Minimization
Sordoni, Alessandro (Université de Montréal) | Bengio, Yoshua (Université de Montréal) | Nie, Jian-Yun (Université de Montréal)
In web search, users queries are formulated using only few terms and term-matching retrieval functions could fail at retrieving relevant documents. Given a user query, the technique of query expansion (QE) consists in selecting related terms that could enhance the likelihood of retrieving relevant documents. Selecting such expansion terms is challenging and requires a computational framework capable of encoding complex semantic relationships. In this paper, we propose a novel method for learning, in a supervised way, semantic representations for words and phrases. By embedding queries and documents in special matrices, our model disposes of an increased representational power with respect to existing approaches adopting a vector representation. We show that our model produces high-quality query expansion terms. Our expansion increase IR measures beyond expansion from current word-embeddings models and well-established traditional QE methods.
Querying Geometric Figures Using a Controlled Language, Ontological Graphs and Dependency Lattices
Haralambous, Yannis, Quaresma, Pedro
Dynamic geometry systems (DGS) have become basic tools in many areas of geometry as, for example, in education. Geometry Automated Theorem Provers (GATP) are an active area of research and are considered as being basic tools in future enhanced educational software as well as in a next generation of mechanized mathematics assistants. Recently emerged Web repositories of geometric knowledge, like TGTP and Intergeo, are an attempt to make the already vast data set of geometric knowledge widely available. Considering the large amount of geometric information already available, we face the need of a query mechanism for descriptions of geometric constructions. In this paper we discuss two approaches for describing geometric figures (declarative and procedural), and present algorithms for querying geometric figures in declaratively and procedurally described corpora, by using a DGS or a dedicated controlled natural language for queries.
SemMemDB: In-Database Knowledge Activation
Chen, Yang (University of Florida) | Petrovic, Milenko (Florida Institute for Human and Machine Cognition) | Clark, Micah (Florida Institute for Human and Machine Cognition)
Semantic networks are a popular way of simulating human memory in ACT-R-like cognitive architectures. However, existing implementations fall short in their ability to efficiently work with very large networks required for full-scale simulations of human memories. In this paper, we present SemMemDB, an in-database realization of semantic networks and spreading activation. We describe a relational representation for semantic networks and an efficient SQL-based spreading activation algorithm. We provide a simple interface for users to invoke retrieval queries. The key benefits of our approach are: (1) Databases have mature query engines and optimizers that generate efficient query plans for memory activation and retrieval; (2) Databases can provide massive storage capacity to potentially support human-scale memories; (3) Spreading activation is implemented in SQL, a widely-used query language for big data analytics. We evaluate SemMemDB in a comprehensive experimental study using DBPedia, a web-scale ontology constructed from the Wikipedia corpus. The results show that our system runs over 500 times faster than previous works.
Natural Language Access to Enterprise Data
Waltinger, Ulli (Siemens AG) | Tecuci, Dan (Siemens Corporation) | Olteanu, Mihaela (Siemens AG) | Mocanu, Vlad (Siemens AG) | Sullivan, Sean (Siemens Energy Inc.)
This paper describes USI Answers — a natural language question answering system for enterprise data. We report on the progress towards the goal of offering easy access to enterprise data to a large number of business users, most of whom are not familiar with the specific syntax or semantics of the underlying data sources. Additional complications come from the nature of the data, which comes both as structured and unstructured. The proposed solution allows users to express questions in natural language, makes apparent the system's interpretation of the query, and allows easy query adjustment and reformulation. The application is in use by more than 1500 users from Siemens Energy. We evaluate our approach on a data set consisting of fleet data.
Data-Parallel Computing Meets STRIPS
Karpas, Erez (Technion - Israel Institute of Technology) | Sagi, Tomer (Technion - Israel Institute of Technology) | Domshlak, Carmel (Technion - Israel Institute of Technology) | Gal, Avigdor ( Technion - Israel Institute of Technology ) | Mendelson, Avi (Technion - Israel Institute of Technology ) | Tennenholtz, Moshe (Microsoft Research and Technion)
The increased demand for distributed computations on “big data” has led to solutions such as SCOPE, DryadLINQ, Pig, and Hive, which allow the user to specify queries in an SQL-like language, enriched with sets of user-defined operators. The lack of exact semantics for user-defined operators interferes with the query optimization process, thus putting the burden of suggesting, at least partial, query plans on the user. In an attempt to ease this burden, we propose a formal model that allows for data-parallel program synthesis (DPPS) in a semantically well-defined manner. We show that this model generalizes existing frameworks for data-parallel computation, while providing the flexibility of query plan generation that is currently absent from these frameworks. In particular, we show how existing, off-the-shelf, AI planning tools can be used for solving DPPS tasks.
SHARE: A Web Service Based Framework for Distributed Querying and Reasoning on the Semantic Web
Vandervalk, Ben P, McCarthy, E Luke, Wilkinson, Mark D
Here we describe the SHARE system, a web service based framework for distributed querying and reasoning on the semantic web. The main innovations of SHARE are: (1) the extension of a SPARQL query engine to perform on-demand data retrieval from web services, and (2) the extension of an OWL reasoner to test property restrictions by means of web service invocations. In addition to enabling queries across distributed datasets, the system allows for a target dataset that is significantly larger than is possible under current, centralized approaches. Although the architecture is equally applicable to all types of data, the SHARE system targets bioinformatics, due to the large number of interoperable web services that are already available in this area. SHARE is built entirely on semantic web standards, and is the successor of the BioMOBY project.
Query Expansion in Information Retrieval Systems using a Bayesian Network-Based Thesaurus
de Campos, Luis M., Fernandez-Luna, Juan M., Huete, Juan F.
Information Retrieval (IR) is concerned with the identification of documents in a collection that are relevant to a given information need, usually represented as a query containing terms or keywords, which are supposed to be a good description of what the user is looking for. IR systems may improve their effectiveness (i.e., increasing the number of relevant documents retrieved) by using a process of query expansion, which automatically adds new terms to the original query posed by an user. In this paper we develop a method of query expansion based on Bayesian networks. Using a learning algorithm, we construct a Bayesian network that represents some of the relationships among the terms appearing in a given document collection; this network is then used as a thesaurus (specific for that collection). We also report the results obtained by our method on three standard test collections.
Query Complexity of Derivative-Free Optimization
Jamieson, Kevin G., Nowak, Robert, Recht, Ben
Derivative Free Optimization (DFO) is attractive when the objective function's derivatives are not available and evaluations are costly. Moreover, if the function evaluations are noisy, then approximating gradients by finite differences is difficult. This paper gives quantitative lower bounds on the performance of DFO with noisy function evaluations, exposing a fundamental and unavoidable gap between optimization performance based on noisy evaluations versus noisy gradients. This challenges the conventional wisdom that the method of finite differences is comparable to a stochastic gradient. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it only uses Boolean-valued function comparisons, rather than evaluations. This makes the algorithm useful in an even wider range of applications, including optimization based on paired comparisons from human subjects, for example. Remarkably, we show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
Query Complexity of Derivative-Free Optimization
Jamieson, Kevin G., Nowak, Robert D., Recht, Benjamin
This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.
Active Learning
This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms which have been organized into four broad categories, or "query selection frameworks." It also touches on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice. ISBN 9781608457250, 114 pages.