AITopics

Automated topic identification is an essential component invarious information retrieval and knowledge representationtasks such as automated summary generation, categorization search and document indexing. In this paper, we present the Wikitop system to automatically generate topic trees from the input text by performing hierarchical classification using the Wikipedia Category Network (WCN). Our preliminary results over a collection of 125 articles are encouraging and show potential of a robust methodology for automated topic tree generation.

information retrieval, machine learning, natural language, (18 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > India > Tamil Nadu (0.15)

Industry: Media > News (0.91)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.31)

Boosting Complementary Hash Tables for Fast Nearest Neighbor Search

Liu, Xianglong (Beihang University) | Deng, Cheng ( Xidian University ) | Mu, Yadong (Peking University) | Li, Zhujin (Beihang University)

Hashing has been proven a promising technique for fast nearest neighbor search over massive databases. In many practical tasks it usually builds multiple hash tables for a desired level of recall performance. However, existing multi-table hashing methods suffer from the heavy table redundancy, without strong table complementarity and effective hash code learning. To address the problem, this paper proposes a multi-table learning method which pursues a specified number of complementary and informative hash tables from a perspective of ensemble learning. By regarding each hash table as a neighbor prediction model, the multi-table search procedure boils down to a linear assembly of predictions stemming from multiple tables. Therefore, a sequential updating and learning framework is naturally established in a boosting mechanism, theoretically guaranteeing the table complementarity and algorithmic convergence. Furthermore, each boosting round pursues the discriminative hash functions for each table by a discrete optimization in the binary code space. Extensive experiments carried out on two popular tasks including Euclidean and semantic nearest neighbor search demonstrate that the proposed boosted complementary hash-tables method enjoys the strong table complementarity and significantly outperforms the state-of-the-arts.

information retrieval, machine learning, natural language, (15 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(2 more...)

RQUERY: Rewriting Natural Language Queries on Knowledge Graphs to Alleviate the Vocabulary Mismatch Problem

Shekarpour, Saeedeh (Wright State University) | Marx, Edgard (Universität Leipzig) | Auer, Sören (University of Bonn) | Sheth, Amit (Wright State University)

For non-expert users, a textual query is the most popular and simple means for communicating with a retrieval or question answering system.However, there is a risk of receiving queries which do not match with the background knowledge.Query expansion and query rewriting are solutions for this problem but they are in danger of potentially yielding a large number of irrelevant words, which in turn negatively influences runtime as well as accuracy.In this paper, we propose a new method for automatic rewriting input queries on graph-structured RDF knowledge bases.We employ a Hidden Markov Model to determine the most suitable derived words from linguistic resources.We introduce the concept of triple-based co-occurrence for recognizing co-occurred words in RDF data.This model was bootstrapped with three statistical distributions.Our experimental study demonstrates the superiority of the proposed approach to the traditional n-gram model.

information retrieval, machine learning, natural language, (17 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > United States (1.00)
Europe (0.68)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.90)

Efficiently Mining High Quality Phrases from Texts

Li, Bing (Northeastern University, Shenyang) | Yang, Xiaochun (Northeastern University, Shenyang) | Wang, Bin (Northeastern University, Shenyang) | Cui, Wei (Northeastern University, Shenyang)

Phrase mining is a key research problem for semantic analysis and text-based information retrieval. The existing approaches based on NLP, frequency, and statistics cannot extract high quality phrases and the processing is also time consuming, which are not suitable for dynamic on-line applications. In this paper, we propose an efficient high-quality phrase mining approach (EQPM). To the best of our knowledge, our work is the first effort that considers both intra-cohesion and inter-isolation in mining phrases, which is able to guarantee appropriateness. We also propose a strategy to eliminate order sensitiveness, and ensure the completeness of phrases. We further design efficient algorithms to make the proposed model and strategy feasible. The empirical evaluations on four real data sets demonstrate that our approach achieved a considerable quality improvement and the processing time was 2.3X - 29X faster than the state-of-the-art works.

information retrieval, machine learning, segmentation, (21 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

North America > United States (1.00)
Asia > China (0.69)
Europe (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

S2JSD-LSH: A Locality-Sensitive Hashing Schema for Probability Distributions

Mao, Xian-Ling (Beijing Institute of Technology) | Feng, Bo-Si (Beijing Institute of Technology) | Hao, Yi-Jing (Beijing Institute of Technology) | Nie, Liqiang (National University of Singapore) | Huang, Heyan (Beijing Institute of Technology) | Wen, Guihua (South China University of Technology)

To compare the similarity of probability distributions, the information-theoretically motivated metrics like Kullback-Leibler divergence (KL) and Jensen-Shannon divergence (JSD) are often more reasonable compared with metrics for vectors like Euclidean and angular distance. However, existing locality-sensitive hashing (LSH) algorithms cannot support the information-theoretically motivated metrics for probability distributions. In this paper, we first introduce a new approximation formula for S2JSD-distance, and then propose a novel LSH scheme adapted to S2JSD-distance for approximate nearest neighbors search in high-dimensional probability distributions. We define the specific hashing functions, and prove their local-sensitivity. Furthermore, extensive empirical evaluations well illustrate the effectiveness of the proposed hashing schema on six public image datasets and two text datasets, in terms of mean Average Precision, Precision@N and Precision-Recall curve.

data mining, information retrieval, machine learning, (18 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.68)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Gollapalli, Sujatha Das (Institute for Infocomm Research, A*STAR) | Li, Xiao-li (Institute for Infocomm Research, A*STAR) | Yang, Peng (Tencent AI Lab)

Incorporating Expert Knowledge into Keyphrase Extraction

Keyphrases that efficiently summarize a document’s content are used in various document processing and retrieval tasks. Current state-of-the-art techniques for keyphrase extraction operate at a phrase-level and involve scoring candidate phrases based on features of their component words.In this paper, we learn keyphrase taggers for research papers using token-based features incorporating linguistic, surface-form, and document-structure information through sequence labeling. We experimentally illustrate that using within document features alone, our tagger trained with ConditionalRandom Fields performs on-par with existing state-of-the-art systems that rely on information from Wikipedia and citation networks. In addition, we are also able to harness recent work on feature labeling to seamlessly incorporate expert knowledge and predictions from existing systems to enhance the extraction performance further. We highlight the modeling advantages of our keyphrase taggers and show significant performance improvements on two recently-compiled datasets of keyphrases from Computer Science research papers.

information retrieval, machine learning, natural language, (20 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > California (0.28)

Genre: Research Report (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
(2 more...)

Yang, Shuo (Peking University) | Zou, Lei (Peking University and Beijing Institute of Big Data Research) | Wang, Zhongyuan (Microsoft Research Asia) | Yan, Jun (Microsoft Research Asia) | Wen, Ji-Rong (Renmin University of China and Beijing Key Laboratory of Big Data Management and Analysis Methods)

Efficiently Answering Technical Questions — A Knowledge Graph Approach

More and more users prefer to ask their technical questions online. For machines, understanding a question is nontrivial. Current approaches lack explicit background knowledge.In this paper, we introduce a novel technical question understanding approach to recommending probable solutions to users. First, a knowledge graph is constructed which contains abundant technical information, and an augmented knowledge graph is built on the basis of the knowledge graph, to link the knowledge graph and documents. Then we develop a light weight question driven mechanism to select candidate documents. To improve the online performance, we propose an index-based random walk to support the online search. We use comprehensive experiments to evaluate the effectiveness of our approach on a large scale of real-world query logs. Our system outperforms main-stream search engine and the state-of-art information retrieval methods. Meanwhile, extensive experiments confirm the efficiency of our index-based online search mechanism.

artificial intelligence, information retrieval, natural language, (15 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Dey, Palash (Indian Institute of Science, Bangalore)

Query Complexity of Tournament Solutions

A directed graph where there is exactly one edge between every pair of vertices is called a tournament. Finding the “best” set of vertices of a tournament is a well studied problem in social choice theory. A tournament solution takes a tournament as input and outputs a subset of vertices of the input tournament. However, in many applications, for example, choosing the best set of drugs from a given set of drugs, the edges of the tournament are given only implicitly and knowing the orientation of an edge is costly. In such scenarios, we would like to know the best set of vertices (according to some tournament solution) by “querying” as few edges as possible. We, in this paper, precisely study this problem for commonly used tournament solutions: given an oracle access to the edges of a tournament T , find f(T) by querying as few edges as possible, for a tournament solution f. We first show that the set of Condorcet non-losers in a tournament can be found by querying 2n−⌊log n⌋−2 edges only and this is tight in the sense that every algorithm for finding the set of Condorcet non-losers needs to query at least 2n−⌊log n⌋−2 edges in the worst case, where n is the number of vertices in the input tournament. We then move on to study other popular tournament solutions and show that any algorithm for finding the Copeland set, the Slater set, the Markov set, the bipartisan set, the uncovered set, the Banks set, and the top cycle must query Ω(n 2 ) edges in the worst case. On the positive side, we are able to circumvent our strong query complexity lower bound results by proving that, if the size of the top cycle of the input tournament is at most k, then we can find all the tournament solutions mentioned above by querying O(nk + n log n / log(1− 1 / k ) ) edges only.

artificial intelligence, natural language, tournament, (17 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)

Active Search in Intensionally Specified Structured Spaces

Oglic, Dino (University of Bonn) | Garnett, Roman (Washington University in St. Louis) | Gaertner, Thomas (The University of Nottingham)

We consider an active search problem in intensionally specified structured spaces. The ultimate goal in this setting is to discover structures from structurally different partitions of a fixed but unknown target class. An example of such a process is that of computer-aided de novo drug design. In the past 20 years several Monte Carlo search heuristics have been developed for this process. Motivated by these hand-crafted search heuristics, we devise a Metropolis--Hastings sampling scheme where the acceptance probability is given by a probabilistic surrogate of the target property, modeled with a max entropy conditional model. The surrogate model is updated in each iteration upon the evaluation of a selected structure. The proposed approach is consistent and the empirical evidence indicates that it achieves a large structural variety of discovered targets.

information retrieval, natural language, proposal generator, (16 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country:

Europe > United Kingdom > England (0.28)
Europe > Germany (0.28)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)

Ordinal Constrained Binary Code Learning for Nearest Neighbor Search

Liu, Hong (Xiamen University) | Ji, Rongrong (Xiamen University) | Wu, Yongjian (Tencent Technology (Shanghai) Co.,Ltd ) | Huang, Feiyue (Tencent Technology (Shanghai) Co.,Ltd)

Recent years have witnessed extensive attention in binary code learning, a.k.a. hashing, for nearest neighbor search problems. It has been seen that high-dimensional data points can quantize into binary codes to give an efficient similarity approximation via Hamming distance. Among the existing schemes, ranking-based hashing is recent promising that targets at preserving ordinal relations of ranking in the Hamming space to minimize retrieval loss. However, the size of the ranking tuples that show the ordinal relations, is quadratic or cubic to the size of training samples. It is so very expensive to embed such ranking tuples in binary code learning, especially given a large-scale training data set. Besides, it remains difficult to build ranking tuples efficiently for most ranking-preserving hashing, which are deployed over an ordinal graph-based setting. To handle these problems, we propose a novel ranking-preserving hashing method, dubbed Ordinal Constraint Hashing (OCH), which efficiently learns the optimal hashing functions with a graph-based approximation to embed the ordinal relations. The core idea is to reduce the size of ordinal graph with ordinal constraint projection, which preserves the ordinal relations through a small data set (such as clusters or random samples). In particular, to learn such hash functions effectively, we further relax the discrete constraints and design a specific stochastic gradient decent algorithm for optimization. Experimental results on three large-scale visual search benchmark datasets, i.e. LabelMe, Tiny100K and GIST1M, show that the proposed OCH method can achieve superior performance over the state-of-the-arts approaches.

information retrieval, machine learning, natural language, (17 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.29)

Genre: Research Report (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)