AITopics

Ninth International AAAI Conference on Web and Social Media

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Communications > Social Media (0.80)

arXiv.org Artificial Intelligence, Mar-22-2015

Construction of FuzzyFind Dictionary using Golay Coding Transformation for Searching Applications

Kowsari, Kamran, Yammahi, Maryam, Bari, Nima, Vichr, Roman, Alsaby, Faisal, Berkovich, Simon Y.

Searching through large volumes of data is critical for companies, scientists, and search engine applications because of the time and memory complexity involved. In this paper, a new technique for generating a FuzzyFind Dictionary for text mining is introduced. We map the letters of the English alphabet onto the 23 bits of a FuzzyFind Dictionary (or onto more than 23 bits by using additional FuzzyFind Dictionaries), with the bits reflecting the presence or absence of particular letters. This representation preserves the closeness of distorted forms of a word as closeness of the resulting binary vectors, within a Hamming distance of 2. The paper describes the Golay Coding Transformation Hash Table and how it can be applied to a FuzzyFind Dictionary as a new technique for searching through big data. Generating the dictionary takes linear time, accessing the data takes constant time, and updating the dictionary with new data sets takes time linear in the number of new data points. The technique searches only for English letters, with each segment holding 23 bits; it can also use more than 23 bits by adding further segments as reference tables.
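The letter-presence encoding and the Hamming-distance-2 tolerance described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' Golay Coding Transformation: folding the 26 letters into 23 bits by taking the letter index modulo 23 is a hypothetical mapping, and instead of Golay decoding, the hypothetical `FuzzyDictionary` class simply probes every key within Hamming distance 2 of the query vector (1 + 23 + 253 = 277 probes), so the number of probes per lookup does not grow with the dictionary. The names `letter_vector`, `hamming`, and `neighbors` are likewise illustrative.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Iterator, List

NUM_BITS = 23  # one 23-bit segment, matching the Golay code length

def letter_vector(word: str) -> int:
    """Encode which letters occur in `word` as a 23-bit integer.

    Hypothetical mapping: letter index (a=0 through z=25) modulo 23, so
    x, y, z share bits with a, b, c. Non-letters are ignored."""
    vec = 0
    for ch in word.lower():
        if "a" <= ch <= "z":
            vec |= 1 << ((ord(ch) - ord("a")) % NUM_BITS)
    return vec

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two vectors."""
    return bin(a ^ b).count("1")

def neighbors(vec: int, radius: int = 2) -> Iterator[int]:
    """Yield every 23-bit vector within `radius` bit flips of `vec` (277 for radius 2)."""
    for r in range(radius + 1):
        for positions in combinations(range(NUM_BITS), r):
            flipped = vec
            for p in positions:
                flipped ^= 1 << p
            yield flipped

class FuzzyDictionary:
    """Hypothetical hash table keyed by letter vectors (not the Golay transformation)."""

    def __init__(self, words: Iterable[str] = ()):
        self.table: Dict[int, List[str]] = defaultdict(list)
        for w in words:
            self.add(w)

    def add(self, word: str) -> None:
        # One hash insertion per word, so building the table is linear in its size.
        self.table[letter_vector(word)].append(word)

    def lookup(self, query: str, radius: int = 2) -> List[str]:
        """Return stored words whose vectors lie within `radius` of the query's."""
        hits: List[str] = []
        for key in neighbors(letter_vector(query), radius):
            hits.extend(self.table.get(key, ()))  # .get avoids creating empty entries
        return sorted(hits)

d = FuzzyDictionary(["cat", "dog", "bird"])
print(d.lookup("cut"))
```

Because only letter presence is encoded, anagrams such as "cat" and "act" share a key, and a one-letter substitution like "cut" for "cat" moves the vector by exactly two bits, so it is still found.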

data mining, information retrieval, machine learning

arXiv.org Artificial Intelligence

doi: 10.14569/IJACSA.2015.060313

arXiv: 1503.06483

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)

Genre: Research Report (0.40)

Industry:

Government (0.47)
Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

AAAI Conferences, Mar-15-2015

Named Entity Recognition in Travel-Related Search Queries

This paper addresses the problem of named entity recognition (NER) in travel-related search queries. NER is an important step toward a richer understanding of user-generated inputs in information retrieval systems. NER in queries is challenging because queries provide minimal context and few structural clues. NER in restricted-domain queries is useful in vertical search applications, for example, following query classification in general search. This paper describes an efficient, machine-learning-based solution for the high-quality extraction of semantic entities from query inputs in a restricted-domain information retrieval setting. We apply a conditional random field (CRF) sequence model to travel-domain search queries and achieve high-accuracy results. Our approach yields an overall F1 score of 86.4% on a held-out test set, outperforming a baseline score of 82.0% on a CRF with standard features. The resulting NER classifier is currently in use in a real-life travel search engine.
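A linear-chain CRF labels a query by choosing the tag sequence with the highest total score, which is found with Viterbi decoding. The sketch below shows only that decoding step. The emission and transition scores are hand-set, hypothetical values (a trained CRF would compute them from weighted features), and the query and the `B-LOC`/`I-LOC` tag names are illustrative assumptions, not the paper's feature set or label inventory.

```python
from typing import Dict, List, Tuple

def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            tags: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence; missing transitions score 0."""
    # score[t]: best total score of any path ending in tag t at the current token.
    score = {t: transitions.get(("<s>", t), 0.0) + emissions[0][t] for t in tags}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for cur in tags:
            # Pick the previous tag that maximizes path score plus transition score.
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, cur), 0.0))
            new_score[cur] = score[prev] + transitions.get((prev, cur), 0.0) + em[cur]
            ptr[cur] = prev
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda t: score[t])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]

TAGS = ["O", "B-LOC", "I-LOC"]
# Hypothetical scores: an inside tag should not start the query or follow O.
TRANSITIONS = {("<s>", "I-LOC"): -10.0, ("O", "I-LOC"): -10.0,
               ("B-LOC", "I-LOC"): 1.0, ("I-LOC", "I-LOC"): 0.5}
TOKENS = ["cheap", "flights", "to", "new", "york"]
EMISSIONS = [
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},
    {"O": 2.0, "B-LOC": 0.0, "I-LOC": 0.0},
    {"O": 1.0, "B-LOC": 1.5, "I-LOC": 0.5},
    {"O": 0.5, "B-LOC": 1.0, "I-LOC": 1.2},
]
print(list(zip(TOKENS, viterbi(EMISSIONS, TRANSITIONS, TAGS))))
```

Note that "york" alone scores slightly higher as `B-LOC` than as `I-LOC`, yet the transition bonus from `B-LOC` makes "new york" the best joint labeling; capturing such dependencies between adjacent labels is what the sequence model adds over per-token classification.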

information retrieval, natural language, travel-related search query

Twenty-Seventh IAAI Conference

Industry: Consumer Products & Services > Travel (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Luo, Qun (Beijing University of Posts and Telecommunications) | Xu, Weiran (Beijing University of Posts and Telecommunications)

Learning Word Vectors Efficiently Using Shared Representations and Document Representations

We propose improved word embedding models based on the vLBL and ivLBL models that share representations between context and target words and use document representations. Our proposed models are much simpler, with almost half as many parameters as state-of-the-art methods. We achieve better results on the word analogy task than the best previously reported results, using significantly less training data and computing time.
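The word analogy task mentioned above asks questions such as "man is to king as woman is to ?". A common way to answer it from embeddings is the vector offset with cosine similarity (often called 3CosAdd). The sketch below assumes that scoring rule, a standard choice that the abstract does not specify, and uses tiny hand-made vectors that are purely hypothetical.

```python
import math
from typing import Dict, List

def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(emb: Dict[str, List[float]], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with the word closest to b - a + c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    # The three query words themselves are excluded, as is customary.
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Hypothetical 3-dimensional embeddings, chosen by hand for illustration.
EMB = {
    "man":   [1.0, 0.0, 0.1],
    "woman": [1.0, 1.0, 0.1],
    "king":  [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "apple": [0.0, 0.2, 0.0],
}
print(analogy(EMB, "man", "king", "woman"))
```

Accuracy on this task is the fraction of such questions whose top-ranked answer is correct, which is why embeddings that encode consistent offsets score well.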

artificial intelligence, information retrieval, natural language

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia > China > Beijing > Beijing (0.35)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.76)

Lower and Upper Bounds for SPARQL Queries over OWL Ontologies

Glimm, Birte (University of Ulm) | Kazakov, Yevgeny (University of Ulm) | Kollia, Ilianna (National Technical University of Athens) | Stamou, Giorgos (National Technical University of Athens)

The paper presents an approach for optimizing the evaluation of SPARQL queries over OWL ontologies using SPARQL's OWL Direct Semantics entailment regime. The approach is based on the computation of lower and upper bounds, but we allow for much more expressive queries than related approaches. To optimize the evaluation of possible query answers that are in the upper but not in the lower bound, we present a query extension approach that uses schema knowledge from the queried ontology to extend the query with additional parts. We show that the resulting query is equivalent to the original one, and we use the additional parts, which are simple to evaluate, to restrict the bounds of subqueries of the initial query. In an empirical evaluation, we show that the proposed query extension approach can reduce query execution time by up to four orders of magnitude.
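The role of the bounds can be illustrated with plain set logic: answers in the lower bound are certain, answers outside the upper bound are impossible, and only the candidates in between need a full, expensive entailment check. The sketch below assumes the bounds are already computed and uses a hypothetical `is_entailed` callback as a stand-in for a reasoner call; it does not implement the paper's query extension, whose purpose is to narrow the gap between the two bounds.

```python
from typing import Callable, Hashable, Set

def answers_from_bounds(lower: Set[Hashable], upper: Set[Hashable],
                        is_entailed: Callable[[Hashable], bool]) -> Set[Hashable]:
    """Combine certain answers with the candidates from the gap that pass the check."""
    if not lower <= upper:
        raise ValueError("the lower bound must be contained in the upper bound")
    # Only the candidates in upper \ lower are sent to the expensive check.
    gap = upper - lower
    return lower | {x for x in gap if is_entailed(x)}

checked = []

def fake_reasoner(x: Hashable) -> bool:
    """Hypothetical stand-in for an OWL reasoner; records which candidates it saw."""
    checked.append(x)
    return x == "b"

result = answers_from_bounds({"a"}, {"a", "b", "c"}, fake_reasoner)
print(sorted(result), sorted(checked))
```

The tighter the bounds, the smaller the gap and the fewer reasoner calls are needed, which is why restricting the bounds of subqueries can pay off so strongly.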

query, subquery, template

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Europe > Greece > Attica > Athens (0.04)
Europe > Germany (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.88)

Kejriwal, Mayank (University of Texas at Austin)

Entity Resolution in a Big Data Framework

Entity Resolution (ER) concerns identifying logically equivalent pairs of entities that may be syntactically disparate. Although ER is a long-standing problem in the artificial intelligence community, the growth of Linked Open Data, a collection of semi-structured datasets published and interconnected on the Web, mandates a new approach. The thesis is that building a viable Entity Resolution solution for serving Big Data needs requires simultaneously resolving challenges of automation, heterogeneity, scalability, and domain independence. The dissertation aims to build such a system and evaluate it on real-world datasets already published as Linked Open Data.
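One standard way to address the scalability challenge in ER is blocking: only records that share a blocking key are compared, instead of all pairs. The sketch below combines token blocking with token-level Jaccard similarity. Both are common ER building blocks chosen here for illustration, not the dissertation's system, and the records and the 0.5 threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int]

def tokens(text: str) -> Set[str]:
    """Lowercased whitespace tokens of a record."""
    return set(text.lower().split())

def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def resolve(records: Dict[int, str], threshold: float = 0.5) -> Tuple[List[Pair], List[Pair]]:
    """Return (candidate pairs, matched pairs) using token blocking and Jaccard scoring."""
    toks = {rid: tokens(text) for rid, text in records.items()}
    blocks: Dict[str, Set[int]] = defaultdict(set)
    for rid, ts in toks.items():
        for t in ts:
            blocks[t].add(rid)  # every token serves as a blocking key
    candidates: Set[Pair] = set()
    for ids in blocks.values():
        candidates.update(combinations(sorted(ids), 2))  # compare only within blocks
    matches = sorted(p for p in candidates if jaccard(toks[p[0]], toks[p[1]]) >= threshold)
    return sorted(candidates), matches

RECORDS = {
    1: "Univ of Texas Austin",
    2: "University of Texas at Austin",
    3: "Beihang University",
}
candidates, matches = resolve(RECORDS)
print(candidates, matches)
```

Records 1 and 3 share no token, so they are never compared; on large datasets, skipping such pairs is what keeps ER tractable, although very common tokens (such as "of" here) produce large blocks and are often pruned in practice.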

data mining, entity resolution, information retrieval

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States > Texas > Travis County > Austin (0.15)

Industry: Information Technology (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.82)
Information Technology > Data Science > Data Mining > Big Data (0.72)

Ordering-Sensitive and Semantic-Aware Topic Modeling

Yang, Min (The University of Hong Kong) | Cui, Tianyi (Zhejiang University) | Tu, Wenting (The University of Hong Kong)

Topic modeling of textual corpora is an important and challenging problem. Most previous work makes the “bag-of-words” assumption, which ignores the ordering of words. This assumption simplifies computation, but it unrealistically discards word-order information and the semantics of words in context. In this paper, we present a Gaussian Mixture Neural Topic Model (GMNTM), which incorporates both the ordering of words and the semantic meaning of sentences into topic modeling. Specifically, we represent each topic as a cluster of multi-dimensional vectors and embed the corpus into a collection of vectors generated by the Gaussian mixture model. Each word is affected not only by its topic, but also by the embedding vectors of its surrounding words and the context. The Gaussian mixture components and the topics of documents, sentences, and words can be learned jointly. Extensive experiments show that our model learns better topics and more accurate word distributions for each topic. Quantitatively, compared to state-of-the-art topic modeling approaches, GMNTM achieves significantly better perplexity, retrieval accuracy, and classification accuracy.
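Representing each topic as a Gaussian cluster of vectors means an embedding can be softly assigned to topics: the responsibility of topic k for a vector x is its mixture weight times its Gaussian density at x, divided by the sum of the same quantity over all topics. The sketch below computes these responsibilities for isotropic Gaussians. The two topics, their means, variances, and weights are hypothetical, and GMNTM's joint learning of the embeddings and mixture components is not shown.

```python
import math
from typing import List, Sequence

def isotropic_gaussian_pdf(x: Sequence[float], mean: Sequence[float], var: float) -> float:
    """Density of N(mean, var * I) at x."""
    d = len(x)
    sq_dist = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-sq_dist / (2 * var)) / ((2 * math.pi * var) ** (d / 2))

def responsibilities(x: Sequence[float], means: List[List[float]],
                     variances: List[float], weights: List[float]) -> List[float]:
    """Posterior probability that vector x was generated by each topic component."""
    joint = [w * isotropic_gaussian_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    total = sum(joint)
    return [j / total for j in joint]

# Hypothetical 2-D topic components, e.g. a "travel" cluster and a "finance" cluster.
MEANS = [[0.0, 0.0], [3.0, 3.0]]
VARIANCES = [1.0, 1.0]
WEIGHTS = [0.5, 0.5]
r = responsibilities([0.5, 0.2], MEANS, VARIANCES, WEIGHTS)
print([round(p, 3) for p in r])
```

A vector near the first mean is assigned almost entirely to the first topic, while a vector midway between equally weighted components of equal variance is split evenly.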

information retrieval, machine learning, natural language

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.89)

Unsupervised Phrasal Near-Synonym Generation from Text Corpora

Gupta, Dishan (Carnegie Mellon University) | Carbonell, Jaime (Carnegie Mellon University) | Gershman, Anatole (Carnegie Mellon University) | Klein, Steve (Meaningful Machines, LLC) | Miller, David (Meaningful Machines, LLC)

Unsupervised discovery of synonymous phrases is useful in a variety of tasks, ranging from text mining and search engines to semantic analysis and machine translation. This paper presents an unsupervised, corpus-based conditional model, the Near-Synonym System (NeSS), for finding phrasal synonyms and near-synonyms that requires only a large monolingual corpus. The method is based on maximizing information-theoretic combinations of shared contexts and is parallelizable for large-scale processing. An evaluation framework with crowd-sourced judgments is proposed, and results are compared with alternative methods, showing considerably better results than those reported in the literature and than thesaurus lookup for multi-word phrases. Moreover, the results show that the statistical scoring functions and overall scalability of the system are more important than language-specific NLP tools. The method is language-independent and practically usable because of its accuracy and its real-time performance via parallel decomposition.
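The core intuition, that phrases appearing in the same contexts tend to be near-synonyms, can be sketched by scoring the left and right neighbors two phrases share, weighting each shared context by how rare it is. This is a deliberately simplified stand-in: the log-inverse-frequency weight, the one-word window, the function names, and the toy corpus are assumptions for illustration, not NeSS's actual information-theoretic scoring functions.

```python
import math
from collections import Counter
from typing import Callable, List, Set, Tuple

Context = Tuple[str, str]  # ("L", word) or ("R", word)

def phrase_contexts(sentences: List[str], phrase: str) -> Set[Context]:
    """Collect the words immediately left and right of each occurrence of `phrase`."""
    target = phrase.split()
    n = len(target)
    contexts: Set[Context] = set()
    for sentence in sentences:
        toks = sentence.split()
        for i in range(len(toks) - n + 1):
            if toks[i:i + n] == target:
                if i > 0:
                    contexts.add(("L", toks[i - 1]))
                if i + n < len(toks):
                    contexts.add(("R", toks[i + n]))
    return contexts

def shared_context_scorer(sentences: List[str], phrases: List[str]) -> Callable[[str, str], float]:
    """Score phrase pairs by their shared contexts, each weighted by log(N / df)."""
    ctx = {p: phrase_contexts(sentences, p) for p in phrases}
    df = Counter(c for cs in ctx.values() for c in cs)  # number of phrases having each context
    n_phrases = len(phrases)

    def score(p: str, q: str) -> float:
        # A context shared by every phrase carries no information (weight log 1 = 0).
        return sum(math.log(n_phrases / df[c]) for c in ctx[p] & ctx[q])

    return score

SENTENCES = [
    "we bought a big house near the lake",
    "we bought a large house near the park",
    "she ate a red apple",
]
PHRASES = ["big house", "large house", "red apple"]
sim = shared_context_scorer(SENTENCES, PHRASES)
print(sim("big house", "large house"), sim("big house", "red apple"))
```

All three phrases follow "a", so that context contributes nothing; only the rarer shared right context "near" links "big house" and "large house".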

artificial intelligence, information retrieval, natural language

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)

Exploring Key Concept Paraphrasing Based on Pivot Language Translation for Question Retrieval

Zhang, Wei-Nan (Harbin Institute of Technology) | Ming, Zhao-Yan (Digipen Institute of Technology) | Zhang, Yu (Harbin Institute of Technology) | Liu, Ting (Harbin Institute of Technology) | Chua, Tat-Seng (National University of Singapore)

Question retrieval in current community-based question answering (CQA) services does not, in general, work well for long and complex queries. One of the main difficulties lies in the word mismatch between queries and candidate questions. Existing solutions try to expand the queries at the word level, but they usually fail to consider concept-level enrichment. In this paper, we explore an approach based on pivot language translation to derive paraphrases of key concepts. We further propose a unified question retrieval model that integrates the key concepts of the query question and their paraphrases. Experimental results demonstrate that the paraphrase-enhanced retrieval model significantly outperforms state-of-the-art models in question retrieval.
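A widely used way to derive paraphrases through a pivot language scores a candidate e2 for a phrase e1 by summing over pivot-language translations f: p(e2 | e1) is the sum over f of p(e2 | f) times p(f | e1). The abstract does not give the paper's exact formulation, so the sketch below assumes this standard one; the French pivot phrases, the function name, and all probabilities are hypothetical toy values.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Table = Dict[str, Dict[str, float]]  # phrase -> {translation: probability}

def pivot_paraphrases(phrase: str, src_to_pivot: Table,
                      pivot_to_src: Table) -> List[Tuple[str, float]]:
    """Rank paraphrases e2 of `phrase` by the sum over f of p(e2 | f) * p(f | phrase)."""
    scores: Dict[str, float] = defaultdict(float)
    for f, p_f in src_to_pivot.get(phrase, {}).items():
        for e2, p_e2 in pivot_to_src.get(f, {}).items():
            if e2 != phrase:  # the phrase is not its own paraphrase
                scores[e2] += p_f * p_e2
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical phrase-translation probabilities (English <-> French).
EN_TO_FR = {"laptop battery": {"batterie d'ordinateur portable": 0.7, "batterie": 0.3}}
FR_TO_EN = {
    "batterie d'ordinateur portable": {"laptop battery": 0.6, "notebook battery": 0.4},
    "batterie": {"battery": 0.8, "laptop battery": 0.2},
}
print(pivot_paraphrases("laptop battery", EN_TO_FR, FR_TO_EN))
```

The ranked paraphrases could then be added, with their scores as weights, to the key concepts of a query question before retrieval.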

artificial intelligence, information retrieval, natural language

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)

Mining Query Subtopics from Questions in Community Question Answering

Wu, Yu (Beihang University) | Wu, Wei (Microsoft Research Asia) | Li, Zhoujun (Beihang University) | Zhou, Ming (Microsoft Research Asia)

This paper proposes mining query subtopics from questions in community question answering (CQA). The subtopics are represented as clusters of questions, with keywords summarizing each cluster. The task is unique in that subtopics mined from questions can not only facilitate user browsing in CQA search but also describe aspects of queries from a question-answering perspective. The challenges of the task include how to group semantically similar questions and how to find keywords capable of summarizing the clusters. We formulate subtopic mining as a non-negative matrix factorization (NMF) problem and further extend the NMF model to incorporate question similarity, estimated from CQA metadata, into learning. Compared with existing methods, our method jointly optimizes question clustering and keyword extraction and encourages the former task to enhance the latter. Experimental results on large-scale, real-world CQA datasets show that the proposed method significantly outperforms existing methods in keyword extraction, while achieving performance comparable to state-of-the-art methods in question clustering.
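Plain NMF already yields both outputs the abstract describes: factorizing a question-by-term matrix V into nonnegative W (question-to-subtopic weights) and H (subtopic-to-term weights) gives a cluster for each question (the largest entry in its row of W) and keywords for each cluster (the largest entries in its row of H). The sketch below uses the standard Lee-Seung multiplicative updates for squared error, with a small constant added to numerator and denominator for numerical safety. It omits the paper's extension that injects metadata-based question similarity, and the matrix and term list are hypothetical toys.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]

def sq_error(v: Matrix, w: Matrix, h: Matrix) -> float:
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))

def nmf(v: Matrix, rank: int, iters: int = 200, seed: int = 0,
        eps: float = 1e-12) -> Tuple[Matrix, Matrix, List[float]]:
    """Multiplicative updates minimizing ||V - WH||^2 from a random positive start.

    Adding eps to both numerator and denominator avoids division by zero and keeps
    each step a valid majorization step, so the error does not increase (up to rounding)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]
    h = [[rng.uniform(0.1, 1.0) for _ in range(m)] for _ in range(rank)]
    errors = [sq_error(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[k][j] * (num[k][j] + eps) / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][k] * (num[i][k] + eps) / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
        errors.append(sq_error(v, w, h))
    return w, h, errors

TERMS = ["flight", "hotel", "visa", "loan", "bank", "rate"]
V = [  # rows: hypothetical questions, columns: term counts
    [2.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 2.0, 1.0, 1.0],
    [0.0, 0.0, 0.0, 1.0, 2.0, 1.0],
]
W, H, errors = nmf(V, rank=2)
clusters = [max(range(2), key=lambda k: row[k]) for row in W]
keywords = [[TERMS[j] for j in sorted(range(len(TERMS)), key=lambda j: -H[k][j])[:2]]
            for k in range(2)]
print(clusters, keywords, round(errors[0], 3), round(errors[-1], 3))
```

In this plain form, clustering and keyword extraction come from the same factorization, which mirrors the abstract's point that the two tasks are optimized jointly rather than in sequence.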

keyword extraction, machine learning, question answering

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)