AITopics

Measuring the semantic meaning between words is an important issue because it is the basis for many applications, such as word sense disambiguation, document summarization, and so forth. Although it has been explored for several decades, most of the studies focus on improving the effectiveness of the problem, i.e., precision and recall. In this paper, we propose to address the efficiency issue, that given a collection of words, how to efficiently discover the top-k most semantic similar words to the query. This issue is very important for real applications yet the existing state-of-the-art strategies cannot satisfy users with reasonable performance. Efficient strategies on searching top-k semantic similar words are proposed. We provide an extensive comparative experimental evaluation demonstrating the advantages of the introduced strategies over the state-of-the-art approaches.

pagecount, query, similarity, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Mining User Dwell Time for Personalized Web Search Re-Ranking

Xu, Songhua (Oak Ridge National Laboratory) | Jiang, Hao (The University of Hong Kong) | Lau, Francis Chi-Moon (The University of Hong Kong)

We propose a personalized re-ranking algorithm through mining user dwell times derived from a user's previously online reading or browsing activities. We acquire document level user dwell times via a customized web browser, from which we then infer concept word level user dwell times in order to understand a user's personal interest. According to the estimated concept word level user dwell times, our algorithm can estimate a user's potential dwell time over a new document, based on which personalized webpage re-ranking can be carried out. We compare the rankings produced by our algorithm with rankings generated by popular commercial search engines and a recently proposed personalized ranking algorithm. The results clearly show the superiority of our method.

algorithm, dwell time, user dwell time, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Tennessee > Anderson County > Oak Ridge (0.04)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)

Predicting Epidemic Tendency through Search Behavior Analysis

The possibility that influenza activity can be generally detected through search log analysis has been explored in recent years. However, previous studies have mainly focused on influenza, and little attention has been paid to other epidemics. With an analysis of web user behavior data, we consider the problem of predicting the tendency of hand-foot -and-mouth disease (HFMD), whose out-break in 2010 resulted in a great panic in China. In addi-tion to search queries, we consider users’ interactions with search engines. Given the collected search logs, we cluster HFMD-related search queries, medical pages and news reports into the following sets: epidemic-related queries (ERQs), epidemic-related pages (ERPs) and ep-idemic-related news (ERNs). Furthermore, we count their own frequencies as different features, and we conduct a regression analysis with current HFMD occurrences. The experimental results show that these features exhibit good performances on both accuracy and timeliness.

erq, frequency, query, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Clara County > San Jose (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Xiang, Evan Wei (The Hong Kong University of Science and Technology) | Pan, Sinno Jialin (Institute for Infocomm Research) | Pan, Weike (The Hong Kong University of Science and Technology) | Su, Jian (Institute for Infocomm Research) | Yang, Qiang (The Hong Kong University of Science and Technology)

Source-Selection-Free Transfer Learning

Transfer learning addresses the problems that labeled training data are insufficient to produce a high-performance model. Typically, given a target learning task, most transfer learning approaches require to select one or more auxiliary tasks as sources by the designers. However, how to select the right source data to enable effective knowledge transfer automatically is still an unsolved problem, which limits the applicability of transfer learning. In this paper, we take one step ahead and propose a novel transfer learning framework, known as source-selection-free transfer learning (SSFTL), to free users from the need to select source domains. Instead of asking the users for source and target data pairs, as traditional transfer learning does, SSFTL turns to some online information sources such as World Wide Web or the Wikipedia for help. The source data for transfer learning can be hidden somewhere within this large online information source, but the users do not know where they are. Based on the online information sources, we train a large number of classifiers. Then, given a target task, a bridge is built for labels of the potential source candidates and the target domain data in SSFTL via some large online social media with tag cloud as a label translator. An added advantage of SSFTL is that, unlike many previous transfer learning approaches, which are difficult to scale up to the Web scale, SSFTL is highly scalable and can offset much of the training work to offline stage. We demonstrate the effectiveness and efficiency of SSFTL through extensive experiments on several real-world datasets in text classification.

classifier, source classifier, ssftl, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Transfer Learning (1.00)

Line Orthogonality in Adjacency Eigenspace with Application to Community Partition

Wu, Leting (University of North Carolina at Charlotte) | Ying, Xiaowei (University of North Carolina at Charlotte) | Wu, Xintao (University of North Carolina at Charlotte) | Zhou, Zhi-Hua (Nanjing University)

Different from Laplacian or normal matrix, the properties of the adjacency eigenspace received much less attention. Recent work showed that nodes projected into the adjacency eigenspace exhibit an orthogonal line pattern and nodes from the same community locate along the same line. In this paper, we conduct theoretical studies based on graph perturbation to demonstrate why this line orthogonality property holds in the adjacency eigenspace and why it generally disappears in the Laplacian and normal eigenspaces. Using the orthogonality property in the adjacency eigenspace, we present a graph partition algorithm, AdjCluster, which first projects node coordinates to the unit sphere and then applies the classic k-means to find clusters. Empirical evaluations on synthetic data and real-world social networks validate our theoretical findings and show the effectiveness of our graph partition algorithm.

eigenspace, matrix, spectral coordinate, (16 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > North Carolina (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

Matching Large Ontologies Based on Reduction Anchors

Wang, Peng (Southeast University) | Zhou, Yuming (Nanjing University) | Xu, Baowen (Nanjing University)

Matching large ontologies is a challenge due to the high time complexity. This paper proposes a new matching method for large ontologies based on reduction anchors. This method has a distinct advantage over the divide-and-conquer methods because it dose not need to partition large ontologies. In particular, two kinds of reduction anchors, positive and negative reduction anchors, are proposed to reduce the time complexity in matching. Positive reduction anchors use the concept hierarchy to predict the ignorable similarity calculations. Negative reduction anchors use the locality of matching to predict the ignorable similarity calculations. Our experimental results on the real world data sets show that the proposed method is efficient for matching large ontologies.

algorithm, ontology, similarity calculation, (15 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)

Tang, Jintao (National University of Defense Technology) | Wang, Ting (National University of Defense Technology) | Lu, Qin (Hong Kong Polytechnic University) | Wang, Ji (National Laboratory for Parallel and Distributed Processing) | Li, Wenjie (Hong Kong Polytechnic University)

A Wikipedia Based Semantic Graph Model for Topic Tracking in Blogosphere

There are two key issues for information diffusion in blogosphere: (1) blog posts are usually short, noisy and contain multiple themes, (2) information diffusion through blogosphere is primarily driven by the “word-of-mouth” effect, thus making topics evolve very fast. This paper presents a novel topic tracking approach to deal with these issues by modeling a topic as a semantic graph in which the semantic relatedness between terms are learned from Wikipedia. For a given topic/post, the named entities, Wikipedia concepts, and the semantic relatedness are extracted to generate the graph model. Noises are filtered out through a graph clustering algorithm. To handle topic evolution, the topic model is enriched by using Wikipedia as background knowledge. Furthermore, graph edit distance is used to measure the similarity between a topic and its posts. The proposed method is tested using real-world blog data. Experimental results show the advantage of the proposed method on tracking topics in short, noisy text.

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

Asia > China > Hong Kong (0.04)
Africa > Southern Africa (0.04)
Oceania > Australia (0.04)
(8 more...)

Genre: Research Report (0.48)

Industry: Media > News (0.82)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)

Pan, Weike (Hong Kong University of Science and Technology) | Liu, Nathan N. (Hong Kong University of Science and Technology) | Xiang, Evan W. (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)

Transfer Learning to Predict Missing Ratings Via Heterogeneous User Feedbacks

Data sparsity due to missing ratings is a major challenge for collaborative filtering (CF) techniques in recommender systems. This is especially true for CF domains where the ratings are expressed numerically. We observe that, while we may lack the information in numerical ratings, we may have more data in the form of binary ratings. This is especially true when users can easily express themselves with their likes and dislikes for certain items. In this paper, we explore how to use the binary preference data expressed in the form of like/dislike to help reduce the impact of data sparsity of more expressive numerical ratings. We do this by transferring the rating knowledge from some auxiliary data source in binary form (that is, likes or dislikes), to a target numerical rating matrix. Our solution is to model both numerical ratings and like/dislike in a principled way, using a novel framework of Transfer by Collective Factorization (TCF). In particular, we construct the shared latent space collectively and learn the data-dependent effect separately. A major advantage of the TCF approach over previous collective matrix factorization (or bi-factorization) methods is that we are able to capture the data-dependent effect when sharing the data-independent knowledge, so as to increase the overall quality of knowledge transfer. Experimental results demonstrate the effectiveness of TCF at various sparsity levels as compared to several state-of-the-art methods.

auxiliary data, factorization, matrix factorization, (14 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country: Asia > China > Hong Kong (0.05)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

LIMES — A Time-Efficient Approach for Large-Scale Link Discovery on the Web of Data

Ngomo, Axel-Cyrille Ngonga (University of Leipzig) | Auer, Sören (University of Leipzig)

The Linked Data paradigm has evolved into a powerful enabler for the transition from the document-oriented Web into the Semantic Web. While the amount of data published as Linked Data grows steadily and has surpassed 25 billion triples, less than 5\% of these triples are links between knowledge bases. Link discovery frameworks provide the functionality necessary to discover missing links between knowledge bases. Yet, this task requires a significant amount of time, especially when it is carried out on large data sets. This paper presents and evaluates LIMES, a novel time-efficient approach for link discovery in metric spaces. Our approach utilizes the mathematical characteristics of metric spaces during the mapping process to filter out a large number of those instance pairs that do not suffice the mapping conditions. We present the mathematical foundation and the core algorithms employed in LIMES. We evaluate our algorithms with synthetic data to elucidate their behavior on small and large data sets with different configurations and compare the runtime of LIMES with another state-of-the-art link discovery tool.

exemplar, knowledge base, metric space, (17 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country: Europe > Germany > Saxony > Leipzig (0.05)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

McSherry, David (University of Ulster)

Minimally Complete Recommendations

Recent research has highlighted the benefits of completeness as a retrieval criterion in recommender systems. In complete retrieval, any subset of the constraints in a given query that can be satisfied must be satisfied by at least one of the retrieved products. Minimal completeness (i.e., always retrieving the smallest set of products needed for completeness) is also beginning to attract research interest as a way to minimize cognitive load in the approach. Other important features of a retrieval algorithm’s behavior include the diversity of the retrieved products and the order in which they are presented to the user. In this paper, we present a new algorithm for minimally complete retrieval (MCR) in which the ranking of retrieved products is primarily based on the number of constraints that they satisfy, but other measures such as similarity or utility can also be used to inform the retrieval process. We also present theoretical and empirical results that demonstrate our algorithm’s ability to minimize cognitive load while ensuring the completeness and diversity of the retrieved products.

algorithm, constraint, retrieval, (17 more...)

Twenty-Second International Joint Conference on Artificial Intelligence

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > Scotland > City of Aberdeen > Aberdeen (0.04)
Europe > United Kingdom > Northern Ireland (0.04)
(4 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.36)