 Information Retrieval


An E-Learning Recommender That Helps Learners Find the Right Materials

AAAI Conferences

Learning materials are increasingly available on the Web, making them an excellent source of information for building e-Learning recommendation systems. However, learners often have difficulty finding the right materials to support their learning goals because they lack sufficient domain knowledge to craft effective queries that convey what they wish to learn. The unfamiliar vocabulary often used by domain experts creates a semantic gap between learners and experts, and also makes it difficult to map a learner's query to relevant learning materials. We build an e-Learning recommender system that uses background knowledge extracted from a collection of teaching materials and encyclopedia sources to support the refinement of learners' queries. Our approach allows us to bridge the gap between learners and teaching experts. We evaluate our method using a collection of realistic learner queries and a dataset of Machine Learning and Data Mining documents. Evaluation results show that our method outperforms benchmark approaches and demonstrate its effectiveness in assisting learners to find the right materials.


TipMaster: A Knowledge Base of Authoritative Local News Sources on Social Media

AAAI Conferences

Twitter has become an important online source for real-time news dissemination. In particular, official accounts of local government and media outlets provide newsworthy and authoritative information, revealing local trends and breaking news. In this paper, we describe TipMaster, an automatically constructed knowledge base of Twitter accounts that are likely to report local news, from government agencies to local media outlets. First, we implement classifiers for detecting these accounts by integrating heterogeneous information from the accounts' textual metadata, profile images, and their tweet messages. Next, we demonstrate two use cases for TipMaster: 1) as a platform that monitors real-time social media messages for local breaking news, and 2) as an authoritative source for verifying nascent rumors. Experimental results show that our account classification algorithms achieve both high precision and recall (around 90%). The demonstrated case studies show that our platform is able to detect local breaking news or debunk emergent rumors faster than mainstream media sources.


PoseHD: Boosting Human Detectors Using Human Pose Information

AAAI Conferences

As most recently proposed methods for human detection have achieved a sufficiently high recall rate within a reasonable number of proposals, in this paper, we mainly focus on how to improve the precision rate of human detectors. In order to address the two main challenges in precision improvement, i.e., i) hard background instances and ii) redundant partial proposals, we propose the novel PoseHD framework, a top-down pose-based approach on the basis of an arbitrary state-of-the-art human detector. In our proposed PoseHD framework, we first make use of human pose estimation (in a batch manner) and present pose heatmap classification (by a convolutional neural network) to eliminate hard negatives by extracting the more detailed structural information; then, we utilize pose-based proposal clustering and reranking modules, filtering redundant partial proposals by comprehensively considering both holistic and part information. The experimental results on multiple pedestrian benchmark datasets validate that our proposed PoseHD framework can generally improve the overall performance of recent state-of-the-art human detectors (by 2-4% in both mAP and MR metrics). Moreover, our PoseHD framework can be easily extended to object detection with large-scale object part annotations. Finally, in this paper, we present extensive ablative analysis to compare our approach with these traditional bottom-up pose-based models and highlight the importance of our framework design decisions.

[Figure panels: (a) Positive instances; (b) Hard negative instances; (c) Redundant partial proposals (in blue box)]


Learning Robust Search Strategies Using a Bandit-Based Approach

AAAI Conferences

Effective solving of constraint problems often requires choosing good or specific search heuristics. However, choosing or designing a good search heuristic is non-trivial and is often a manual process. In this paper, rather than manually choosing or designing search heuristics, we propose the use of bandit-based learning techniques to automatically select search heuristics. Our approach is online: the solver learns and selects from a set of heuristics during search. The goal is to obtain automatic search heuristics that give robust performance. Preliminary experiments show that our adaptive technique is more robust than the original search heuristics. It can also outperform the original heuristics.
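
As a rough illustration of online, bandit-based heuristic selection, the sketch below uses UCB1. The abstract does not say which bandit algorithm the authors use, so UCB1 is an assumption here, and the success rates passed to `run_bandit` are made-up stand-ins for how well each search heuristic performs; a real solver would derive rewards from search progress.

```python
import math
import random


def ucb1_select(counts, rewards, c=math.sqrt(2)):
    """Return the index of the arm (heuristic) to try next under UCB1."""
    # Try every heuristic once before relying on the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [
        rewards[a] / counts[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=scores.__getitem__)


def run_bandit(success_probs, rounds=2000, seed=0):
    """Simulate selection; success_probs are hypothetical reward rates."""
    rng = random.Random(seed)
    counts = [0] * len(success_probs)
    rewards = [0.0] * len(success_probs)
    for _ in range(rounds):
        arm = ucb1_select(counts, rewards)
        # Reward 1.0 when the chosen heuristic "succeeds" in this round.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += reward
    return counts


if __name__ == "__main__":
    # Three hypothetical heuristics with different success rates.
    print(run_bandit([0.2, 0.5, 0.8]))
```

The selection rule keeps trying weaker heuristics occasionally, which is what lets the choice adapt during search instead of being fixed up front.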


A Semantic QA-Based Approach for Text Summarization Evaluation

AAAI Conferences

Many Natural Language Processing and Computational Linguistics applications involve the generation of new texts based on some existing texts, such as summarization, text simplification and machine translation. However, a serious problem has haunted these applications for decades: how to automatically and accurately assess the quality of their output. In this paper, we present some preliminary results on one especially useful and challenging problem in NLP system evaluation---how to pinpoint content differences between two text passages (especially for large passages such as articles and books). Our idea is intuitive and very different from existing approaches. We treat one text passage as a small knowledge base, and ask it a large number of questions to exhaustively identify all content points in it. By comparing the correctly answered questions from two text passages, we are able to compare their content precisely. An experiment using the 2007 DUC summarization corpus shows promising results.


Asymmetric Deep Supervised Hashing

AAAI Conferences

Hashing has been widely used for large-scale approximate nearest neighbor search because of its storage and search efficiency. Recent work has found that deep supervised hashing can significantly outperform non-deep supervised hashing in many applications. However, most existing deep supervised hashing methods adopt a symmetric strategy to learn one deep hash function for both query points and database (retrieval) points. The training of these symmetric deep supervised hashing methods is typically time-consuming, which makes it hard for them to effectively utilize the supervised information in cases with large-scale databases. In this paper, we propose a novel deep supervised hashing method, called asymmetric deep supervised hashing (ADSH), for large-scale nearest neighbor search. ADSH treats the query points and database points in an asymmetric way. More specifically, ADSH learns a deep hash function only for query points, while the hash codes for database points are directly learned. The training of ADSH is much more efficient than that of traditional symmetric deep supervised hashing methods. Experiments show that ADSH can achieve state-of-the-art performance in real applications.
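
The retrieval-time asymmetry described above can be sketched as follows: only the query passes through a hash function, while database items are represented by stored binary codes. This is a toy illustration, not ADSH itself. The sign-of-projection function stands in for the deep network, and the projections and database codes are hypothetical values rather than learned ones.

```python
def sign_hash(vector, projections):
    """Stand-in hash function: one bit per projection, set when the dot product is positive."""
    code = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, vector))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer-encoded binary codes."""
    return bin(a ^ b).count("1")


def search(query_vector, projections, database_codes, k=2):
    """Hash only the query, then rank the stored database codes by Hamming distance."""
    q = sign_hash(query_vector, projections)
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming(q, database_codes[i]),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical 4-bit setup: four projections over 2-D inputs.
    projections = [[1, 0], [0, 1], [1, 1], [1, -1]]
    # Database codes are stored directly; no hash function is applied to them.
    database_codes = [0b1111, 0b0000, 0b1101, 0b0011]
    print(search([2, 1], projections, database_codes))
```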


Distributed Composite Quantization

AAAI Conferences

Approximate nearest neighbor (ANN) search is a fundamental problem in computer vision, machine learning and information retrieval. Recently, quantization-based methods have drawn a lot of attention due to their superior accuracy and comparable efficiency compared with traditional hashing techniques. However, despite the prosperity of quantization techniques, they are all designed for the centralized setting, i.e., quantization is performed on the data on a single machine. This makes it difficult to scale these techniques to large-scale datasets. Built upon the Composite Quantization, we propose a novel quantization algorithm for data distributed across different nodes of an arbitrary network. The proposed Distributed Composite Quantization (DCQ) decomposes Composite Quantization into a set of decentralized sub-problems such that each node solves its own sub-problem on its local data, meanwhile is still able to attain consistent quantizers thanks to the consensus constraint. Since there is no exchange of training data across the nodes in the learning process, the communication cost of our method is low. Extensive experiments on ANN search and image retrieval tasks validate that the proposed DCQ significantly improves Composite Quantization in both efficiency and scale, while still maintaining competitive accuracy.


Leveraging Machine Learning and AI for Making Your Paid Search Engine Marketing More Effective

#artificialintelligence

SEM has undergone a huge transformation due to the advent of new and intelligent technologies. Gone are the days when search engine marketers could build a site using some specific keywords, create some links, and start ranking within a short period of time. Since then, the web has undergone tremendous change. So, the methods that were once fail-proof have fallen out of favour and are no longer viable. A clear example of how AI impacts search engine marketing is Google's newly launched RankBrain algorithm that contributes to search engine results.


SERMLY - Search Engine Reputation Management Tools

#artificialintelligence

SERMLY is the world's first search engine reputation service that uses blockchain technology. It will constantly analyze the results, keep the data safe, improve the search engine reputation and provide you with the necessary analytical data based on Machine Learning and Big Data.


Building Cross-Lingual End-to-End Product Search with Tensorflow · Han Xiao Tech Blog

@machinelearnbot

Product search is one of the key components in an online retail store. Essentially, you need a system that matches a text query with a set of products in your store. A good product search can understand a user's query in any language, retrieve as many relevant products as possible, and finally present the result as a list, in which the preferred products should be at the top, and the irrelevant products should be at the bottom. Unlike general text search (e.g., Google web search), products are structured data. A product is often described by a list of key-value pairs, a set of pictures and some free text. In the developers' world, Apache Solr and Elasticsearch are known as de-facto solutions for full-text search, making them top contenders for building e-commerce product search. At the core, Solr/Elasticsearch is a symbolic information retrieval (IR) system.
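
To make "symbolic" concrete, here is a minimal sketch of exact-token matching over an inverted index. It is a toy model, not how Solr or Elasticsearch are implemented, and the product catalogue is invented for illustration. Note how a query in one language cannot match a product described in another, because matching depends on identical tokens.

```python
from collections import defaultdict


def build_index(products):
    """Map each lowercase token to the set of product ids containing it."""
    index = defaultdict(set)
    for pid, text in products.items():
        for token in text.lower().split():
            index[token].add(pid)
    return index


def search(index, query):
    """Rank products by how many query tokens they match exactly."""
    hits = defaultdict(int)
    for token in query.lower().split():
        for pid in index.get(token, ()):
            hits[pid] += 1
    # Most matched tokens first; ties broken by product id.
    return sorted(hits, key=lambda pid: (-hits[pid], pid))


if __name__ == "__main__":
    # Hypothetical catalogue; product 3 is described in German.
    products = {1: "red cotton dress", 2: "blue cotton shirt", 3: "rotes Kleid"}
    index = build_index(products)
    print(search(index, "red dress"))    # matches product 1 only
    print(search(index, "rotes kleid"))  # matches product 3 only
```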