Information Retrieval
BBookX: Building Online Open Books for Personalized Learning
Liang, Chen (Pennsylvania State University) | Wang, Shuting (Pennsylvania State University) | Wu, Zhaohui (Pennsylvania State University) | Williams, Kyle (Pennsylvania State University) | Pursel, Bart (Pennsylvania State University) | Brautigam, Benjamin (Pennsylvania State University) | Saul, Sherwyn (Pennsylvania State University) | Williams, Hannah (Pennsylvania State University) | Bowen, Kyle (Pennsylvania State University) | Giles, C. Lee (Pennsylvania State University)
We demonstrate BBookX, a novel system that automatically builds, in collaboration with a user, online open books by searching open educational resources (OER). This system explores the use of retrieval technologies to dynamically generate zero-cost materials such as textbooks for personalized learning.
Scaling Relational Inference Using Proofs and Refutations
Mangal, Ravi (Georgia Institute of Technology) | Zhang, Xin (Georgia Institute of Technology) | Kamath, Aditya (Georgia Institute of Technology) | Nori, Aditya V. (Microsoft Research) | Naik, Mayur (Georgia Institute of Technology)
Many inference problems are naturally formulated using hard and soft constraints over relational domains: the desired solution must satisfy the hard constraints, while optimizing the objectives expressed by the soft constraints. Existing techniques for solving such constraints rely on efficiently grounding a sufficient subset of constraints that is tractable to solve. We present an eager-lazy grounding algorithm that eagerly exploits proofs and lazily refutes counterexamples. We show that our algorithm achieves significant speedup over existing approaches without sacrificing soundness for real-world applications from information retrieval and program analysis.
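To make the hard/soft distinction concrete, here is a minimal sketch that brute-forces a tiny, already-grounded problem. It is not the eager-lazy grounding algorithm described in the abstract; it only illustrates the objective such a solver optimizes. The variables, constraints, and weights are hypothetical.

```python
from itertools import product

def solve(variables, hard, soft):
    """Exhaustively search Boolean assignments.

    hard: list of predicates that must all hold.
    soft: list of (weight, predicate); weights of satisfied predicates are summed.
    Returns (best_assignment, best_score), or (None, -inf) if no assignment
    satisfies every hard constraint.
    """
    best, best_score = None, float("-inf")
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        # Hard constraints filter out infeasible assignments.
        if not all(h(assignment) for h in hard):
            continue
        # Soft constraints only contribute to the score.
        score = sum(w for w, s in soft if s(assignment))
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

# Hypothetical toy problem over three Boolean variables.
variables = ["x", "y", "z"]
hard = [
    lambda a: a["x"] or a["y"],          # at least one of x, y holds
    lambda a: not (a["y"] and a["z"]),   # y and z cannot both hold
]
soft = [
    (3, lambda a: not a["x"]),  # prefer x false
    (2, lambda a: a["z"]),      # prefer z true
    (1, lambda a: a["y"]),      # prefer y true
]

best, score = solve(variables, hard, soft)
print(best, score)
```

Exhaustive search is exponential in the number of variables, which is why practical systems ground and solve only a subset of the constraints.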
Exploring Multiple Feature Spaces for Novel Entity Discovery
Wu, Zhaohui (The Pennsylvania State University) | Song, Yang (Microsoft Research) | Giles, C. Lee (The Pennsylvania State University)
Continuously discovering novel entities in news and Web data is important for Knowledge Base (KB) maintenance. One of the key challenges is to decide whether an entity mention refers to an in-KB or out-of-KB entity. We propose a principled approach that learns a novel entity classifier by modeling mention and entity representations in multiple feature spaces, including contextual, topical, lexical, neural embedding and query spaces. Unlike most previous studies, which address novel entity discovery as a submodule of entity linking systems, our model is a more general approach and can be applied as a pre-filtering step for novel entities in any entity linking system. Experiments on three real-world datasets show that our method significantly outperforms existing methods on identifying novel entities.
Identifying Search Keywords for Finding Relevant Social Media Posts
Wang, Shuai (University of Illinois at Chicago) | Chen, Zhiyuan (University of Illinois at Chicago) | Liu, Bing (University of Illinois at Chicago) | Emery, Sherry (University of Illinois at Chicago)
In almost any application of social media analysis, the user is interested in studying a particular topic or research question. Collecting posts or messages relevant to the topic from a social media source is a necessary step. Due to the huge size of social media sources (e.g., Twitter and Facebook), one has to use some topic keywords to search for possibly relevant posts. However, gathering a good set of keywords is a very tedious and time-consuming task. It often involves a lengthy iterative process of searching and manual reading. In this paper, we propose a novel technique to help the user identify topical search keywords. Our experiments are carried out on identifying such keywords for five (5) real-life application topics to be used for searching relevant tweets from the Twitter API. The results show that the proposed method is highly effective.
Progressive EM for Latent Tree Models and Hierarchical Topic Detection
Chen, Peixian (The Hong Kong University of Science and Technology) | Zhang, Nevin L. (The Hong Kong University of Science and Technology) | Poon, Leonard K. M. (The Hong Kong Institute of Education) | Chen, Zhourong (The Hong Kong University of Science and Technology)
Hierarchical latent tree analysis (HLTA) was recently proposed as a new method for topic detection. It differs fundamentally from the LDA-based methods in terms of topic definition, topic-document relationship, and learning method. It has been shown to discover significantly more coherent topics and better topic hierarchies. However, HLTA relies on the Expectation-Maximization (EM) algorithm for parameter estimation and hence is not efficient enough to deal with large datasets. In this paper, we propose a method to drastically speed up HLTA using a technique inspired by the advances in the method of moments. Empirical experiments show that our method greatly improves the efficiency of HLTA. It is as efficient as the state-of-the-art LDA-based method for hierarchical topic detection and finds substantially better topics and topic hierarchies.
Understanding Emerging Spatial Entities
Yeo, Jinyoung (Pohang University of Science and Technology) | Park, Jin-woo (Pohang University of Science and Technology) | Hwang, Seung-won (Yonsei University)
In Foursquare or Google+ Local, emerging spatial entities, such as new businesses or venues, are reported to grow by 1% every day. As information on such spatial entities is initially limited (e.g., only a name), we need to quickly harvest related information from social media such as Flickr photos. In particular, achieving high recall in photo population is essential for emerging spatial entities, which suffer from data sparseness (e.g., 71% of the Seattle restaurants on TripAdvisor do not have any photo, as of Sep 03, 2015). Our goal is thus to address this limitation by identifying effective linking techniques for emerging spatial entities and photos. Compared with state-of-the-art baselines, our proposed approach improves recall and F1 score by up to 24% and 18%, respectively. To show the effectiveness and robustness of our approach, we have conducted extensive experiments in three different cities, Seattle, Washington D.C., and Taipei, of varying characteristics such as geographical density and language.
Supervised Hashing via Uncorrelated Component Analysis
Sohn, SungRyull (Electronics and Telecommunications Research Institute and Korea Advanced Institute of Science and Technology) | Kim, Hyunwoo (Kakao Corp.) | Kim, Junmo (Korea Advanced Institute of Science and Technology)
The Approximate Nearest Neighbor (ANN) search problem is important in applications such as information retrieval. Several hashing-based search methods that provide effective solutions to the ANN search problem have been proposed. However, most of these focus on similarity preservation and coding error minimization, and pay little attention to optimizing the precision-recall curve or receiver operating characteristic curve. In this paper, we propose a novel projection-based hashing method that attempts to maximize the precision and recall. We first introduce an uncorrelated component analysis (UCA) by examining the precision and recall, and then propose a UCA-based hashing method. The proposed method is evaluated with a variety of datasets. The results show that UCA-based hashing outperforms state-of-the-art methods, and has computationally efficient training and encoding processes.
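As a rough illustration of projection-based hashing and of the precision/recall measures the abstract refers to, the sketch below encodes vectors as binary codes from the signs of random Gaussian projections and retrieves items within a Hamming radius. Random projections are a stand-in chosen here for simplicity; this is not the UCA method, which learns its projections. All function names and the toy data are hypothetical.

```python
import random

def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions (a stand-in for learned projections)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def encode(vec, projections):
    """One bit per projection: 1 when the dot product is positive, else 0."""
    return tuple(
        int(sum(p_i * v_i for p_i, v_i in zip(p, vec)) > 0)
        for p in projections
    )

def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(x != y for x, y in zip(a, b))

def search(query_code, codes, radius):
    """Indices of all codes within the given Hamming radius of the query."""
    return [i for i, c in enumerate(codes) if hamming(query_code, c) <= radius]

def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a ground-truth set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Toy usage: four 3-d points, 8-bit codes, query equal to the first point.
data = [[1.0, 0.2, 0.0], [0.9, 0.3, 0.1], [-1.0, 0.0, 0.5], [0.0, -1.0, -0.5]]
projections = make_projections(dim=3, n_bits=8)
codes = [encode(v, projections) for v in data]
found = search(encode(data[0], projections), codes, radius=2)
print(found, precision_recall(found, relevant=[0, 1]))
```

Sweeping the radius from 0 to the code length trades precision for recall, which traces out the precision-recall curve mentioned in the abstract.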
Social Role-Aware Emotion Contagion in Image Social Networks
Yang, Yang (Tsinghua University) | Jia, Jia (Tsinghua University) | Wu, Boya (Tsinghua University) | Tang, Jie (Tsinghua University)
Psychological theories suggest that emotion represents the state of mind and instinctive responses of one's cognitive system (Cannon 1927). Emotions are a complex state of feeling that results in physical and psychological changes that influence our behavior. In this paper, we study an interesting problem of emotion contagion in social networks. In particular, by employing an image social network (Flickr) as the basis of our study, we try to unveil how users' emotional statuses influence each other and how users' positions in the social network affect their influential strength on emotion. We develop a probabilistic framework to formalize the problem into a role-aware contagion model. The model is able to predict users' emotional statuses based on their historical emotional statuses and social structures. Experiments on a large Flickr dataset show that the proposed model significantly outperforms (+31% in terms of F1-score) several alternative methods in predicting users' emotional status. We also discover several intriguing phenomena. For example, the probability that a user feels happy is roughly linear in the number of friends who are also happy; but taking a closer look, the happiness probability is superlinear in the number of happy friends who act as opinion leaders (Page et al. 1999) in the network and sublinear in the number of happy friends who span structural holes (Burt 2001). This offers a new opportunity to understand the underlying mechanism of emotional contagion in online social networks.
California Inc.: Anyone in the market for a slightly used search engine?
Welcome to California Inc., the weekly newsletter of the L.A. Times Business Section. Expect financial markets to face headwinds today after the Federal Reserve reported Friday that U.S. industrial production fell more than expected in March. This is the latest sign that economic growth slowed significantly in the first quarter. On the plus side, though, many economists still forecast a rebound in growth as the year plods ahead. Tax deadline: Monday is the deadline for most Americans to submit their tax returns.
EU wants Google, Microsoft to be more transparent about ads in search results
The European Union's digital chief wants search engines such as Alphabet Inc's Google and Microsoft's Bing to be more transparent about advertising in web search results but ruled out a separate law for web platforms. European Commission vice-president Andrus Ansip, who is overseeing a wide-ranging inquiry into how web platforms conduct their business, said on Friday the EU executive would not take a horizontal approach to regulating online services. "We will take a problem-driven approach," Ansip said. "It's practically impossible to regulate all the platforms with one really good single solution."