Pohang University of Science and Technology
Machine-Translated Knowledge Transfer for Commonsense Causal Reasoning
Yeo, Jinyoung (Pohang University of Science and Technology) | Wang, Geungyu (Yonsei University) | Cho, Hyunsouk (Pohang University of Science and Technology) | Choi, Seungtaek (Yonsei University) | Hwang, Seung-won (Yonsei University)
This paper studies the problem of multilingual causal reasoning in resource-poor languages. Existing approaches, which translate into the most probable resource-rich language such as English, suffer from translation and language gaps between different cultural areas, which lead to the loss of causality. To overcome these challenges, our goal is to identify key techniques for constructing a new causality network of cause-effect terms, targeted at machine-translated English but without any language-specific knowledge of the resource-poor languages. In our evaluations with three languages, Korean, Chinese, and French, our proposed method consistently outperforms all baselines, achieving up to 69.0% reasoning accuracy, which is close to the state-of-the-art accuracy of 70.2% achieved on English.
Understanding Emerging Spatial Entities
Yeo, Jinyoung (Pohang University of Science and Technology) | Park, Jin-woo (Pohang University of Science and Technology) | Hwang, Seung-won (Yonsei University)
In Foursquare or Google+ Local, emerging spatial entities, such as new businesses or venues, are reported to grow by 1% every day. As information on such spatial entities is initially limited (e.g., only a name), we need to quickly harvest related information from social media such as Flickr photos. In particular, achieving high recall in photo population is essential for emerging spatial entities, which suffer from data sparseness (e.g., 71% of TripAdvisor restaurants in Seattle did not have any photo as of Sep 03, 2015). Our goal is thus to address this limitation by identifying effective techniques for linking emerging spatial entities and photos. Compared with state-of-the-art baselines, our proposed approach improves recall and F1 score by up to 24% and 18%, respectively. To show the effectiveness and robustness of our approach, we conducted extensive experiments in three cities, Seattle, Washington D.C., and Taipei, with varying characteristics such as geographical density and language.
Walking on Minimax Paths for k-NN Search
Kim, Kye-Hyeon (Pohang University of Science and Technology) | Choi, Seungjin (Pohang University of Science and Technology)
Link-based dissimilarity measures, such as shortest path or Euclidean commute time distance, base their distance on paths between nodes of a weighted graph. These measures are known to be better suited than Euclidean distance to data manifolds with nonconvex-shaped clusters, so k-nearest neighbor (k-NN) search is improved in such metric spaces. In this paper we present a new link-based dissimilarity measure based on minimax paths between nodes. The two main benefits of the minimax path-based dissimilarity measure are: (1) only a subset of paths is considered, which makes it scalable, whereas Euclidean commute time distance considers all possible paths; (2) it captures nonconvex-shaped cluster structure better than shortest path distance. We define the total cost assigned to a path between nodes as the L_p norm of the intermediate costs of the edges along the path, and show that the minimax path emerges from this L_p-norm-over-paths framework. We also define the minimax distance as the intermediate cost of the longest edge on the minimax path, then present a greedy algorithm that computes the k smallest minimax distances between a query and N data points in O(log N + k log k) time. Numerical experiments demonstrate that our minimax k-NN algorithm reduces the search time by several orders of magnitude compared to existing methods, while the quality of k-NN search is significantly improved over Euclidean distance.
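The L_p path-cost framework and the minimax distance can be illustrated with a short Python sketch. It uses a simple bottleneck variant of Dijkstra's algorithm, assumed here for illustration; it is not the paper's O(log N + k log k) greedy algorithm. The functions `path_cost` and `minimax_knn` and the toy graph are hypothetical.

```python
import heapq
import math


def path_cost(edge_costs, p):
    """L_p norm of the intermediate edge costs along one path.

    p = 1 gives the ordinary path length; p = math.inf gives the cost of
    the longest edge, which is the quantity a minimax path minimizes."""
    if math.isinf(p):
        return max(edge_costs)
    return sum(c ** p for c in edge_costs) ** (1.0 / p)


def minimax_knn(graph, query, k):
    """Return [(node, minimax_distance), ...] for the k nearest nodes.

    `graph` is an adjacency dict {node: [(neighbor, cost), ...]}.
    A path's cost is its largest edge cost, so nodes are settled in
    nondecreasing order of minimax distance from `query`."""
    best = {query: 0.0}
    heap = [(0.0, query)]
    settled = set()
    result = []
    while heap and len(result) < k:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale heap entry
        settled.add(u)
        if u != query:
            result.append((u, d))
        for v, w in graph.get(u, []):
            nd = max(d, w)  # bottleneck: keep the longest edge seen so far
            if v not in settled and nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return result


# Toy undirected graph: a chain q-a-b-c of short edges plus one edge q-x.
edges = [("q", "a", 1.0), ("a", "b", 1.0), ("b", "c", 1.0), ("q", "x", 2.5)]
graph = {}
for u, v, w in edges:
    graph.setdefault(u, []).append((v, w))
    graph.setdefault(v, []).append((u, w))

print(minimax_knn(graph, "q", 3))  # [('a', 1.0), ('b', 1.0), ('c', 1.0)]
```

In the toy graph, node `c` is 3.0 away from `q` by shortest path, which is farther than `x` at 2.5. Its minimax distance, however, is 1.0, because it lies on the same chain ("cluster") as the query. This is the behavior the abstract attributes to minimax paths on nonconvex clusters.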
Probabilistic Models for Common Spatial Patterns: Parameter-Expanded EM and Variational Bayes
Kang, Hyohyeong (Pohang University of Science and Technology) | Choi, Seungjin (Pohang University of Science and Technology)
Common spatial patterns (CSP) is a popular feature extraction method for discriminating between positive and negative classes in electroencephalography (EEG) data. Two probabilistic models for CSP were recently developed: probabilistic CSP (PCSP), which is trained by expectation maximization (EM), and variational Bayesian CSP (VBCSP), which is learned by variational approximation. Parameter expansion methods use auxiliary parameters to speed up the convergence of EM or the deterministic approximation of the target distribution in variational inference. In this paper, we describe the development of parameter-expanded algorithms for PCSP and VBCSP, leading to PCSP-PX and VBCSP-PX, with emphasis on their convergence speed-up and high performance. The convergence speed-up in PCSP-PX and VBCSP-PX is a direct consequence of parameter expansion methods. The contribution of this study is the resulting performance improvement for CSP, which is a novel development. Numerical experiments on the BCI competition datasets III IVa and IV 2a demonstrate the high performance and fast convergence of PCSP-PX and VBCSP-PX compared to PCSP and VBCSP.
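For orientation, here is a sketch of classic (non-probabilistic) CSP, the method that PCSP and VBCSP build on. It is neither the paper's probabilistic model nor its parameter-expanded EM or variational algorithms. It assumes each class is given as a NumPy array of trials shaped (n_trials, n_channels, n_samples) and uses per-trial trace normalization, a common convention assumed here. The function names and the synthetic data are hypothetical.

```python
import numpy as np


def class_covariance(trials):
    """Mean trace-normalized spatial covariance of trials shaped
    (n_trials, n_channels, n_samples)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def csp_filters(trials_pos, trials_neg, n_pairs=1):
    """Classic CSP via whitening plus eigendecomposition.

    Returns 2 * n_pairs spatial filters (rows) and their eigenvalues,
    ordered from 'dominated by the negative class' to 'dominated by the
    positive class'."""
    c_pos = class_covariance(trials_pos)
    c_neg = class_covariance(trials_neg)
    # Whitening transform P, chosen so that P (C_pos + C_neg) P^T = I.
    d, u = np.linalg.eigh(c_pos + c_neg)
    p = np.diag(d ** -0.5) @ u.T
    # Eigenvalues of the whitened positive-class covariance lie in [0, 1];
    # np.linalg.eigh returns them in ascending order.
    lam, b = np.linalg.eigh(p @ c_pos @ p.T)
    w = b.T @ p  # each row is one spatial filter
    n = len(lam)
    idx = np.r_[np.arange(n_pairs), np.arange(n - n_pairs, n)]
    return w[idx], lam[idx]


def log_var_features(w, trial):
    """Normalized log-variance of the CSP-filtered trial, one value per filter."""
    var = (w @ trial).var(axis=1)
    return np.log(var / var.sum())


# Synthetic 3-channel data: the positive class has high variance on
# channel 0, the negative class on channel 2.
rng = np.random.default_rng(0)
trials_pos = rng.standard_normal((20, 3, 200)) * np.array([3.0, 1.0, 1.0])[:, None]
trials_neg = rng.standard_normal((20, 3, 200)) * np.array([1.0, 1.0, 3.0])[:, None]
w, lam = csp_filters(trials_pos, trials_neg, n_pairs=1)
features = log_var_features(w, trials_pos[0])
```

After whitening, each filter's variances for the two classes sum to one (w C_pos w^T + w C_neg w^T = 1). An eigenvalue near 1 therefore marks a filter whose output is dominated by the positive class, and an eigenvalue near 0 marks one dominated by the negative class. Taking filters from both ends of the spectrum is what makes the log-variance features discriminative.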
Towards an Intelligent Code Search Engine
Kim, Jinhan (Pohang University of Science and Technology) | Lee, Sanghoon (Pohang University of Science and Technology) | Hwang, Seung-won (Pohang University of Science and Technology) | Kim, Sunghun (Hong Kong University of Science and Technology)
Software developers increasingly rely on information from the Web, such as documents or code examples on Application Programming Interfaces (APIs), to facilitate their development processes. However, API documents often do not include enough information for developers to fully understand API usage, while searching for good code examples requires non-trivial effort. To address this problem, we propose a novel code search engine that combines the strengths of browsing documents and searching for code examples by returning documents embedded with high-quality code example summaries mined from the Web. Our evaluation results show that our approach provides code examples with high precision and boosts programmer productivity.