Overview
Fact Checking in Community Forums
Mihaylova, Tsvetomila (Sofia University "St. Kliment Ohridski") | Nakov, Preslav ( Qatar Computing Research Institute, HBKU ) | Màrquez, Lluís (Qatar Computing Research Institute, HBKU) | Barrón-Cedeño, Alberto (Qatar Computing Research Institute, HBKU) | Mohtarami, Mitra (Massachusetts Institute of Technology) | Karadzhov, Georgi (Sofia University "St. Kliment Ohridski") | Glass, James (Massachusetts Institute of Technology)
Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a new problem, we create a specialized dataset for it. We further propose a novel multi-faceted model, which captures information from the answer content (what is said and how), from the author profile (who says it), from the rest of the community forum (where it is said), and from external authoritative sources of information (external support). Evaluation results show a MAP value of 86.54, which is 21 points absolute above the baseline.
280 Birds With One Stone: Inducing Multilingual Taxonomies From Wikipedia Using Character-Level Classification
Gupta, Amit (Ecole Polytechnique Fédérale de Lausanne) | Lebret, Rémi (Ecole Polytechnique Fédérale de Lausanne) | Harkous, Hamza (Ecole Polytechnique Fédérale de Lausanne) | Aberer, Karl (Ecole Polytechnique Fédérale de Lausanne)
We propose a novel fully-automated approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach first leverages the interlanguage links of Wikipedia to automatically construct training datasets for the isa relation in the target language. Character-level classifiers are trained on the constructed datasets, and used in an optimal path discovery framework to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-art, heuristics-heavy approaches for six languages. As a consequence of our work, we release presumably the largest and the most accurate multilingual taxonomic resource spanning over 280 languages.
Manipulative Elicitation — A New Attack on Elections with Incomplete Preferences
Dey, Palash (Tata Institute of Fundamental Research, Mumbai )
Lu and Boutilier proposed a novel approach based on "minimax regret" to use classical score based voting rules in the setting where preferences can be any partial (instead of complete) orders over the set of alternatives. We show here that such an approach is vulnerable to a new kind of manipulation which was not present in the classical (where preferences are complete orders) world of voting. We call this attack "manipulative elicitation." More specifically, it may be possible to (partially) elicit the preferences of the agents in a way that makes some distinguished alternative win the election who may not be a winner if we elicit every preference completely. More alarmingly, we show that the related computational task is polynomial time solvable for a large class of voting rules which includes all scoring rules, maximin, Copeland α for every α ∈ [0,1], simplified Bucklin voting rules, etc. We then show that introducing a parameter per pair of alternatives which specifies the minimum number of partial preferences where this pair of alternatives must be comparable makes the related computational task of manipulative elicitation NP-complete for all common voting rules including a class of scoring rules which includes the plurality, k -approval, k -veto, veto, and Borda voting rules, maximin, Copeland α for every α ∈ [0,1], and simplified Bucklin voting rules. Hence, in this work, we discover a fundamental vulnerability in using minimax regret based approach in partial preferential setting and propose a novel way to tackle it.
A Probabilistic Hierarchical Model for Multi-View and Multi-Feature Classification
Li, Jinxing (The Hong Kong Polytechnic University) | Yong, Hongwei (The Hong Kong Polytechnic University) | Zhang, Bob ( University of Macau ) | Li, Mu (The Hong Kong Polytechnic University) | Zhang, Lei (The Hong Kong Polytechnic University) | Zhang, David (The Hong Kong Polytechnic University)
Some recent works in classification show that the data obtained from various views with different sensors for an object contributes to achieving a remarkable performance. Actually, in many real-world applications, each view often contains multiple features, which means that this type of data has a hierarchical structure, while most of existing works do not take these features with multi-layer structure into consideration simultaneously. In this paper, a probabilistic hierarchical model is proposed to address this issue and applied for classification. In our model, a latent variable is first learned to fuse the multiple features obtained from a same view, sensor or modality. Particularly, mapping matrices corresponding to a certain view are estimated to project the latent variable from a shared space to the multiple observations. Since this method is designed for the supervised purpose, we assume that the latent variables associated with different views are influenced by their ground-truth label. In order to effectively solve the proposed method, the Expectation-Maximization (EM) algorithm is applied to estimate the parameters and latent variables. Experimental results on the extensive synthetic and two real-world datasets substantiate the effectiveness and superiority of our approach as compared with state-of-the-art.
Batchwise Patching of Classifiers
Kauschke, Sebastian (TU Darmstadt) | Fürnkranz, Johannes (TU Darmstadt)
In this work we present classifier patching, an approach for adapting an existing black-box classification model to new data. Instead of creating a new model, patching infers regions in the instance space where the existing model is error-prone by training a classifier on the previously misclassified data. It then learns a specific model to determine the error regions, which allows to patch the old model’s predictions for them. Patching relies on a strong, albeit unchangeable, existing base classifier, and the idea that the true labels of seen instances will be available in batches at some point in time after the original classification. We experimentally evaluate our approach, and show that it meets the original design goals. Moreover, we compare our approach to existing methods from the domain of ensemble stream classification in both concept drift and transfer learning situations. Patching adapts quickly and achieves high classification accuracy, outperforming state-of-the-art competitors in either adaptation speed or accuracy in many scenarios.
Adversarial Network Embedding
Dai, Quanyu (The Hong Kong Polytechnic University) | Li, Qiang (The Hong Kong Polytechnic University) | Tang, Jian (HEC Montreal, Montreal Institute for Learning Algorithms) | Wang, Dan (The Hong Kong Polytechnic University)
Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, except for objectives to capture network structural properties, most of them suffer from lack of additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate its contribution in learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks.
Machine-Translated Knowledge Transfer for Commonsense Causal Reasoning
Yeo, Jinyoung (Pohang University of Science and Technology) | Wang, Geungyu (Yonsei University) | Cho, Hyunsouk (Pohang University of Science and Technology) | Choi, Seungtaek (Yonsei University) | Hwang, Seung-won (Yonsei University)
This paper studies the problem of multilingual causal reasoning in resource-poor languages. Existing approaches, translating into the most probable resource-rich language such as English, suffer in the presence of translation and language gaps between different cultural area, which leads to the loss of causality. To overcome these challenges, our goal is thus to identify key techniques to construct a new causality network of cause-effect terms, targeted for the machine-translated English, but without any language-specific knowledge of resource-poor languages. In our evaluations with three languages, Korean, Chinese, and French, our proposed method consistently outperforms all baselines, achieving up-to 69.0% reasoning accuracy, which is close to the state-of-the-art accuracy 70.2% achieved on English.
Approximately Stable Matchings With Budget Constraints
Kawase, Yasushi (Tokyo Institute of Technology) | Iwasaki, Atsushi (RIKEN AIP Center)
This paper examines two-sided matching with budget constraints where one side (a firm or hospital) can make monetary transfers (offer wages) to the other (a worker or doctor). In a standard model, while multiple doctors can be matched to a single hospital, a hospital has a maximum quota; thus, the number of doctors assigned to a hospital cannot exceed a certain limit. In our model, in contrast, a hospital has a fixed budget; that is, the total amount of wages allocated by each hospital to doctors is constrained. With budget constraints, stable matchings may fail to exist and checking for the existence is hard. To deal with the nonexistence of stable matchings, we extend the "matching with contracts" model of Hatfield and Milgrom so that it deals with approximately stable matchings where each of the hospitals' utilities after deviation can increase by a factor up to a certain amount. We then propose two novel mechanisms that efficiently return a stable matching that exactly satisfies the budget constraints. Specifically, by sacrificing strategy-proofness, our first mechanism achieves the best possible bound. We also explore a special case on which a simple mechanism is strategy-proof for doctors, while maintaining the best possible bound of the general case.
VSE-ens: Visual-Semantic Embeddings with Efficient Negative Sampling
Guo, Guibing (Northeastern University) | Zhai, Songlin (Northeastern University) | Yuan, Fajie (University of Glasgow) | Liu, Yuan (Northeastern University) | Wang, Xingwei (Northeastern University)
Jointing visual-semantic embeddings (VSE) have become a research hotpot for the task of image annotation, which suffers from the issue of semantic gap, i.e., the gap between images' visual features (low-level) and labels' semantic features (high-level). This issue will be even more challenging if visual features cannot be retrieved from images, that is, when images are only denoted by numerical IDs as given in some real datasets. The typical way of existing VSE methods is to perform a uniform sampling method for negative examples that violate the ranking order against positive examples, which requires a time-consuming search in the whole label space. In this paper, we propose a fast adaptive negative sampler that can work well in the settings of no figure pixels available. Our sampling strategy is to choose the negative examples that are most likely to meet the requirements of violation according to the latent factors of images. In this way, our approach can linearly scale up to large datasets. The experiments demonstrate that our approach converges 5.02x faster than the state-of-the-art approaches on OpenImages, 2.5x on IAPR-TCI2 and 2.06x on NUS-WIDE datasets, as well as better ranking accuracy across datasets.
Optimization Methods for Large-Scale Machine Learning
Bottou, Léon, Curtis, Frank E., Nocedal, Jorge
This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.